Standalone Synth deployment
When running with a rootful Docker installation, the easiest way to ensure the correct file permissions on the resulting artifacts (trained models for a training run, synthetic data for a generation run) is to run the container as the current user.
As a baseline, the standalone Synthesiser image requires very few permissions to run.
The /home/user/hazy directory on the host contains a params.json file (and, in the case of a generation run, the trained model). The params.json file describes the input and output paths as seen from inside the container, for example:
{
  "action": "generate",
  "params": {
    "output": "/mount/output/generated.csv",
    "model": "/mount/input/model.hazymodel",
    "num_rows": 100000
  }
}
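Because the paths in params.json are container paths, it can help to translate them back to host paths when looking for the results. The following is a minimal sketch, assuming the output directory is bind-mounted at /mount/output as in the docker run command used in this section. container_to_host is a hypothetical helper, not part of the Synthesiser, and /tmp/synth-out is a placeholder for the host directory.

```shell
# Hypothetical helper (not part of the Synthesiser image): map a path as seen
# inside the container to the host path behind a bind mount.
#   $1 = container path, $2 = container mount point, $3 = host directory
container_to_host() {
    case "$1" in
        "$2"/*)
            rest=${1#"$2"}            # strip the mount point prefix
            printf '%s\n' "$3$rest"
            ;;
        *)
            printf '%s is not under %s\n' "$1" "$2" >&2
            return 1
            ;;
    esac
}

# /tmp/synth-out is a placeholder for the host directory mounted at /mount/output.
container_to_host /mount/output/generated.csv /mount/output /tmp/synth-out
# prints /tmp/synth-out/generated.csv
```

The same call with /mount/input and /home/user/hazy would locate the model file on the host.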
# create a $USER-writable directory for the synthetic data
$ dir=$(mktemp -d)
$ sudo docker run \
    --rm \
    -v /home/user/hazy:/mount/input:Z \
    -v "$dir":/mount/output:Z \
    --tmpfs /tmp:rw,noexec \
    --cap-drop=ALL \
    --security-opt=no-new-privileges \
    --user "$(id -u):$(id -g)" \
    hazy-synthesizer/tabular run --parameters /mount/input/params.json
# $dir/generated.csv will be owned by $USER
$ cat "$dir/generated.csv"
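To confirm that the ownership came out as intended, the output file can be checked before it is handed to downstream jobs. This is a small sketch, not part of the Synthesiser tooling: check_owner is a hypothetical helper, and it is demonstrated on a scratch file so that it runs without Docker.

```shell
# Hypothetical check: report whether a file is owned by the user running this shell.
# [ -O file ] is true when the file exists and its owner is the effective user ID.
check_owner() {
    if [ -O "$1" ]; then
        echo "ok: $1 is owned by UID $(id -u)"
    else
        echo "warning: $1 is missing or owned by another user" >&2
        return 1
    fi
}

# Demonstrated on a scratch file; after a real run, pass "$dir/generated.csv" instead.
scratch=$(mktemp)
check_owner "$scratch"
rm -f "$scratch"
```

A non-zero exit status from check_owner makes it easy to stop a batch script early if the container was started without the intended user.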
Rootless Docker
The isolated Synthesiser container runs perfectly well with either a rootless or a user-namespaced Docker installation.
With a rootless installation, the generated files (models or synthetic data) are owned by the same user who owns the Docker daemon process. If the system is configured so that the train or generate batch process is run by that same user, the command above can be used without specifying --user for the container, since it automatically runs as the right user.
# create a $USER-writable directory for the synthetic data
$ dir=$(mktemp -d)
# no need for root as the docker daemon process is owned by $USER
$ docker run \
    --rm \
    -v /home/user/hazy:/mount/input:Z \
    -v "$dir":/mount/output:Z \
    --tmpfs /tmp:rw,noexec \
    --cap-drop=ALL \
    --security-opt=no-new-privileges \
    synthesizer/image run --parameters /mount/input/params.json
# $dir/generated.csv will be owned by $USER
$ cat "$dir/generated.csv"
User namespaced Docker
The situation with a user-namespaced installation is more complex. In this configuration, the output files are always owned by the unprivileged user that Docker runs the containers as.
As stated in the Docker documentation:
This re-mapping is transparent to the container, but introduces some configuration complexity in situations where the container needs access to resources on the Docker host, such as bind mounts into areas of the filesystem that the system user cannot write to. From a security standpoint, it is best to avoid these situations.
If you have configured Docker with user namespace remapping and the file ownership is a problem, one solution is to disable the remapping for this container with --userns=host and run the container as the logged-in user:
docker run ... --userns=host --user $(id -u):$(id -g) synthesizer/image ...
This works exactly as described above for the rootful Docker configuration.
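The three configurations differ only in the user-related flags passed to docker run, so a wrapper script can choose them automatically. The sketch below is illustrative rather than an official tool: docker_mode and user_flags are hypothetical helpers, and the sketch assumes that the text printed by `docker info --format '{{.SecurityOptions}}'` contains name=rootless for a rootless daemon and name=userns when user namespace remapping is enabled. Verify this against your Docker version before relying on it.

```shell
# Hypothetical helper: classify a Docker daemon from its security options.
# The argument is the text printed by:
#   docker info --format '{{.SecurityOptions}}'
# Assumption: rootless daemons list "name=rootless" and daemons with
# user-namespace remapping list "name=userns".
docker_mode() {
    case "$1" in
        *name=rootless*) echo rootless ;;
        *name=userns*)   echo userns-remap ;;
        *)               echo rootful ;;
    esac
}

# Print the user-related `docker run` flags described in this section.
user_flags() {
    case "$(docker_mode "$1")" in
        rootless)     : ;;  # container already runs as the daemon's owner
        userns-remap) echo "--userns=host --user $(id -u):$(id -g)" ;;
        rootful)      echo "--user $(id -u):$(id -g)" ;;
    esac
}

# Sample strings stand in for real `docker info` output.
docker_mode "[name=seccomp,profile=builtin name=rootless]"   # prints rootless
docker_mode "[name=seccomp,profile=builtin name=userns]"     # prints userns-remap
docker_mode "[name=apparmor name=seccomp,profile=builtin]"   # prints rootful
```

On a real host the argument would come from the daemon itself, for example `user_flags "$(docker info --format '{{.SecurityOptions}}')"`.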
Hazy Hub
The Hub requires more privileges so that it can spawn training and generation tasks from the UI. Consult the main installation documentation for details.