Standalone Synth deployment

When running with a rootful Docker installation, the easiest way to ensure the correct file permissions on the resulting artifacts (trained models in the case of a training run, synthetic data in the case of generation) is to run the container as the current user.

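As a baseline, the standalone Synthesiser image requires very few privileges to run: the example below drops all Linux capabilities (--cap-drop=ALL), disables privilege escalation (--security-opt=no-new-privileges) and mounts /tmp as a non-executable tmpfs (--tmpfs /tmp:rw,noexec).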

The /home/user/hazy directory contains a params.json file (and, for a generation run, the trained model). The params.json file describes the input and output paths as seen from inside the container, for example:

{
    "action": "generate",
    "params": {
        "output": "/mount/output/generated.csv",
        "model": "/mount/input/model.hazymodel",
        "num_rows": 100000
    }
}
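If params.json is prepared by a script rather than by hand, a small shell function can write it into the input directory. This is a minimal sketch: the function name write_generate_params is hypothetical, and it only uses the keys shown in the example above, with the container paths matching the /mount/input and /mount/output bind mounts.

```shell
#!/bin/sh
# Hypothetical helper: writes a generation params.json into a host directory
# that will later be bind-mounted at /mount/input. Only the keys from the
# documented example are used; no other parameters are assumed to exist.
write_generate_params() {
    dest_dir=$1   # host directory, e.g. /home/user/hazy
    num_rows=$2   # number of synthetic rows to generate

    # Reject empty or non-numeric row counts before writing anything.
    case $num_rows in
        ''|*[!0-9]*) echo "num_rows must be a non-negative integer" >&2; return 1 ;;
    esac

    mkdir -p "$dest_dir"
    # Unquoted EOF so that $num_rows is expanded inside the heredoc.
    cat > "$dest_dir/params.json" <<EOF
{
    "action": "generate",
    "params": {
        "output": "/mount/output/generated.csv",
        "model": "/mount/input/model.hazymodel",
        "num_rows": $num_rows
    }
}
EOF
}

# Example: write_generate_params /home/user/hazy 100000
```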
A generation run can then be launched as the current user:

# create a $USER-writable directory for the synthetic data
$ dir=$(mktemp -d)

$ sudo docker run \
    --rm \
    -v /home/user/hazy:/mount/input:Z \
    -v $dir:/mount/output:Z \
    --tmpfs /tmp:rw,noexec \
    --cap-drop=ALL \
    --security-opt=no-new-privileges \
    --user $(id -u):$(id -g) \
    hazy-synthesizer/tabular run --parameters /mount/input/params.json

# $dir/generated.csv will be owned by $USER
$ cat $dir/generated.csv
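To confirm that the output really is owned by the invoking user, the file's numeric owner can be compared with id -u. This is a sketch assuming GNU coreutils stat (as found on most Linux distributions, where -c '%u' prints the owner's numeric UID); the function name check_owned_by_me is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given file is owned by the current user.
# Assumes GNU coreutils `stat`, where -c '%u' prints the owner's numeric UID.
check_owned_by_me() {
    file=$1
    # Fails (and returns 1) if the file does not exist or cannot be stat'ed.
    owner_uid=$(stat -c '%u' "$file") || return 1
    if [ "$owner_uid" -eq "$(id -u)" ]; then
        return 0
    fi
    echo "$file is owned by UID $owner_uid, not $(id -u)" >&2
    return 1
}

# Example, after the run above:
# check_owned_by_me "$dir/generated.csv" && echo "ownership OK"
```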

Rootless Docker

The standalone Synthesiser container runs equally well with either a rootless or a user-namespaced Docker installation.

With a rootless installation, the generated files (models or synthetic data) are owned by the user who runs the Docker daemon, because root inside the container is mapped to that user on the host. If the train or generate batch job runs as the same user who owns the Docker daemon, the command above can be used without sudo and without --user, since the container already writes its output as that user.

# create a $USER writable directory for the synthetic data
$ dir=$(mktemp -d)

# no need for root as the docker daemon process is owned by $USER
$ docker run \
    --rm \
    -v /home/user/hazy:/mount/input:Z \
    -v $dir:/mount/output:Z \
    --tmpfs /tmp:rw,noexec \
    --cap-drop=ALL \
    --security-opt=no-new-privileges \
    synthesizer/image run --parameters /mount/input/params.json

# $dir/generated.csv will be owned by $USER
$ cat $dir/generated.csv
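Whether a host runs a rootful, rootless or user-namespaced daemon can usually be told from the Security Options that docker info reports: a rootless daemon lists name=rootless and a daemon with userns-remap enabled lists name=userns. The sketch below relies on that assumption and treats anything else as rootful; the function name docker_isolation_mode is hypothetical, and passing the options string as an argument skips the docker call.

```shell
#!/bin/sh
# Hypothetical helper: classifies the Docker installation as "rootless",
# "userns" (userns-remap enabled) or "rootful" from the SecurityOptions
# reported by `docker info`. Anything not recognised is assumed rootful.
# Pass the options string as $1 to avoid calling docker (e.g. for testing).
docker_isolation_mode() {
    if [ $# -gt 0 ]; then
        opts=$1
    else
        opts=$(docker info --format '{{.SecurityOptions}}') || return 1
    fi
    case $opts in
        *name=rootless*) echo rootless ;;
        *name=userns*)   echo userns ;;
        *)               echo rootful ;;
    esac
}

# Example: docker_isolation_mode   # prints rootless, userns or rootful
```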

User-namespaced Docker

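However, the situation with a user-namespaced installation is more complex. In this configuration, the output files are always owned by the unprivileged host UID that Docker remaps the container's user to, taken from the subordinate ID range of the remapping user (for example dockremap), rather than by the user who launched the container.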

As stated in the Docker documentation:

This re-mapping is transparent to the container, but introduces some configuration complexity in situations where the container needs access to resources on the Docker host, such as bind mounts into areas of the filesystem that the system user cannot write to. From a security standpoint, it is best to avoid these situations.
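The remapping is a fixed offset: each /etc/subuid entry has the form name:start:count, and a UID n inside the container appears on the host as start + n. For instance, with the entry dockremap:231072:65536, files written by container root (UID 0) are owned by host UID 231072, and files written by container UID 1000 by host UID 232072. The sketch below performs that arithmetic; the function name remapped_host_uid is hypothetical and it assumes the remapping user has a single subordinate range.

```shell
#!/bin/sh
# Hypothetical helper: given one /etc/subuid entry ("name:start:count") and a
# UID inside the container, print the host UID that owns files it writes.
# Assumes a single subordinate range for the remapping user.
remapped_host_uid() {
    entry=$1
    container_uid=$2
    start=$(echo "$entry" | cut -d: -f2)
    count=$(echo "$entry" | cut -d: -f3)
    # UIDs beyond the range size cannot be mapped.
    if [ "$container_uid" -ge "$count" ]; then
        echo "UID $container_uid is outside the range of $count IDs" >&2
        return 1
    fi
    echo $((start + container_uid))
}

# Example (dockremap is the user created for userns-remap "default"):
# remapped_host_uid "$(grep -m 1 '^dockremap:' /etc/subuid)" 0
```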

If you have configured Docker with user namespace remapping and the file ownership is a problem, one solution is to disable user namespace remapping for the container and run it as the logged-in user:

docker run ... --userns=host --user $(id -u):$(id -g) synthesizer/image ...

This works exactly as described above for the rootful Docker configuration.
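The three cases can be summarised as a choice of extra docker run flags. The sketch below maps an installation type (rootful, rootless or userns) to the flags used in the examples on this page; the function name ownership_flags is hypothetical, and its output is meant to be expanded unquoted so that it splits into separate arguments.

```shell
#!/bin/sh
# Hypothetical helper: prints the extra `docker run` flags that make output
# files owned by the invoking user, following the examples on this page.
# $1 is one of: rootful, rootless, userns.
ownership_flags() {
    case $1 in
        rootful)  printf '%s\n' "--user $(id -u):$(id -g)" ;;
        rootless) : ;;  # no flags: container root already maps to the daemon's user
        userns)   printf '%s\n' "--userns=host --user $(id -u):$(id -g)" ;;
        *)        echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Example (left unquoted on purpose so the flags split into words):
# sudo docker run --rm ... $(ownership_flags rootful) synthesizer/image run ...
```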

Hazy Hub

Hazy Hub requires more privileges, because it must be able to spawn training and generation tasks from the UI. Consult the main installation documentation for details.