Synthesiser deployment

Security is a critical part of our deployment. This guide covers the security practices Hazy applies to its development and release process, and best practices for deploying Synthesisers within the enterprise.

Development and Release Security Procedures

All synthesiser code is continuously analysed during the development process and as part of the release procedure in order to detect and prevent security vulnerabilities.

  • Dependencies. Hazy automatically checks its dependencies for known security vulnerabilities as part of the automated release process. Any dependencies with security issues are fixed before release.

  • Minimal Images. Hub container images are based on a distroless image. These images contain only the application and the minimal dependencies required to run that application.

    Restricting what's in your runtime container to precisely what's necessary for your app is a best practice employed by Google and other tech giants that have used containers in production for many years. It improves the signal-to-noise ratio of vulnerability (e.g. CVE) scanners and reduces the burden of establishing provenance to just what you need.

  • Vulnerability Scans. As an automatic part of the container build and release process, all container images are analysed using the Trivy vulnerability scanner.

    Any issues are checked for applicability and severity before the image is released to customers.

Container Security

The Synthesiser image (or images) supplied to you by Hazy will have been specifically customised for your data and environment, so the exact set of security mitigations available will vary from Synthesiser to Synthesiser. However, Hazy brings the same security-conscious approach to these containers:

  • Requires no external network access

    Hazy applications will never attempt to connect to an external service (unless required by the Customer's installation topology)

The key difference between the Synthesiser containers and the Hub is that the Synthesisers are required to write files to their mounted volumes in such a way that the data is available outside the container runtime (whereas the Hub application writes data with the expectation that it is the only system accessing it).

This introduces some complexity around data readability after the container has exited that does not exist for the Hub.

Docker

When running with a root-full Docker installation, the easiest way to ensure the correct file permissions on the resulting artifacts (trained models in the case of a training run, synthetic data in the case of generation) is to run the container as the current user.
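Concretely, "the current user" here means the numeric UID/GID pair that Docker's --user flag accepts; the standard id utility prints both (a minimal illustration, independent of Docker):

```shell
# numeric user and group IDs, as passed to `docker run --user`
uid=$(id -u)
gid=$(id -g)
echo "uid=${uid} gid=${gid}"
```

These are the same substitutions used inline as `--user $(id -u):$(id -g)` in the example below.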

As a baseline, the Synthesiser image requires very few permissions to run.

The /home/user/hazy directory contains a params.json (and in the case of a generation run, the trained model). The params.json file describes the input and output paths in the context of the container, e.g.

{
    "action": "generate",
    "params": {
        "output": "/mount/output/generated.csv",
        "model": "/mount/input/model.hazymodel",
        "num_rows": 100000
    }
}
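Before launching a run it can be worth sanity-checking the parameters file. The following sketch recreates the example params.json and validates it with python3's built-in JSON tool (the file path and use of python3 as a validator are illustrative, not part of the Hazy tooling):

```shell
# write an illustrative params.json matching the example above
cat > params.json <<'EOF'
{
    "action": "generate",
    "params": {
        "output": "/mount/output/generated.csv",
        "model": "/mount/input/model.hazymodel",
        "num_rows": 100000
    }
}
EOF

# fail fast on malformed JSON before starting a container run
python3 -m json.tool params.json > /dev/null && echo "params.json ok"
```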

# create a $USER writable directory for the synthetic data
$ dir=$(mktemp -d)

$ sudo docker run \
    --rm \
    -v /home/user/hazy:/mount/input:Z \
    -v $dir:/mount/output:Z \
    --tmpfs /tmp:rw,noexec \
    --cap-drop=ALL \
    --security-opt=no-new-privileges \
    --user $(id -u):$(id -g) \
    hazy-synthesizer/tabular run --parameters /mount/input/params.json

# $dir/generated.csv will be owned by $USER
$ cat $dir/generated.csv
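After a run it is easy to confirm the artifacts are usable without elevated privileges. This sketch checks file ownership against the current user (the stand-in file substitutes for a real Synthesiser output, and GNU stat is assumed):

```shell
# stand-in for the output directory and file produced by a generation run
dir=$(mktemp -d)
touch "$dir/generated.csv"

# compare the file's numeric owner with the current user's UID
if [ "$(stat -c '%u' "$dir/generated.csv")" = "$(id -u)" ]; then
    echo "ownership ok"
else
    echo "unexpected owner" >&2
fi
```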

Rootless Docker

The Synthesiser containers will run perfectly well with either a rootless or a user-namespaced Docker installation.

With a rootless installation, the generated files (models or synthetic data) will be owned by the user who owns the Docker process. If the system is configured so that the train or generate batch process runs as the same user who owns the Docker daemon, the command above can be used without specifying a --user for the container, since it will automatically run as the right user.

# create a $USER writable directory for the synthetic data
$ dir=$(mktemp -d)

# no need for root as the docker daemon process is owned by $USER
$ docker run \
    --rm \
    -v /home/user/hazy:/mount/input:Z \
    -v $dir:/mount/output:Z \
    --tmpfs /tmp:rw,noexec \
    --cap-drop=ALL \
    --security-opt=no-new-privileges \
    synthesizer/image run --parameters /mount/input/params.json

# $dir/generated.csv will be owned by $USER
$ cat $dir/generated.csv

The situation with a user-namespaced installation is, however, more complex. In this configuration, the output files will always be owned by the unprivileged user that Docker runs the containers as.

As stated in the Docker documentation:

This re-mapping is transparent to the container, but introduces some configuration complexity in situations where the container needs access to resources on the Docker host, such as bind mounts into areas of the filesystem that the system user cannot write to. From a security standpoint, it is best to avoid these situations.
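For context, the remapping is driven by the subordinate ID ranges configured in /etc/subuid and /etc/subgid on the host. An illustrative entry (the dockremap user name and the ranges are examples, not Hazy requirements):

```
dockremap:100000:65536
```

With an entry like this, root inside the container maps to UID 100000 on the host, which is why bind-mounted output files end up owned by an unprivileged host user.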

If you have configured Docker with user-namespace remapping and the file ownership issue is a problem, then one solution is to disable the user-namespace remapping and run the container as the logged-in user:

docker run ... --userns=host --user $(id -u):$(id -g) synthesizer/image ...

This will work exactly as described above for the root-full Docker configuration.

Podman

Podman, when run as the current user, works exactly like the root-full or rootless Docker examples above:

# create a $USER writable directory for the synthetic data
$ dir=$(mktemp -d)

$ podman run \
    --rm \
    -v /home/user/hazy:/mount/input:Z \
    -v $dir:/mount/output:Z \
    --tmpfs /tmp:rw,noexec \
    --cap-drop=ALL \
    --security-opt=no-new-privileges \
    synthesizer/image run --parameters /mount/input/params.json

# $dir/generated.csv will be owned by $USER
$ cat $dir/generated.csv