Synthesisers

Synthesisers form the core of the Hazy synthetic data platform. They train Generator Models from the source data and then use those Generator Models to produce synthetic data. Hazy provides a suite of Synthesisers as Docker images, each designed to work with particular types of source data.

Synthesisers can be used through either of the following:

  • Command line
  • Hazy client library

This section will demonstrate how to use a given synthesiser with both of these methods.

Command line

The command for running a synthesiser from the command line is as follows:

docker run -v <path-to-working-directory>:/mount/ -it <docker-image> run --parameters /mount/<params_file.json>

Using a synthesiser from the command line requires two JSON files saved in the working directory: one containing the parameters for training, and one containing the parameters for generating. The path to the JSON file as seen inside the Docker container (that is, under /mount/) must be provided after the --parameters argument.

Volume mapping is used to persist the data that the Docker container reads and writes, so the path to the local working directory must be bound to the /mount/ directory of the container.
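As a minimal sketch of how the command above fits together, the following Python helper assembles the same argument list from its three placeholders. The helper name build_run_command, the directory /tmp/hazy-work and the file name train_params.json are illustrative assumptions, not part of the Hazy tooling; the image name is left as the placeholder used above.

```python
import shlex
from pathlib import Path


def build_run_command(working_dir, docker_image, params_file):
    """Assemble the docker argument list for one synthesiser run.

    Hypothetical helper: it only mirrors the command shown in this section.
    """
    host_dir = Path(working_dir).absolute()
    return [
        "docker", "run",
        # Bind the local working directory to /mount/ inside the container.
        "-v", f"{host_dir}:/mount/",
        "-it", docker_image,
        # The parameters path is the path as seen inside the container.
        "run", "--parameters", f"/mount/{params_file}",
    ]


# Illustrative values only; substitute your own directory, image and file.
cmd = build_run_command("/tmp/hazy-work", "<docker-image>", "train_params.json")
print(shlex.join(cmd))
```

The sketch only prints the command. To execute it from a script, the list could be passed to subprocess.run; note that Docker's -it flags expect an attached terminal, so they may need to be dropped in a non-interactive environment.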

To find the parameters that are required for the training or generation step we can use the describe command given below. This outputs a JSON file with each parameter and its description for both the training and generation stages.

docker run -v <path-to-working-directory>:/mount/ -it <docker-image> describe --output /mount/help.json
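Once the describe command has written help.json to the working directory, the file can be inspected with any JSON tool. The sketch below assumes nothing about the layout of that file beyond it being valid JSON; the function name load_description is hypothetical.

```python
import json
from pathlib import Path


def load_description(working_dir, filename="help.json"):
    """Load the describe output and return it with a pretty-printed copy.

    Hypothetical helper: no structure is assumed beyond valid JSON.
    """
    description = json.loads((Path(working_dir) / filename).read_text())
    return description, json.dumps(description, indent=2, sort_keys=True)


# Example usage, after running the describe command against this directory:
#   description, pretty = load_description("<path-to-working-directory>")
#   print(pretty)
```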

The general schema for the parameters JSON file is presented below. The action is given first; this can be either "train" or "generate". The parameters for that action are then given as a nested dictionary under the "parameters" key.

{
	"action": "train" OR "generate",
	"parameters": {parameters-go-here}
}
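To illustrate the schema, the following sketch writes one parameters file for training and one for generation. The helper name write_params and the file names are assumptions made for this example. The "parameters" dictionaries are deliberately left empty because the real keys depend on the synthesiser and are listed in the describe output.

```python
import json
import tempfile
from pathlib import Path

VALID_ACTIONS = ("train", "generate")


def write_params(path, action, parameters):
    """Write a parameters file following the schema described above."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {VALID_ACTIONS}, got {action!r}")
    document = {"action": action, "parameters": parameters}
    Path(path).write_text(json.dumps(document, indent=2))
    return document


# A temporary directory stands in for the working directory here.
working_dir = Path(tempfile.mkdtemp())

# Empty placeholders: fill these in from the describe output.
write_params(working_dir / "train_params.json", "train", {})
write_params(working_dir / "generate_params.json", "generate", {})

print((working_dir / "train_params.json").read_text())
```

Keeping the two files separate matches the command line workflow, where each run of the container is pointed at exactly one parameters file.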

For more detailed instructions about securing your Synthesiser installation, see Security.

Client library

To use a synthesiser with the client library we must first create a synthesiser object. Synthesisers are stored on the hub, so we can use a hub object and its synthesiser method to ensure the appropriate Synthesiser image is available. To do this, we pass the method the name of the synthesiser as it appears on the hub (for example, "project/tabular:TAG"). If the image does not exist in the current Docker context, it is pulled from the remote registry. The method returns a synthesiser object that can subsequently be used to invoke training and generation.

hub = Hub(
    host="https://hub.hazy.com",
    api_key="9LdqSnXYI6HDS/jGaI1JGmRLlUb5/Qq7UW/fn/BPhK0=")

synth = hub.synthesiser(
    image_name="registry.northwindtraders.com/hazy/multi-table:4.0.0")

Training and generation stages for each synthesiser are demonstrated in the following sections.

Synthesiser types

Hazy provides a suite of Synthesisers to support different types of data.