Synthesisers

Synthesisers form the core of the Hazy synthetic data platform. They are responsible for training Generator Models from the source data and for using those Generator Models to produce synthetic data. Hazy provides a suite of Synthesisers as Docker images, each designed to work with particular types of source data.

Synthesisers can be used through the:

  • Command line
  • Hazy client library

This document demonstrates how to use a synthesiser with each of these methods.

Command line

The command for running a synthesiser from the command line is as follows:

docker run -v <path-to-working-directory>:/mount/ -it <docker-image> run --parameters /mount/<params_file.json>

Using a synthesiser from the command line requires two JSON files saved in the working directory: one containing the parameters for training and one containing the parameters for generating. The path to the relevant JSON file, as seen from inside the Docker container (that is, under /mount/), must be provided after the --parameters argument.

Volume mapping is used to persist the data that the Docker container reads and writes, so the path to the local working directory must be bound to the /mount/ directory of the container.

To find the parameters that are required for the training or generation step we can use the describe command given below. This will output a JSON file with each parameter and its description for both the training and generation stages.

docker run -v <path-to-working-directory>:/mount/ -it <docker-image> describe --output /mount/help.json
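The describe step can also be scripted. The following is a minimal Python sketch, not part of the Hazy tooling: the helper names describe_command, run_describe and load_description are hypothetical, and the -it flag is dropped on the assumption that the script runs without an interactive terminal. Because the layout of help.json is not specified here, the loader only parses the file and returns whatever JSON it contains.

```python
import json
import subprocess
from pathlib import Path


def describe_command(work_dir, image, output_name="help.json"):
    """Build the argument list for the describe command shown above.

    Hypothetical helper: it mirrors the documented command line, minus -it.
    """
    host_dir = Path(work_dir).resolve()  # docker needs an absolute host path
    return [
        "docker", "run",
        "-v", f"{host_dir}:/mount/",
        image,
        "describe", "--output", f"/mount/{output_name}",
    ]


def load_description(work_dir, output_name="help.json"):
    """Parse the JSON file that describe wrote into the working directory."""
    with open(Path(work_dir) / output_name, encoding="utf-8") as fh:
        return json.load(fh)


def run_describe(work_dir, image, output_name="help.json"):
    """Run describe in the container, then load its output from the host side."""
    subprocess.run(describe_command(work_dir, image, output_name), check=True)
    return load_description(work_dir, output_name)
```

Calling run_describe(".", "<docker-image>") with the real image name would write help.json to the current directory and return its parsed contents.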

The general schema of the parameters JSON file is presented below. The "action" key comes first and can be either "train" or "generate". The parameters for that action are then given in a nested dictionary under the "parameters" key.

{
	"action": "train" OR "generate",
	"parameters": {parameters-go-here}
}
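As an illustration of this schema, the sketch below writes a parameters file into the working directory. It is a minimal example under stated assumptions: write_params is a hypothetical helper, the default file names train.json and generate.json are arbitrary, and the parameters dictionary is left to the caller because the real keys depend on the synthesiser and should be taken from the output of the describe command.

```python
import json
from pathlib import Path

VALID_ACTIONS = {"train", "generate"}


def write_params(work_dir, action, parameters, filename=None):
    """Write a parameters file and return its path inside the container.

    Hypothetical helper: it only enforces the two-key schema shown above.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(
            f"action must be one of {sorted(VALID_ACTIONS)}, got {action!r}"
        )
    filename = filename or f"{action}.json"
    payload = {"action": action, "parameters": parameters}
    target = Path(work_dir) / filename
    target.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    # The working directory is mounted at /mount/ inside the container.
    return f"/mount/{filename}"
```

The returned value (for example /mount/train.json) is the path to pass after --parameters in the run command.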

For more detailed instructions about securing your Synthesiser installation see Security.

Client library

To use a synthesiser with the client library we must first have a synthesiser object. Synthesisers are stored on the hub, so we can use a hub object and its synthesiser method to download our synthesiser. To do this we pass it the name of the synthesiser as it appears on the hub (e.g. "project/tabular:TAG") along with the working directory. If Docker does not already have the image, it is downloaded and loaded into the local Docker installation; if the image is already present, the download is skipped. In both cases a synthesiser object is returned.

hub = HazyHub(
    host="https://hub.hazy.com",
    api_key="9LdqSnXYI6HDS/jGaI1JGmRLlUb5/Qq7UW/fn/BPhK0=")

synthesiser = hub.synthesiser(
    image_name="project/tabular:20210322T152945",
    work_dir=".")

Training and generation stages for each synthesiser will be demonstrated in the following sections.

Synthesiser types

Hazy has a suite of Synthesisers that it uses to support differing types of data.