Hub user guide

The Hazy Hub is a web application that lets you create and manage your synthetic data projects. You can also interact with it via our API.

This user guide assumes that you have already been provided with your account credentials (email and password), either on a bespoke installation or via our demo platform.

Key terms

Project

A Project represents the dataset you want to generate synthetic data for, and organises all the associated configurations, models and generated synthetic data. Permissions in the Hub are project-based: users and groups can be given access to specific projects.
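As a rough illustration of project-based permissions, the sketch below models a project that grants access either to individual users or to groups. The class, field and function names are hypothetical and are not taken from the Hub or its API.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Project:
    """Hypothetical stand-in for a Hub project and its access lists."""
    name: str
    allowed_users: Set[str] = field(default_factory=set)
    allowed_groups: Set[str] = field(default_factory=set)


def can_access(project: Project, user: str, user_groups: Set[str]) -> bool:
    """Return True if the user has access directly or through any group."""
    return user in project.allowed_users or bool(project.allowed_groups & user_groups)


# Example data (invented for illustration only).
project = Project(
    name="customer-churn",
    allowed_users={"alice@example.com"},
    allowed_groups={"analysts"},
)

direct = can_access(project, "alice@example.com", set())
via_group = can_access(project, "bob@example.com", {"analysts"})
denied = can_access(project, "carol@example.com", {"sales"})
```

Here access is decided per project rather than globally, which is the point the definition above makes.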

Data Source

A data source can be a disk file path (bespoke installation only), a cloud storage location or a database connection. The Hub can store credentials for these sources using AES-256-GCM server-side encryption. Stored credentials allow analysis and training to read data from the source, and allow generation to write synthetic data to a source.

Configuration

A Configuration defines the parameters for the model training process. The Hub bootstraps the creation of a configuration by first running an analysis step. The Configuration is broken down into steps and covers two areas: the data schema configuration, where the columns in the data are mapped to Hazy's defined data types, and the model parameters, which define how the generative model is trained.
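To make the idea of an analysis step concrete, here is a minimal sketch that guesses a type label for each column from a few sample values and produces a draft schema for a user to review. The labels ("integer", "float", "datetime", "category") and the function names are placeholders chosen for this example; they are not Hazy's actual data types, and this is not how the Hub's analysis is implemented.

```python
from datetime import datetime


def infer_type(values):
    """Guess a placeholder type label from a list of string samples."""
    def all_parse(parser):
        # True only if every sample parses without a ValueError.
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    # Try the strictest parser first, then fall back to broader ones.
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    if all_parse(datetime.fromisoformat):
        return "datetime"
    return "category"


def analyse(columns):
    """Build a draft column-to-type mapping from sampled column values."""
    return {name: infer_type(samples) for name, samples in columns.items()}


# Invented sample data, as it might look when read from a CSV file.
sample = {
    "age": ["34", "51", "27"],
    "balance": ["1200.50", "87.10", "0.0"],
    "joined": ["2021-03-01", "2019-11-15", "2020-07-30"],
    "segment": ["retail", "business", "retail"],
}
draft_schema = analyse(sample)
```

A draft like this is only a starting point: the mapping still needs to be reviewed and corrected before training, which is why the configuration is presented as a series of steps.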

Data Type

Hazy defines a set of data types to which columns from the source data must be mapped, so that the data is interpreted correctly during training.

Model

Once a configuration has been created, you can train a model from it. A model contains the serialised set of statistical properties of the source data, sufficient to generate synthetic versions without any further reference or connection to the source. You can use the Hub to evaluate the accuracy of the model’s statistical output, as well as assess its privacy and compliance risk. Finally, you can generate synthetic data from the model, which can be stored on one of your connected data sources or downloaded as a flat file.
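The idea that a model holds only serialised statistical properties can be shown with a deliberately simple toy. The sketch below summarises each numeric column by its mean and standard deviation, serialises that summary to JSON, and then samples new rows from the JSON alone. It assumes independent, normally distributed columns, which is far cruder than a real generative model and is not Hazy's method; all names and data are invented for the example.

```python
import json
import random
import statistics


def train(rows):
    """Summarise each numeric column by its mean and standard deviation."""
    columns = rows[0].keys()
    return {
        col: {
            "mean": statistics.mean(row[col] for row in rows),
            "stdev": statistics.stdev(row[col] for row in rows),
        }
        for col in columns
    }


def generate(model, n, seed=0):
    """Sample n synthetic rows using only the stored statistics."""
    rng = random.Random(seed)
    return [
        {col: rng.gauss(params["mean"], params["stdev"]) for col, params in model.items()}
        for _ in range(n)
    ]


# Invented source data for illustration.
source = [
    {"age": 34, "balance": 1200.5},
    {"age": 51, "balance": 87.1},
    {"age": 27, "balance": 430.0},
]

# The serialised model is plain JSON and contains no source rows.
serialised = json.dumps(train(source))

# Generation needs only the serialised model, not the source data.
synthetic = generate(json.loads(serialised), n=5)
```

Because `generate` takes only the deserialised summary, the source rows could be discarded after training, which mirrors the separation between model and source described above.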

Data Library

The Data Library is a centralised repository where all published datasets can be viewed and downloaded, provided you have the correct user permissions.