Models

The Models tab of the project page lists all models that have been trained from the project's configurations. If you chose to run our evaluations, the summary results are also shown here, along with the model parameters most likely to have affected the quality scores. Every column can be sorted by clicking its heading, so you can, for instance, rank all models by their privacy score. The performance metrics overview also serves as a tool for iterating on and tuning models, trading off utility, similarity, and privacy to fit your use case.

Model overview

The model overview screen shows:

  • Model name: Each model is assigned a unique name on creation, but you can change it here.
  • Model description: You can add a description to record any details, such as the source data version.
  • Training configuration: The configuration used to train the model. Note that the configuration remains editable, so it may have changed since the model was trained.
  • Compliance checks: A summary of our compliance checks.
  • Schema: By default, models include a small sample of generated synthetic data, which serves as a quick check that the output matches the expected columns and datatypes (see the sketch after this list).
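
Because the sample is just a table of synthetic rows, verifying it takes only a few lines. The sketch below assumes the sample was downloaded as a CSV; the file name, column names, and expected_schema mapping are illustrative placeholders for your own data.

```python
# A minimal sketch of the kind of schema check the sample enables.
# "synthetic_sample.csv" and "expected_schema" are assumptions for
# illustration; substitute the sample downloaded from the model page.
import pandas as pd

expected_schema = {
    "customer_id": "int64",
    "signup_date": "datetime64[ns]",
    "plan": "object",
    "monthly_spend": "float64",
}

sample = pd.read_csv("synthetic_sample.csv", parse_dates=["signup_date"])

# Check that every expected column is present with the expected dtype.
for column, dtype in expected_schema.items():
    assert column in sample.columns, f"missing column: {column}"
    actual = str(sample[column].dtype)
    assert actual == dtype, f"{column}: expected {dtype}, got {actual}"

print("Sample matches the expected schema.")
```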

Metrics

If your configuration included any evaluation metrics, you can explore them here. Similarity metrics are enabled by default: marginal distribution and mutual information can be viewed for each table, and a multi-table metrics section shows cross-table mutual information and degree distribution similarity.
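
As a rough illustration of what a marginal-distribution comparison involves, the snippet below measures the total variation distance between one column's empirical distributions in the source and synthetic tables. The data is made up, and the exact metric the platform reports may differ.

```python
# Total variation distance between two marginal distributions:
# 0 = identical, 1 = completely disjoint.
import pandas as pd

def total_variation_distance(source: pd.Series, synthetic: pd.Series) -> float:
    # Empirical marginal distributions over the observed categories.
    p = source.value_counts(normalize=True)
    q = synthetic.value_counts(normalize=True)
    # Align both distributions over the union of categories, filling gaps with 0.
    p, q = p.align(q, fill_value=0.0)
    return 0.5 * float((p - q).abs().sum())

source = pd.Series(["A", "A", "B", "C", "A", "B"])
synthetic = pd.Series(["A", "B", "B", "C", "A", "C"])
print(f"TVD: {total_variation_distance(source, synthetic):.3f}")
```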

Training

The Training performance analysis reports the time elapsed and memory used during the model training phase. This can help you determine whether training ran on adequate hardware, and serves as a baseline for comparing the effect of different configurations on training; the evaluation metrics in particular can add significantly to training time.

The Training parameters section shows the full set of parameters used to train the model. These are useful for comparing configurations, and as a starting point for training models via our API, as sketched below.
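
If you want to reuse those parameters programmatically, a call along the following lines is one way to picture it. This is a hypothetical sketch: the endpoint, authentication scheme, and payload fields are illustrative assumptions, not the documented API, so consult the API reference for the real contract.

```python
# Hypothetical sketch of reusing displayed training parameters via the API.
# The URL, token, and field names below are placeholders, not the real API.
import requests

training_parameters = {
    # Copied from the Training parameters section of an existing model,
    # then adjusted for the new run (values are placeholders).
    "epochs": 100,
    "batch_size": 512,
    "privacy_level": "high",
}

response = requests.post(
    "https://api.example.com/v1/projects/my-project/models",
    headers={"Authorization": "Bearer <API_TOKEN>"},
    json={"configuration_id": "my-configuration", "parameters": training_parameters},
    timeout=30,
)
response.raise_for_status()
print("Started training job:", response.json())
```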

Generate

Once you are happy with a model, go to the "Generate" tab to generate synthetic data. Here you can specify the destination for the generated synthetic data.

The “Magnitude” parameter controls how much synthetic data is generated relative to the original: a magnitude of 1 generates the same number of synthetic rows as there are source rows, a magnitude of 2 generates double, and so on. You can preview the effect magnitude will have on each output table.
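
The preview amounts to simple per-table arithmetic, as in this small example (table names and row counts are illustrative):

```python
# Expected synthetic row counts per table for a given magnitude.
source_row_counts = {"customers": 10_000, "orders": 45_000}
magnitude = 2  # 1 = same size as source, 2 = double, etc.

for table, rows in source_row_counts.items():
    print(f"{table}: {rows} source rows -> {rows * magnitude} synthetic rows")
```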

Once you have selected “Generate”, the generation job starts. You can select the generation job from the sidebar, where all jobs for your model are listed. Here you can follow the task progress, and view the validation report for your data once the job is complete.
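
If you prefer to follow the job outside the UI, a polling loop along these lines is one possibility. The endpoint and response fields are hypothetical assumptions; the sidebar provides the same progress information without any code.

```python
# Hypothetical polling loop for a generation job; the URL, token, and
# response fields ("status", "progress") are illustrative placeholders.
import time
import requests

job_url = "https://api.example.com/v1/jobs/<JOB_ID>"
headers = {"Authorization": "Bearer <API_TOKEN>"}

while True:
    job = requests.get(job_url, headers=headers, timeout=30).json()
    print(f"status: {job['status']}, progress: {job.get('progress', '?')}")
    if job["status"] in ("completed", "failed"):
        break
    time.sleep(10)
```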