Audit log

All actions taken by all users in the Hub are now recorded to an audit log. This includes actions such as creating a project, training a model, modifying a configuration, and so on. The audit log is available to users with the hazy:AuditViewer role.

The dashboard now also shows ‘recent changes’, so users can easily jump back to where they left off editing.

Custom (JavaScript) handlers

Users can now write a custom single-column handler in JavaScript to perform arbitrary pre-training and post-generation mapping steps in their synthetic data pipelines. This handler can be applied to Raw type columns and runs in a sandboxed V8 JavaScript runtime that is part of the Hazy worker.

Known limitations

  • Single column to single output
  • Limitations on sandboxed JS (e.g. no/limited networking)

Differentially private processing

Until now, Hazy's core generative models, such as PrivBayes, have been differentially private. However, the data is preprocessed before it enters the model for training. This processing also "sees" the data, so a strict application of differential privacy requires injecting noise during the processing as well. Hazy now allows the user to choose a separate privacy budget (epsilon) for the processing. In a single-table scenario, if the user picks ε1 for the generative model and ε2 for the processing, the total budget is at most ε1 + ε2 by the composition theorem of differential privacy. The differentially private processing is done using the approx_bounds function as described in Damien Desfontaines' thesis.
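To illustrate the budget arithmetic and the general idea behind approx_bounds, here is a simplified Python sketch. It is not Hazy's implementation: the function names, the caller-supplied threshold, and the bin layout are assumptions made for illustration, and it assumes non-negative values with one record per individual under add-or-remove-one-record neighbouring. It counts values in power-of-two bins, adds Laplace noise to each count, and returns the upper edge of the highest bin whose noisy count exceeds the threshold.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def approx_upper_bound(values, epsilon, threshold, num_bins=32, rng=None):
    """Pick a noisy upper clipping bound for non-negative values.

    Hypothetical helper: bin 0 holds values below 1, and bin i (i >= 1)
    holds values in [2**(i-1), 2**i). Returns None if no bin qualifies.
    """
    rng = rng or random.Random()
    counts = [0] * num_bins
    for v in values:
        idx = 0 if v < 1 else min(num_bins - 1, int(math.floor(math.log2(v))) + 1)
        counts[idx] += 1
    # Each record lands in exactly one bin, so adding or removing a record
    # changes one count by 1; Laplace noise with scale 1/epsilon is used per bin.
    noisy = [c + laplace_noise(1.0 / epsilon, rng) for c in counts]
    # Scan from the largest bin down and stop at the first well-populated one.
    for i in range(num_bins - 1, -1, -1):
        if noisy[i] > threshold:
            return float(2 ** i)
    return None


def total_epsilon(model_epsilon, processing_epsilon):
    """Sequential composition: the budgets of the two stages add up."""
    return model_epsilon + processing_epsilon
```

For example, total_epsilon(1.0, 0.5) gives an overall budget of 1.5 for a single table. The published approx_bounds algorithm also derives its threshold from a target failure probability and finds a lower bound in the same way; both are omitted here for brevity. Sampling noise with ordinary floating-point arithmetic, as above, is not suitable for production use.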

Known limitations

  • We're not performing sensitivity analysis based on the different entities within the tables and all table sensitivities are assumed to be equal.


Database subsetting for Google BigQuery

The Google BigQuery connector now supports database subsetting, which means it can be used to pull a sensible sample into the Hazy platform for training. This better supports workflows with multiple tables of data. For more information on database subsetting see here.
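As a rough illustration of what subsetting across multiple tables generally involves, the Python sketch below samples rows from a parent table and then pulls only the child rows that reference them, so the sample stays referentially consistent. This note does not describe how Hazy implements subsetting: the table names, the modulus-based sampling rule, and the use of an in-memory SQLite database in place of BigQuery are all assumptions for illustration.

```python
import sqlite3


def build_demo_db():
    """Create a hypothetical two-table database (customers and their orders)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "a"), (2, "b"), (3, "c"), (4, "d")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 1, 10.0), (2, 2, 20.0), (3, 2, 5.0), (4, 3, 7.5), (5, 4, 1.0)],
    )
    return conn


def subset(conn, modulus):
    """Sample parent rows, then fetch only the child rows that reference them."""
    # Deterministic stand-in for sampling: keep customers whose id is a
    # multiple of `modulus`.
    parents = conn.execute(
        "SELECT id, name FROM customers WHERE id % ? = 0 ORDER BY id", (modulus,)
    ).fetchall()
    ids = [row[0] for row in parents]
    if not ids:
        return parents, []
    placeholders = ",".join("?" * len(ids))
    children = conn.execute(
        "SELECT id, customer_id, amount FROM orders "
        f"WHERE customer_id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()
    return parents, children
```

Calling subset(build_demo_db(), 2) keeps two of the four customers together with every order that belongs to them, and no orders that point at customers outside the sample.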