4.2.0

Features

Upload and download models from the Hub

Previously this functionality was only exposed via the API. When downloading a model, it is recommended that the user strip the metrics, as these should remain private to the data owner.

This capability is one way to utilise multiple Hub zones. For example, a user can download models from the source-data training zone and upload them to the synthetic data generation zone. It can also be used to share models across regional boundaries while keeping the source data in its original region.
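The recommendation to strip metrics before sharing can be sketched as a small post-download step. The metadata layout and key names below are illustrative assumptions, not the actual Hazy model format:

```python
import copy


def strip_metrics(model_metadata: dict) -> dict:
    """Return a copy of the model metadata with evaluation metrics removed.

    "metrics" is an assumed key name for illustration; the real model
    format may differ.
    """
    cleaned = copy.deepcopy(model_metadata)
    cleaned.pop("metrics", None)  # drop the metrics block, if present
    return cleaned


# Example: metadata downloaded from the source-data training zone
downloaded = {
    "model_id": "demo-model",           # hypothetical field
    "schema": {"tables": ["customers"]},
    "metrics": {"similarity": 0.97},    # private to the data owner
}

shareable = strip_metrics(downloaded)   # safe to upload to another zone
```

The original download is left untouched, so the data owner keeps the metrics locally while the uploaded copy carries none.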

PostgreSQL database connection

This is a full PostgreSQL database integration. It can now be used in both the Python SDK and the Hub. It works with analysis, training, generation and database subsetting.

Oracle database connection

This is a full Oracle database integration. It can now be used in both the Python SDK and the Hub. It works with analysis, training, generation and database subsetting.

Databricks connection

This can now be used in both the Python SDK and the Hub. It works with analysis, training and database subsetting.

Known limitations

Currently, Hazy only supports the INPUT I/O type for the Databricks connection. If you would like to write out to Databricks, you can use intermediate object storage (e.g. S3, GCS) and Databricks' COPY INTO command, provided you have configured object store access from within the Databricks SQL Warehouse appropriately. See Tutorial: Configure S3 access with instance profile.
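The load step of this workaround can be sketched as below. The table name and S3 path are placeholders, and the snippet only builds the Databricks SQL statement; it assumes the generator has already written Parquet files to the bucket and that the SQL Warehouse can read from it:

```python
def copy_into_statement(table: str, source_path: str,
                        file_format: str = "PARQUET") -> str:
    """Build a Databricks SQL COPY INTO statement that loads files
    previously written to object storage.

    The statement must be executed in the Databricks SQL Warehouse,
    e.g. via its SQL editor or a SQL connector.
    """
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_path}'\n"
        f"FILEFORMAT = {file_format}"
    )


# Hypothetical target table and bucket path
stmt = copy_into_statement(
    "main.synthetic.customers",
    "s3://my-bucket/synthetic/customers/",
)
```

Running the generated statement in the SQL Warehouse loads the staged synthetic data into the target table, completing the write-out that the connection itself does not yet support.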

Google BigQuery connection

This can now be used in the Python SDK and the Hub. It works for analysis, training and generation.

Known limitations

Google BigQuery does not currently support database subsetting; therefore it is best used for single-table training or with pre-subset data. This limitation will be removed in a future release.
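Until subsetting is supported, one way to pre-subset is to sample the table inside BigQuery itself and train on the materialised result. The table name below is a placeholder, and the snippet only constructs the query string:

```python
def presubset_query(table: str, percent: int) -> str:
    """Build a BigQuery query that samples a fraction of a table using
    BigQuery's TABLESAMPLE clause, so a smaller copy can be
    materialised and used for training.
    """
    return (
        f"SELECT * FROM `{table}` "
        f"TABLESAMPLE SYSTEM ({percent} PERCENT)"
    )


# Hypothetical project, dataset and table
query = presubset_query("my-project.my_dataset.customers", 10)
```

Note that TABLESAMPLE draws an approximate single-table sample; it does not produce a referentially consistent subset across related tables, so this approach suits the single-table case described above.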

Improvements

AIM and MST can now be used with multi-table

Previously, AIM and MST were restricted to single-table training due to memory constraints. Memory usage has been optimised to remove this limitation, and the option to pick these models is now available in the Hub for both single- and multi-table training settings.