Post-generation options (prepend, append and encryption)¶
Selected fields in the synthetic data can optionally be treated post-generation. Three options can currently be applied:
- Prepend: for example, adding “Synth:” to the data, e.g. Synth:John Doe
- Append: for example, adding ":Synth" to the data, e.g. John Doe:Synth
- Encrypt: encrypting a field (e.g. email), with the option to decrypt it using a customer-provided key
Decryption is carried out outside of the Hazy product and requires the customer to manage the encryption keys. The platform used to perform the decryption may need significant resources in order to process production workloads.
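The three treatments can be sketched as simple field transformations. This is an illustrative sketch only: the XOR cipher below is a toy stand-in to show the encrypt/decrypt round trip with a customer-held key, not the encryption scheme Hazy actually uses, and the function names are invented.

```python
def prepend(value: str, marker: str = "Synth:") -> str:
    """Prepend a marker, e.g. 'John Doe' -> 'Synth:John Doe'."""
    return marker + value

def append(value: str, marker: str = ":Synth") -> str:
    """Append a marker, e.g. 'John Doe' -> 'John Doe:Synth'."""
    return value + marker

def toy_encrypt(value: str, key: bytes) -> bytes:
    """Reversible XOR transform -- a placeholder, NOT real encryption."""
    data = value.encode("utf-8")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def toy_decrypt(blob: bytes, key: bytes) -> str:
    """Decryption happens outside Hazy, using the customer-held key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(blob)).decode("utf-8")

key = b"customer-managed-key"
token = toy_encrypt("jane@example.com", key)
print(prepend("John Doe"))      # Synth:John Doe
print(append("John Doe"))       # John Doe:Synth
print(toy_decrypt(token, key))  # jane@example.com
```

Note that only the ciphertext leaves the generation pipeline; the key never does, which is why key management falls to the customer.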
Sequential data similarity metrics¶
Metrics to measure the similarity between source sequential (time series) data and synthetic sequential data.
- This feature only works if a sampling frequency is specified or can be detected (detection will fail if there are unknown or missing datetime values), since the metric results are not meaningful for irregularly sampled time series.
- This feature won't capture spectral features of the source time series: the metric is not sensitive to features in the frequency domain.
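As one hypothetical example of this kind of metric, two regularly sampled series can be compared on a time-domain statistic such as lag-1 autocorrelation. This sketch is not Hazy's actual metric; it simply illustrates why the comparison only makes sense at a fixed sampling frequency.

```python
def lag1_autocorr(xs):
    """Lag-1 autocorrelation of a regularly sampled series."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

def autocorr_distance(source, synthetic):
    """Smaller is more similar; only meaningful when both series
    share the same sampling frequency."""
    return abs(lag1_autocorr(source) - lag1_autocorr(synthetic))

source = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 2.0]
print(autocorr_distance(source, source))  # 0.0
```

A statistic like this says nothing about frequency-domain structure, which is exactly the limitation noted above.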
SQL Server integration using SQL Authentication (as a training data source)¶
SQL Server can now be used as a data source, allowing tables to be selected as inputs to the training process.
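SQL Authentication means connecting with a username and password rather than Windows credentials. A minimal sketch of an ODBC-style connection string for this mode is below; the server, database, and credentials are placeholders, and this is not a Hazy configuration format.

```python
def sqlserver_conn_str(server: str, database: str, user: str, password: str) -> str:
    """Build an ODBC connection string using SQL Authentication
    (UID/PWD) rather than Trusted_Connection (Windows auth)."""
    return (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        f"SERVER={server};DATABASE={database};"
        f"UID={user};PWD={password}"
    )

conn_str = sqlserver_conn_str("sql.internal", "training_db", "hazy_reader", "s3cret")
print(conn_str)

# With the pyodbc package installed, the string could then be used as:
#   pyodbc.connect(conn_str)
```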
Data source/sink: S3 integration¶
Clients who host data in AWS can read source data directly from an S3 bucket, both for training and for the subsequent generation of synthetic data.
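For orientation, S3 objects are addressed by bucket and key. The bucket and key names below are invented, and the commented-out call shows how such an object would typically be fetched with the boto3 library (not bundled here); this is not Hazy's internal access path.

```python
def s3_uri(bucket: str, key: str) -> str:
    """Compose the s3:// URI for an object."""
    return f"s3://{bucket}/{key}"

uri = s3_uri("customer-data", "training/customers.csv")
print(uri)  # s3://customer-data/training/customers.csv

# With boto3 installed and AWS credentials configured:
#   import boto3
#   obj = boto3.client("s3").get_object(Bucket="customer-data",
#                                       Key="training/customers.csv")
#   body = obj["Body"].read()
```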
Data source/sink: Parquet and Avro files¶
Parquet is a free and open-source storage format for fast analytical querying, developed by Apache. Avro, also an Apache project, is a data format that stores its schema in JSON, making it easy for any program to read and interpret. Hazy can now consume both file formats for training, and output synthetic data in the same format.
A Parquet data source can be split across multiple files; Hazy currently supports only a single Parquet file for ingest.
Avro supports customised "structured types"; Hazy only supports standard data types that map to Python dataframes.
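To illustrate why Avro is easy for any program to interpret, its schema is itself plain JSON. The record and field names below are invented for illustration and are not a Hazy schema.

```python
import json

# An Avro-style record schema expressed as JSON.
schema = {
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["null", "string"]},  # nullable union type
    ],
}

# Serialise and re-parse the schema exactly as any consumer would.
text = json.dumps(schema)
parsed = json.loads(text)
print([f["name"] for f in parsed["fields"]])  # ['id', 'name', 'email']
```

Custom structured types beyond records of standard primitives are the part Hazy does not ingest, per the note above.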
Including version-specific documentation in Hazy Hub¶
Clients will be able to see version-specific help files in their on-premises version of the Hazy Hub.
Training Performance Visualisation in Hazy Hub¶
A visual graph now appears in the Hazy Hub showing the discrete steps of the training process, with the elapsed time for each.
Performance Improvement to training¶
Training is significantly faster in one specific use case: when the data has a complex set of inferred rules, training speed is improved by up to 30X.
Docker Registry release process¶
Clients can download new releases via the Hazy docker registry.
- Configuration model performance & quality improvement
- Time data preprocessing separate from reading
- Improvement to the Presence Disclosure Privacy Metric
Hazy Client Library¶
- Allow setting custom environment variables for docker execution
- Fix issues related to pandas upgrade
- Non-lossy single-degree distributions
- Improve composite key generation
- Validate predictor choice
- Add no-repeating edges to configuration model
- Fix entropy estimator
- Fix date format handler error handling
- Fix undefined logger
- Remove unused date_cols assignment
- Remove unused output_path assignment
- Reduce number of python dependencies
- Redirect to previous location when editing user
- Give team members access to metrics page for model
- Change org member display meaning
- Better messaging for org blankslates
- Allow org admins to promote users to admin
- Org admin can now disable users
- Sessions now include the IP address
Hazy Client Library¶
- Ensure table_paths is set
- Update logic to determine paths in multi-table from client library