Sequential and Reference tables configuration via the Hub¶
Hazy already supports sequential data in the product. We've now made it easy to configure tables containing
this type of data via the Hub application. On the table configuration page, you can select the table type;
a default is pre-selected for you.

- Sequential metric configuration is not yet available.
Snowflake Support¶

Hazy can now train on data held in Snowflake data warehouses, and supports in-database subsetting inside Snowflake before training. Hazy creates a set of temporary tables inside Snowflake by sampling tables and following foreign key links, producing a smaller subset of the database while maintaining referential integrity. This smaller dataset can then be pulled into Hazy, avoiding the need to download the entire warehouse.
Generated data can also be written back out to Snowflake.
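The subsetting approach can be illustrated with a minimal, self-contained sketch. This is plain Python over in-memory rows; the table names, columns, and sample rate are illustrative assumptions, and the real implementation runs inside Snowflake as SQL over temporary tables:

```python
import random

def subset_with_referential_integrity(parents, children, fk, sample_rate, seed=0):
    """Sample the parent table, then keep only child rows whose foreign
    key points at a sampled parent, preserving referential integrity."""
    rng = random.Random(seed)
    sampled_parents = [row for row in parents if rng.random() < sample_rate]
    kept_ids = {row["id"] for row in sampled_parents}
    sampled_children = [row for row in children if row[fk] in kept_ids]
    return sampled_parents, sampled_children

# Illustrative data: customers (parent) and orders (child via customer_id).
customers = [{"id": i} for i in range(1000)]
orders = [{"id": i, "customer_id": i % 1000} for i in range(5000)]

subset_customers, subset_orders = subset_with_referential_integrity(
    customers, orders, fk="customer_id", sample_rate=0.1
)

# Every remaining order still references a customer present in the subset.
customer_ids = {c["id"] for c in subset_customers}
assert all(o["customer_id"] in customer_ids for o in subset_orders)
```

Following foreign keys from the sampled rows is what keeps the subset internally consistent: the child tables shrink in proportion to the parents without ever orphaning a row.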
Configurable Data Sources¶
Hazy now provides role-based access to data source configuration through the Hub application.
Only users with the role
hazy/DataSourceAuthor are able to configure, edit, delete, and view data sources. Data sources can also be attached to
projects by this role. Permissions in Hazy are project-based: since users and groups can be assigned to projects, users without the
hazy/DataSourceAuthor role can analyse, train, and generate with only the data sources they've been assigned.
All credentials are passed to the Hub application over TLS, where they are immediately encrypted with AES-256-GCM before storage. Once a data source has been added, we never return plaintext credentials to the frontend client. Credentials are only ever decrypted just before accessing data, to minimize the amount of time plaintext credentials are held in memory.
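The encrypt-on-receipt, decrypt-just-before-use pattern can be sketched with the third-party `cryptography` package's AES-GCM primitive. The key handling and function names here are illustrative assumptions, not Hazy's actual implementation (in production the key would live in a key store, never in code):

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# A 256-bit key; in a real deployment this comes from a key store and is
# never hard-coded or logged.
key = AESGCM.generate_key(bit_length=256)

def encrypt_credentials(plaintext: bytes) -> bytes:
    """Encrypt with AES-256-GCM; prepend the random 96-bit nonce."""
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)

def decrypt_credentials(blob: bytes) -> bytes:
    """Decrypt just before use; GCM authentication fails on any tampering."""
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

stored = encrypt_credentials(b"db_user:db_password")  # at-rest form
secret = decrypt_credentials(stored)                  # only at access time
assert secret == b"db_user:db_password"
```

GCM is authenticated encryption, so a modified ciphertext raises an exception on decryption rather than yielding garbage credentials.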
Supported data sources are:
- Disk for on-prem solutions
- AWS S3 storage
- Google Cloud Storage
- Microsoft SQL Server
Sample Data RBAC¶
During analysis we now take a small sample of the data (around 5 rows' worth) which can be viewed in the UI to aid configuration.
This data is immediately encrypted during the analysis task, before it's written to the database. Only users with the required role can
view this data; it is decrypted before being sent to the client over TLS. Deleting a data source immediately deletes all sample data
relating to that source, as does detaching the data source from the project. Users
must also be assigned to the project to view the samples.
Collection of sample data can also be disabled entirely at startup if desired, although we believe a small number of samples can greatly reduce configuration time. The samples are taken randomly within columns to avoid sampling full records.
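The per-column sampling idea can be sketched as follows (plain Python with illustrative column names; the sample size and storage details are as described above). Because each column is sampled independently, a displayed "row" never corresponds to a real record:

```python
import random

def sample_columns(rows, n=5, seed=0):
    """Take n values per column, each column sampled independently,
    so no output row reproduces a full source record."""
    rng = random.Random(seed)
    columns = {key: [row[key] for row in rows] for key in rows[0]}
    return {key: rng.sample(values, n) for key, values in columns.items()}

records = [
    {"name": f"user{i}", "age": 20 + i, "city": f"city{i}"} for i in range(100)
]
samples = sample_columns(records)
# Each column holds 5 values, drawn independently of the other columns.
assert all(len(values) == 5 for values in samples.values())
```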
Model Metrics Comparison¶
The models page now contains a set of summary statistics for each model. This allows users to quickly compare top-level model performance metrics before stepping into a model for more detailed information.
The privacy/similarity trade-off can be viewed by users training models on the same schema with different hyper-parameters.
GDPR Compliance Management¶
Hazy has taken the next step in helping users manage their compliance with the General Data Protection Regulation (GDPR).
During configuration, users are asked to identify any columns that contain PII. Only certain data types should be used for columns containing PII, so Hazy provides a set of pre-training warnings for any configured type that clashes with the sensitive data held in the column and would therefore represent non-compliance.
Hazy also analyses the model parameters and warns when the model is not differentially private.
Any warnings raised are stored in the model file and resurfaced on the model page, highlighting issues before data generation takes place.
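The pre-training PII check above amounts to comparing each flagged column's configured type against a set of PII-safe types. A minimal sketch, assuming an illustrative safe-type list and config shape (not Hazy's actual names):

```python
# Illustrative set of types considered safe for PII columns; the real
# product's list is an assumption not shown here.
PII_SAFE_TYPES = {"person_name", "address", "phone_number", "id_masked"}

def pii_warnings(column_config):
    """Return one warning per column that is flagged as PII but configured
    with a type outside the PII-safe set."""
    warnings = []
    for column, (dtype, is_pii) in column_config.items():
        if is_pii and dtype not in PII_SAFE_TYPES:
            warnings.append(
                f"Column '{column}' is marked as PII but uses type "
                f"'{dtype}', which may not be GDPR-compliant."
            )
    return warnings

config = {
    "full_name": ("person_name", True),  # PII with a safe type: no warning
    "email": ("category", True),         # PII with an unsafe type: warning
    "order_total": ("float", False),     # not PII: never warned
}
for message in pii_warnings(config):
    print(message)
```

Returning the warnings as plain data, rather than raising, matches the behaviour described above: they can be stored in the model file and shown again later.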
AWS Secrets Manager Integration¶
A set of secret environment variables can now be configured to use AWS Secrets Manager. This provides an extra layer of security, such that secrets are never stored long-term on the host machine.
The following environment variables are permitted:
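One way such a lookup can work is sketched below. The `aws-sm://` reference syntax and function names are illustrative assumptions, not Hazy's actual convention; the injectable `fetch` parameter lets the example run without touching AWS, while the default path uses the real boto3 `get_secret_value` call:

```python
import os

def resolve_secret(name, fetch=None):
    """Resolve an environment variable: if its value is a reference of the
    form 'aws-sm://<secret-id>' (an illustrative convention), fetch the
    secret from AWS Secrets Manager; otherwise return the plain value."""
    value = os.environ.get(name, "")
    if not value.startswith("aws-sm://"):
        return value
    secret_id = value[len("aws-sm://"):]
    if fetch is None:
        import boto3  # real AWS call; requires credentials and network
        client = boto3.client("secretsmanager")
        fetch = lambda sid: client.get_secret_value(SecretId=sid)["SecretString"]
    return fetch(secret_id)

# Example with an injected fetcher, so nothing here contacts AWS.
os.environ["DB_PASSWORD"] = "aws-sm://prod/db-password"
password = resolve_secret("DB_PASSWORD", fetch=lambda sid: f"<value of {sid}>")
print(password)  # <value of prod/db-password>
```

Resolving references at startup means only the opaque secret IDs, never the secret values, live in the host's environment configuration.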
AWS Marketplace¶
The Hazy Hub can now be installed via our AWS Marketplace listing.