Hazy product update - Q3 2023
In the latest release, we've improved model selection, compliance tooling and native integrations.
Faster model and metric comparison
Different use cases require different models and users often need to repeat training with different model parameters.
Researchers, data scientists and developers, for example, need to experiment, benchmark and compare data for machine learning models with ease.
To speed up this process, we’ve added a feature enabling users to quickly sort and compare model performance metrics in a simple view, and select the best model for their use case.
The overview of performance metrics also helps users iterate and tune their models for utility, similarity or privacy to fit their desired use case.
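To make the idea of sorting and comparing runs concrete, here is a minimal sketch in plain Python. It is not the Hazy API: the `ModelRun` class, the `rank_runs` helper, the metric values and the assumption that every metric sits on a 0 to 1 scale where higher is better are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

METRICS = ("utility", "similarity", "privacy")


@dataclass(frozen=True)
class ModelRun:
    """One training run and its metrics (hypothetical 0-1 scores, higher is better)."""
    name: str
    utility: float
    similarity: float
    privacy: float


def rank_runs(runs, weights):
    """Return runs sorted best-first by a weighted sum of their metrics."""
    def score(run):
        return sum(weights.get(metric, 0.0) * getattr(run, metric) for metric in METRICS)
    return sorted(runs, key=score, reverse=True)


# Made-up example runs, not real benchmark results.
runs = [
    ModelRun("run-a", utility=0.90, similarity=0.85, privacy=0.60),
    ModelRun("run-b", utility=0.80, similarity=0.80, privacy=0.90),
    ModelRun("run-c", utility=0.70, similarity=0.75, privacy=0.95),
]

# A privacy-sensitive use case weights privacy only.
privacy_first = rank_runs(runs, {"privacy": 1.0})

# A general-purpose use case weights all three metrics equally.
balanced = rank_runs(runs, {"utility": 1.0, "similarity": 1.0, "privacy": 1.0})
```

Changing the weights changes which run comes out on top, which is the point of comparing metrics against a specific use case rather than picking a single "best" model.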
Role-based permissions + active directory integration
Many of our customers have complex systems in place to manage user accounts and access to ensure security at all times. As synthetic data scales through an enterprise, it becomes increasingly important to manage access and permissions at each stage of generation. That’s why the Hazy platform offers granular role and access controls.
Admins can now integrate the Hazy platform with their existing Active Directory (AD). Permissions can be allocated to different roles within an organisation, reducing the risk of error or data leakage and enabling a flexible approach to data management.
This streamlines user management processes, saving admins time and resource whilst also ensuring data is only in the hands of those who need it.
Increased functional validation
Organisations spend a lot of time validating the quality and integrity of models for machine learning.
We've introduced a suite of validation checks that help ensure the synthetic data is accurate. In turn, this improves the quality of data-driven decision-making and speeds up time to value of synthetic data. The checks include:
- Characteristics validation (such as unique count, missing value count, average string length)
- Datatype checks
- Referential integrity checks across tables
- Row and column comparison
- Regex ID validation
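As a rough illustration of what three of these checks involve, the sketch below implements characteristics validation, a referential integrity check and regex ID validation in plain Python. The function names, the sample values and the `C-` ID format are all hypothetical; this is not how the Hazy platform implements its checks.

```python
import re


def column_profile(values):
    """Unique count, missing-value count and average string length for one column."""
    present = [v for v in values if v is not None]
    avg_len = sum(len(str(v)) for v in present) / len(present) if present else 0.0
    return {
        "unique_count": len(set(present)),
        "missing_count": len(values) - len(present),
        "avg_string_length": avg_len,
    }


def orphaned_keys(child_keys, parent_keys):
    """Foreign-key values in the child table that have no matching parent row."""
    return sorted(set(child_keys) - set(parent_keys))


def invalid_ids(values, pattern):
    """Non-missing values that do not fully match the expected ID format."""
    compiled = re.compile(pattern)
    return [v for v in values if v is not None and not compiled.fullmatch(v)]


# Made-up source and synthetic columns for illustration.
source_names = ["Ada", "Grace", None, "Alan"]
synthetic_names = ["Eve", "Linus", None, "Joan"]
profiles_match = column_profile(source_names) == column_profile(synthetic_names)

# Referential integrity: every order should point at an existing customer.
customer_ids = ["C-001", "C-002", "C-003"]
order_customer_ids = ["C-001", "C-003", "C-009"]
orphans = orphaned_keys(order_customer_ids, customer_ids)

# Regex ID validation against an assumed "C-" plus three digits format.
bad_ids = invalid_ids(["C-001", "C-02", "X-003"], r"C-\d{3}")
```

In practice the profiles of source and synthetic columns would be compared within a tolerance rather than for exact equality; exact equality is used here only to keep the example short.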
New Snowflake connector
Users can now quickly and easily read from and write to their Snowflake database with the new native connector.
The platform also supports Snowflake database subsetting, a powerful tool when working with large amounts of data. The smaller dataset retains referential integrity and can be pulled into Hazy to avoid downloading the entire warehouse.
Users can now pull data into the platform from the following sources:
Database subsetting is a way of sampling a database. You train on a smaller amount of data, which offers a faster feedback loop and saves time overall. It also reduces the cost of hardware or cloud compute needed to carry out training. You can find out more here.
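The sketch below shows one simple way a subset can keep referential integrity: sample rows from a parent table, then keep only the child rows that point at the sampled parents. The table layout, the `subset_database` helper and its parameters are hypothetical, and the tables are in-memory lists rather than a real Snowflake connection; the platform's own subsetting may work differently.

```python
import random


def subset_database(customers, orders, sample_size, seed=0):
    """Sample parent rows, then keep only the child rows that reference them."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    sampled_customers = rng.sample(customers, k=min(sample_size, len(customers)))
    kept_ids = {customer["id"] for customer in sampled_customers}
    # Filtering on the foreign key keeps every remaining order attached to a customer.
    sampled_orders = [order for order in orders if order["customer_id"] in kept_ids]
    return sampled_customers, sampled_orders


# Made-up tables: 10 customers, each with exactly 3 orders.
customers = [{"id": i, "name": f"customer-{i}"} for i in range(1, 11)]
orders = [{"id": 100 + i, "customer_id": (i % 10) + 1} for i in range(30)]

small_customers, small_orders = subset_database(customers, orders, sample_size=3)
```

Sampling the child table independently would be faster to write, but it could leave orders that reference customers missing from the subset, which is exactly what subsetting with referential integrity avoids.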
Improved compliance checks
Much of the data our customers want to create synthetic versions of contains personally identifiable information (PII). Users now have an extra layer of privacy protection in the Hazy platform with our improved compliance features to support data controllers.
In the configuration stage, the platform carries out two steps:
- auto-detects fields that contain private or sensitive data which may have been misconfigured.
- analyses the model parameters to see whether differential privacy has been applied.
Combining this information, the platform provides warnings for any columns that could pose compliance risks, giving users the opportunity to edit the configuration before training on the data.
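The two steps and the combined warnings can be sketched as follows. Everything here is an assumption made for illustration: the name-based detection rule, the `sensitive` flag, the use of an `epsilon` parameter to signal that differential privacy is applied, and the `compliance_warnings` helper itself. The platform's actual detection logic and configuration format are not described in this update.

```python
import re

# Hypothetical rule: column names that hint at personal data.
SENSITIVE_NAME = re.compile(r"(name|email|phone|ssn|address|birth)", re.IGNORECASE)


def compliance_warnings(columns, model_params):
    """Map each risky column to the reasons it may pose a compliance risk."""
    # Step 2: assume differential privacy is applied when an epsilon is set.
    dp_applied = model_params.get("epsilon") is not None
    warnings = {}
    for column, settings in columns.items():
        # Step 1: detect columns that look sensitive but are not configured as such.
        looks_sensitive = bool(SENSITIVE_NAME.search(column))
        marked_sensitive = settings.get("sensitive", False)
        reasons = []
        if looks_sensitive and not marked_sensitive:
            reasons.append("looks sensitive but is not marked as sensitive")
        if (looks_sensitive or marked_sensitive) and not dp_applied:
            reasons.append("differential privacy is not applied")
        if reasons:
            warnings[column] = reasons
    return warnings


# Made-up configuration for illustration.
columns = {
    "email": {"sensitive": False},
    "customer_name": {"sensitive": True},
    "basket_total": {"sensitive": False},
}

without_dp = compliance_warnings(columns, {"epsilon": None})
with_dp = compliance_warnings(columns, {"epsilon": 1.0})
```

With no epsilon set, both `email` and `customer_name` are flagged; once differential privacy is configured, only the misconfigured `email` column remains, which the user can fix before training.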
These were some of the product highlights from the third quarter of 2023 at Hazy. Schedule a demo of the Hazy platform to see more.