Features
Our customers' favourite features of the Hazy platform:
Hazy has a differentially private multi-table model capable of synthesising sets of related tables. It recreates accurate replicas of the tables with strong privacy guarantees while also preserving the mutual information across them. One-to-one, one-to-many and many-to-many relationships are supported.
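This page does not describe how Hazy's multi-table model works. As a much simpler illustration of what "preserving a one-to-many relationship" means, the sketch below resamples the observed number of children per parent and generates synthetic keys from it, so every synthetic child row references an existing synthetic parent. It is a toy (no differential privacy, no column values), not Hazy's method, and the function names are hypothetical.

```python
import random
from collections import Counter

def children_per_parent(parent_ids, child_fks):
    """Count how many child rows reference each parent (0 for childless parents)."""
    counts = Counter(child_fks)
    return [counts.get(pid, 0) for pid in parent_ids]

def synthesise_keys(parent_ids, child_fks, n_parents, seed=0):
    """Toy generator: resample the source's children-per-parent distribution.

    Returns (synthetic_parent_ids, synthetic_child_fks). Every foreign key
    points at a generated parent, so referential integrity holds by construction.
    """
    rng = random.Random(seed)
    fanout = children_per_parent(parent_ids, child_fks)
    syn_parents = list(range(n_parents))
    syn_children = []
    for pid in syn_parents:
        k = rng.choice(fanout)  # draw a cardinality that was observed in the source
        syn_children.extend([pid] * k)
    return syn_parents, syn_children

# Source: customer 1 has two accounts, customer 2 has one, customer 3 has none.
parents, children = synthesise_keys([1, 2, 3], [1, 1, 2], n_parents=5)
print(parents, children)
```

A real multi-table model also has to learn how column values in the child table depend on the parent row; this sketch only keeps the key structure and cardinalities.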
Hazy has a number of different approaches to modelling and generating sequential data. These include bootstrapping, windowing, temporal Generative Adversarial Networks (GANs), autoencoders and sequential Synthpop.
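Two of the approaches named above have simple textbook forms. Windowing cuts a long sequence into fixed-length training examples; a moving-block bootstrap builds a new sequence by concatenating randomly chosen contiguous blocks of the original, which keeps short-range temporal patterns. The sketch below shows these generic versions only; it is not Hazy's implementation, and the window and block lengths are illustrative parameters.

```python
import random

def sliding_windows(seq, length, step=1):
    """Cut a sequence into overlapping fixed-length windows."""
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]

def block_bootstrap(seq, block_len, out_len, seed=0):
    """Moving-block bootstrap: concatenate randomly chosen contiguous blocks
    of the source sequence until out_len values have been produced."""
    rng = random.Random(seed)
    starts = range(len(seq) - block_len + 1)  # valid block start positions
    out = []
    while len(out) < out_len:
        s = rng.choice(starts)
        out.extend(seq[s:s + block_len])
    return out[:out_len]

balances = [100, 120, 90, 130, 150, 140, 160]
print(sliding_windows(balances, 3, step=2))
# → [[100, 120, 90], [90, 130, 150], [150, 140, 160]]
print(block_bootstrap(balances, block_len=3, out_len=10))
```

Longer blocks preserve more temporal structure but produce sequences closer to verbatim copies of the source, which is one reason learned models such as temporal GANs are also offered.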
Handlers are Hazy's algorithms built to generate specific types of data (e.g. first names, email addresses, IDs) and to handle business logic (e.g. formulas, conditions). We currently offer over 50 handlers and we're always building more. Check out our latest split ID feature here.
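Hazy's handler API is not shown on this page, so every name below is hypothetical. The sketch illustrates the two jobs described above: generating a specific type of value (an email address consistent with the synthetic first and last name) and enforcing business logic (a `total` column computed from other columns by a formula). The name lists and the reserved `example.com`/`example.org` domains are illustrative assumptions.

```python
import random

FIRST_NAMES = ["Alice", "Bilal", "Chen", "Dara"]          # illustrative vocabulary
LAST_NAMES = ["Smith", "Okafor", "Nguyen", "Kowalski"]
DOMAINS = ["example.com", "example.org"]                  # reserved, non-routable domains

def email_handler(first, last, rng):
    """Hypothetical handler: build an email consistent with the synthetic name."""
    return f"{first.lower()}.{last.lower()}@{rng.choice(DOMAINS)}"

def total_handler(row):
    """Hypothetical business-logic handler: total = quantity * unit_price."""
    return round(row["quantity"] * row["unit_price"], 2)

def generate_rows(n, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
        row = {
            "first_name": first,
            "last_name": last,
            "email": email_handler(first, last, rng),
            "quantity": rng.randint(1, 5),
            "unit_price": rng.choice([2.5, 9.99, 20.0]),
        }
        row["total"] = total_handler(row)  # formula applied after the inputs are sampled
        rows.append(row)
    return rows

for r in generate_rows(2):
    print(r)
```

Deriving dependent columns from already-generated ones, rather than sampling them independently, is what keeps the synthetic rows internally consistent.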
Users can quickly sort and compare model performance metrics in a simple view and select the best model for their use case. The performance overview also helps users iterate on their models and tune them for utility, similarity or privacy, depending on what their use case requires.
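The comparison view itself is a UI feature; the sketch below only illustrates the underlying idea of ranking candidate models by a weighted combination of utility, similarity and privacy scores, with weights chosen per use case. The model names, scores and weights are invented for the example, and all scores are assumed to lie in [0, 1] with higher meaning better.

```python
from dataclasses import dataclass

@dataclass
class ModelRun:
    name: str
    utility: float     # assumed in [0, 1], higher is better
    similarity: float  # assumed in [0, 1], higher is better
    privacy: float     # assumed in [0, 1], higher is better

def rank(runs, weights):
    """Sort runs by a weighted sum of their metric scores, best first."""
    def score(r):
        return (weights["utility"] * r.utility
                + weights["similarity"] * r.similarity
                + weights["privacy"] * r.privacy)
    return sorted(runs, key=score, reverse=True)

runs = [
    ModelRun("model-a", utility=0.91, similarity=0.88, privacy=0.60),
    ModelRun("model-b", utility=0.84, similarity=0.80, privacy=0.85),
]
# A privacy-sensitive use case weights privacy more heavily.
best = rank(runs, {"utility": 0.3, "similarity": 0.2, "privacy": 0.5})[0]
print(best.name)  # → model-b
```

With utility-heavy weights such as `{"utility": 0.7, "similarity": 0.3, "privacy": 0.0}` the ranking flips to `model-a`, which is the kind of trade-off the overview is meant to expose.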
Native connectors provide read and write support for numerous file formats (e.g. .csv, .csv.gz, .avro, .parquet) stored locally or in AWS S3, Google Cloud Storage or Azure Blob Storage; for databases such as DB2 and MSSQL; and for data warehouses such as Snowflake.
Need another integration? Let us know.
The platform boasts an extensive suite of metrics covering privacy, similarity and utility. Functional validation reports cover cross-table referential integrity, business rules and QA. Metrics documentation and hands-on sessions with Hazy data scientists support interpretability and explainability.
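One check a functional validation report of this kind can include is cross-table referential integrity: every foreign key in a child table must match a primary key in its parent table. The minimal sketch below shows that check; the table and column names are hypothetical, and tables are plain lists of dicts to keep it dependency-free.

```python
def orphan_rows(parent, child, pk, fk):
    """Return child rows whose foreign key has no matching parent primary key."""
    keys = {row[pk] for row in parent}
    return [row for row in child if row[fk] not in keys]

customers = [{"customer_id": 1}, {"customer_id": 2}]
accounts = [
    {"account_id": 10, "customer_id": 1},
    {"account_id": 11, "customer_id": 2},
    {"account_id": 12, "customer_id": 3},  # no customer 3: integrity violation
]
orphans = orphan_rows(customers, accounts, pk="customer_id", fk="customer_id")
print(f"{len(orphans)} orphaned row(s): {orphans}")
# → 1 orphaned row(s): [{'account_id': 12, 'customer_id': 3}]
```

A synthetic dataset that passes this check for every foreign-key edge can be loaded into a database with the same constraints as the source.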
The platform applies differential privacy by default, so the generated synthetic data is safe to use and the risk of re-identification is very low.
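This page does not say which differential privacy mechanism Hazy uses during training. As a generic illustration of the idea, the Laplace mechanism below releases a count with noise of scale sensitivity / epsilon: a smaller epsilon adds more noise and gives a stronger privacy guarantee. This is textbook differential privacy, not Hazy's training procedure.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_count(values, predicate, epsilon, rng):
    """epsilon-DP count. Adding or removing one record changes the true
    count by at most 1 (sensitivity 1), so the noise scale is 1 / epsilon."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)
ages = [23, 35, 41, 52, 67, 70, 29]
print(private_count(ages, lambda a: a >= 65, epsilon=0.5, rng=rng))  # true count is 2, plus noise
```

Because the noise is calibrated to how much any single record can change the result, an observer cannot confidently tell whether a particular person was in the data, which is what keeps re-identification risk low.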
Our advanced automated data type detection analyses the underlying source to determine what data type a column is, drastically speeding up the process of configuring large datasets. As well as types, we also detect common date format strings, country-specific ID patterns, underlying business logic relationships, and preserve all key relationships between tables, even from flat file sources.
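Hazy's detector is not described in detail here. One simple way to approach the "common date format strings" part is to try a list of candidate `strptime` formats and keep those that parse every non-empty value in the column, as in the sketch below. The candidate list is an assumption and deliberately short.

```python
from datetime import datetime

CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%d-%b-%Y", "%Y%m%d"]

def detect_date_formats(values):
    """Return the candidate formats that parse every non-empty value."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return []
    matches = []
    for fmt in CANDIDATE_FORMATS:
        try:
            for v in non_empty:
                datetime.strptime(v, fmt)
        except ValueError:
            continue  # at least one value does not fit this format
        matches.append(fmt)
    return matches

print(detect_date_formats(["2023-01-15", "2023-12-01", ""]))  # → ['%Y-%m-%d']
print(detect_date_formats(["03/04/2023", "25/12/2023"]))      # → ['%d/%m/%Y']
print(detect_date_formats(["03/04/2023", "04/05/2023"]))      # → ['%d/%m/%Y', '%m/%d/%Y']
```

The last case shows why detection benefits from scanning the whole column: a single value like `25/12/2023` is enough to rule out the month-first reading.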
Database subsetting is a way of sampling a database. Training on a smaller amount of data gives a faster feedback loop and saves time overall. It also reduces the cost of the hardware or cloud compute needed for training. You can find out more here.
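A minimal sketch of referentially consistent subsetting for a single parent/child pair of tables: sample a fraction of the parent rows, then keep only the child rows that reference a sampled parent, so the subset contains no orphaned foreign keys. The table and column names are hypothetical, and a real database would require walking the whole foreign-key graph rather than one edge.

```python
import random

def subset(parent, child, pk, fk, fraction, seed=0):
    """Sample a fraction of parent rows and keep only their child rows."""
    rng = random.Random(seed)
    k = max(1, round(len(parent) * fraction))
    sampled = rng.sample(parent, k)
    keep = {row[pk] for row in sampled}
    return sampled, [row for row in child if row[fk] in keep]

customers = [{"customer_id": i} for i in range(1, 101)]
transactions = [{"txn_id": t, "customer_id": 1 + t % 100} for t in range(1000)]
cust_sub, txn_sub = subset(customers, transactions, "customer_id", "customer_id", fraction=0.1)
print(len(cust_sub), len(txn_sub))  # → 10 100
```

Sampling parents first, rather than sampling each table independently, is what keeps the subset usable for training a multi-table model.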
As synthetic data scales through a business, it becomes increasingly important to manage permissions at each stage of generating data. That’s why the Hazy platform offers granular role and access controls, so data is only ever in the hands of those who need it.
Integrate the Hazy platform with your existing Active Directory (AD). Permissions can be allocated to different roles within an organisation, reducing the risk of error or data leakage and enabling a flexible approach to data management.
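The page does not describe how roles are modelled inside the platform. The sketch below is a generic role-based access control check with a hypothetical mapping from directory groups (such as Active Directory groups) to platform roles, and from roles to permissions; every group, role and permission name is invented for illustration. It does not connect to AD: the user's group names are passed in as a plain list.

```python
ROLE_PERMISSIONS = {
    "data_owner": {"read_source", "train_model", "generate", "read_synthetic"},
    "data_scientist": {"train_model", "generate", "read_synthetic"},
    "consumer": {"read_synthetic"},
}

GROUP_TO_ROLE = {  # hypothetical directory group names
    "hazy-owners": "data_owner",
    "hazy-data-scientists": "data_scientist",
    "hazy-consumers": "consumer",
}

def permissions_for(groups):
    """Union of permissions for every role the user's directory groups map to."""
    perms = set()
    for g in groups:
        role = GROUP_TO_ROLE.get(g)  # unknown groups grant nothing (deny by default)
        if role is not None:
            perms |= ROLE_PERMISSIONS[role]
    return perms

def is_allowed(groups, action):
    return action in permissions_for(groups)

print(is_allowed(["hazy-consumers"], "read_source"))       # → False
print(is_allowed(["hazy-data-scientists"], "generate"))    # → True
```

Keeping access to the source data (`read_source`) separate from access to synthetic output is what lets most users work with synthetic data without ever seeing the real records.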
We offer country-specific data type handlers including postal addresses and social security numbers. Hazy currently supports country packs for GB, US, AU, CA, CZ, DE, DK, ES, FI, FR, IE, NO, SE.
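A country pack has to encode each country's formatting rules. As an illustration, the sketch below checks the publicly documented structural rules for US Social Security numbers: the format AAA-GG-SSSS, an area number that is never 000, 666 or 900-999, a group number that is never 00 and a serial number that is never 0000. It validates format only (not whether a number was issued), and it is not how Hazy implements its US pack; the function name is hypothetical.

```python
import re

SSN_PATTERN = re.compile(r"([0-9]{3})-([0-9]{2})-([0-9]{4})")

def is_valid_ssn_format(value):
    """Check US SSN structural rules (format only, not issuance)."""
    m = SSN_PATTERN.fullmatch(value)
    if not m:
        return False
    area, group, serial = m.groups()
    if area in ("000", "666") or area.startswith("9"):  # 900-999 are never assigned
        return False
    return group != "00" and serial != "0000"

for s in ["123-45-6789", "666-12-3456", "900-12-3456", "123-00-4567", "1234-56-789"]:
    print(s, is_valid_ssn_format(s))
```

Synthetic values produced for an ID column should pass rules like these, so that downstream systems with input validation accept the synthetic data just as they accept the real data.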