How Hazy works

Hazy generates smart synthetic data that can be used as a drop-in replacement for real data for data science and analytics workloads.

Smart synthetic data

Synthetic data is artificial data: data made up of artificial records and data points. Hazy’s synthetic data is sample-based, meaning that it is derived from source data that you provide.

Specifically, Hazy data is designed to preserve all the patterns, statistical properties and correlations in the source data, so that it can be used as a drop-in replacement for it.
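To make "sample-based" concrete, here is a minimal toy sketch. It is not Hazy's generator: it assumes a two-column dataset that is well described by a bivariate Gaussian, and the column names, data and function names are hypothetical. It fits summary statistics on the source data, draws entirely new records from the fitted distribution, and compares the correlation between the two columns before and after.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_gaussian(xs, ys):
    """Summarise two columns by their means, standard deviations and correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return mx, my, sx, sy, pearson(xs, ys)


def sample_synthetic(params, n, rng):
    """Draw n artificial (x, y) records from the fitted bivariate Gaussian."""
    mx, my, sx, sy, rho = params
    records = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        # Mixing z1 into y reproduces the fitted correlation rho.
        y = my + sy * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        records.append((x, y))
    return records


rng = random.Random(0)
# Hypothetical source data: income loosely driven by age.
ages = [rng.uniform(20, 65) for _ in range(5000)]
incomes = [800 * a + rng.gauss(0, 8000) for a in ages]

synthetic = sample_synthetic(fit_gaussian(ages, incomes), 5000, rng)
syn_ages = [r[0] for r in synthetic]
syn_incomes = [r[1] for r in synthetic]

source_corr = pearson(ages, incomes)
synthetic_corr = pearson(syn_ages, syn_incomes)
print(f"source correlation:    {source_corr:.2f}")
print(f"synthetic correlation: {synthetic_corr:.2f}")
```

The two correlations should be close even though no synthetic record is copied from the source. A single Gaussian captures only means, variances and linear correlation, and on its own it offers no formal privacy guarantee, so treat this purely as an illustration of the idea.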

Safe to work with

Hazy data is safer to use than real data because it is made from artificial data points. This means that it does not contain any real data or customer information.

Hazy data is also differentially private. You can control the differential privacy level, set disclosure risk thresholds and optimise the trade-off between privacy and utility. For more information, see the privacy reference docs.
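The privacy/utility trade-off can be illustrated with the classic Laplace mechanism applied to a single count. This is a generic textbook sketch, not a description of how Hazy implements differential privacy; the function names and the example count are assumptions made for illustration. A smaller epsilon means stronger privacy and therefore more noise in the released value.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(true_count, epsilon, rng):
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    A counting query changes by at most 1 when one record is added or
    removed, so its sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    return true_count + laplace_noise(1 / epsilon, rng)


def mean_abs_error(true_count, epsilon, trials, rng):
    """Average absolute error of the private count over many releases."""
    errors = [abs(private_count(true_count, epsilon, rng) - true_count)
              for _ in range(trials)]
    return sum(errors) / trials


rng = random.Random(0)
true_count = 1000  # hypothetical number of matching customers
error_strict = mean_abs_error(true_count, epsilon=0.1, trials=5000, rng=rng)
error_loose = mean_abs_error(true_count, epsilon=1.0, trials=5000, rng=rng)
print(f"epsilon=0.1 -> mean error {error_strict:.1f}")
print(f"epsilon=1.0 -> mean error {error_loose:.1f}")
```

The expected absolute error equals the noise scale, so tightening epsilon from 1.0 to 0.1 makes the released count roughly ten times less accurate. That is the trade-off in its simplest form.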

For data science

Unlike most artificial data, Hazy data is smart enough to be used for real-world data science and analytics workloads. This is because Hazy data is:

based on the real data, preserving its patterns and correlations.
optimised for utility, so that models trained on Hazy data maintain their performance.

For more information, see use cases and data quality.

What is synthetic data?