How Hazy works

Hazy generates smart synthetic data that can be used as a drop-in replacement for real data in data science and analytics workloads.

Smart synthetic data

Synthetic data means artificial data: data made up of artificial records and data points. Hazy's synthetic data is sample-based, which is to say that it is derived from source data that you provide.

Specifically, Hazy data is designed to preserve the patterns, statistical properties and correlations in the source data, so that it can be used as a drop-in replacement for it.
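One way to make "preserving correlations" concrete is to compare the correlation between two columns in the source data with the correlation between the same columns in the synthetic data. The sketch below is a generic illustration in plain Python, not Hazy's own quality metric or API; the function names `pearson` and `correlation_gap` are hypothetical and the column values are toy numbers invented for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_gap(real_a, real_b, synth_a, synth_b):
    """Absolute difference between the real and synthetic correlation of two columns."""
    return abs(pearson(real_a, real_b) - pearson(synth_a, synth_b))


# Toy values, invented for illustration only.
real_age = [23, 35, 41, 52, 60]
real_income = [21000, 34000, 39000, 51000, 58000]
synth_age = [25, 33, 44, 50, 61]
synth_income = [22500, 31000, 42000, 49500, 59000]

gap = correlation_gap(real_age, real_income, synth_age, synth_income)
print(f"correlation gap: {gap:.4f}")
```

A gap close to zero suggests that the relationship between the two columns carried over into the synthetic data. A fuller comparison would repeat this for every pair of columns.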

Safe to work with

Hazy data is safer to use than real data because it's made out of artificial data points. This means that it doesn't contain any real data or customer information.

Hazy data is also differentially private. You can control the differential privacy level and disclosure risk thresholds, and optimise the trade-off between privacy and utility. For more information, see our privacy reference docs.
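To give a feel for the privacy and utility trade-off, the sketch below applies the textbook Laplace mechanism to a simple counting query. This is a standard differential privacy technique shown for illustration only; it is not a description of how Hazy implements differential privacy, and the names `laplace_noise` and `private_count` are hypothetical. A smaller epsilon means stronger privacy and a noisier answer.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng):
    """Count matching records, then add Laplace noise calibrated to epsilon.

    A counting query has sensitivity 1 (adding or removing one record changes
    the count by at most 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(epsilon, trials, rng):
    """Average absolute error of the private count over repeated runs."""
    records = list(range(100))          # toy records, invented for illustration
    is_even = lambda r: r % 2 == 0      # the true count is 50
    errors = [abs(private_count(records, is_even, epsilon, rng) - 50)
              for _ in range(trials)]
    return sum(errors) / trials


rng = random.Random(0)
for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps}: mean absolute error = {mean_abs_error(eps, 2000, rng):.3f}")
```

The error shrinks as epsilon grows, which is the trade-off in miniature: a tighter privacy budget costs accuracy.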

For data science

Unlike most artificial data, Hazy data is smart enough to be used for real-world data science and analytics workloads. This is because Hazy data is:

  • based on the real data, preserving its patterns and correlations; and
  • optimised for utility, so that models trained on Hazy data maintain their performance.

For more information, see use cases and our reference docs on data quality.
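The claim that models trained on synthetic data maintain their performance can be checked with a "train on synthetic, test on real" comparison: fit the same model once on real data and once on synthetic data, then score both on held-out real data. The sketch below uses a deliberately simple nearest-centroid classifier in plain Python. It is an illustration under assumed names (`fit_centroids`, `utility_gap`) and toy rows, not Hazy's utility metric.

```python
def fit_centroids(rows):
    """rows: list of (features, label). Returns {label: mean feature vector}."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((f - c) ** 2 for f, c in zip(features, centroids[label])),
    )


def accuracy(centroids, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(centroids, f) == y for f, y in rows) / len(rows)


def utility_gap(real_train, synthetic_train, real_test):
    """Accuracy lost on real test data when training on synthetic instead of real."""
    real_acc = accuracy(fit_centroids(real_train), real_test)
    synth_acc = accuracy(fit_centroids(synthetic_train), real_test)
    return real_acc - synth_acc


# Toy rows, invented for illustration only.
real_train = [([0.0], "a"), ([1.0], "a"), ([9.0], "b"), ([10.0], "b")]
synthetic_train = [([0.4], "a"), ([0.8], "a"), ([9.3], "b"), ([9.9], "b")]
real_test = [([0.2], "a"), ([1.5], "a"), ([8.7], "b"), ([9.8], "b")]

print("utility gap:", utility_gap(real_train, synthetic_train, real_test))
```

A gap near zero means the model trained on synthetic data performs about as well on real data as the model trained on real data.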