Five compelling use cases for synthetic data

Published on 01 Jun 2020.

Hazy is a synthetic data generation company. We equip and enable businesses to get the most out of their data but in a safe and ethical way. How? Synthetic data is completely artificial data that is statistically equivalent to your raw data.

Machine learning and AI algorithms identify statistical patterns and properties of your real sensitive datasets, and we use those to generate completely artificial synthetic data that is statistically equivalent to your original data.
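To make the idea of "learn the statistical patterns, then generate from them" concrete, here is a deliberately tiny sketch. It is not how Hazy's generators work and it carries no privacy guarantee; it assumes a toy dataset with two numeric columns (a made-up age and income) and a simple Gaussian model. The names fit_gaussian_2d and sample are hypothetical.

```python
import math
import random
import statistics


def fit_gaussian_2d(rows):
    """Learn the means, spreads and correlation of two numeric columns."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / len(rows)
    rho = max(-1.0, min(1.0, cov / (sx * sy)))  # clamp against rounding error
    return {"mx": mx, "my": my, "sx": sx, "sy": sy, "rho": rho}


def sample(model, n, seed=0):
    """Draw brand-new rows that follow the learned statistics."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = model["mx"] + model["sx"] * z1
        # Mix z1 into the second column so the correlation is preserved.
        mixed = model["rho"] * z1 + math.sqrt(1 - model["rho"] ** 2) * z2
        y = model["my"] + model["sy"] * mixed
        rows.append((x, y))
    return rows


# Toy stand-in for "real" sensitive data: (age, income) pairs.
rng = random.Random(42)
real = []
for _ in range(2000):
    age = rng.gauss(40, 10)
    real.append((age, 1000 * age + rng.gauss(0, 8000)))

model = fit_gaussian_2d(real)
synthetic = sample(model, 2000, seed=1)
synthetic_model = fit_gaussian_2d(synthetic)

print("real correlation:     ", round(model["rho"], 3))
print("synthetic correlation:", round(synthetic_model["rho"], 3))
```

None of the synthetic rows is copied from the toy "real" table, yet refitting the model on the synthetic rows gives roughly the same means and correlation. Real datasets need far richer models than this, plus explicit privacy protection.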

This means synthetic data is useful to many stakeholders who want to build, test or develop with your sensitive data, but are unable to access it due to common governance concerns such as exposing personally identifiable information. Furthermore, unlike anonymised data, there is no risk of re-identification or customer information leaks.

Today I’m going to try to explain some of the most common use cases for synthetic data that I’ve uncovered talking to customers over the last two years.

Hazy Synthetic Data for Data Portability

Who uses it? Chief data officers, chief risk officers, heads of data science, analytics leads, R&D heads, privacy and security teams, directors of IT, and anyone orchestrating change management or mergers and acquisitions. It’s usually the teammates most eager to break down silos and to collaborate and innovate with cross-enterprise data.

How does synthetic data help with data portability? Heavily regulated multinational institutions like banks are not only struggling to compete with up-and-coming services, but are also dealing with cross-border and cross-organisational laws and privacy regulations. This struggle intensifies when two regulated entities are combined in M&A.

Hazy’s patent-pending data portability allows you to train a synthetic data generator on-site at each location or within each siloed division. Then a centralised generator can combine the synthetic data coming from those different environments, including multi-table datasets with thousands of rows and columns, to give a fully cross-organisational overview.
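As a rough illustration of the "train locally, combine centrally" pattern, the sketch below fits a trivial generator inside each silo and only lets the fitted generator leave. This is an assumption-laden toy: the source does not describe how Hazy's patent-pending approach works, the site names and balances are invented, and train_site_generator and combined_sample are hypothetical helpers. Plain summary statistics like these are not private on their own.

```python
import random
import statistics


def train_site_generator(values):
    """Runs on-site. Only this small summary leaves the silo, never the rows."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
    }


def combined_sample(generators, n, seed=0):
    """Runs centrally. Draws synthetic values from every site's generator,
    in proportion to how many records each site holds."""
    rng = random.Random(seed)
    names = list(generators)
    weights = [generators[name]["n"] for name in names]
    rows = []
    for _ in range(n):
        site = rng.choices(names, weights=weights, k=1)[0]
        g = generators[site]
        rows.append((site, rng.gauss(g["mean"], g["stdev"])))
    return rows


# Invented account balances held in two separate jurisdictions.
rng = random.Random(7)
uk_balances = [rng.gauss(2500, 600) for _ in range(3000)]
de_balances = [rng.gauss(4000, 900) for _ in range(1000)]

generators = {
    "uk": train_site_generator(uk_balances),
    "de": train_site_generator(de_balances),
}
synthetic = combined_sample(generators, 4000, seed=1)
uk_share = sum(1 for site, _ in synthetic if site == "uk") / len(synthetic)
print("share of synthetic rows from the UK generator:", round(uk_share, 2))
```

The central team sees a blended, cross-organisational picture without any site shipping its raw records across a border.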

IT designers are increasingly being called upon to engage with regulatory compliance through Article 25 of the European General Data Protection Regulation (GDPR). This provision establishes the legal obligation to do information privacy by design and requires IT designers to build appropriate technical or organisational safeguards into their systems.

With the Internet of Things, personal information is collected by physical sensors in socially complex, traditionally private settings. Many of these IoT services maintain an ongoing relationship with users where their personal data is mined and analysed with the goal of providing value, such as automating routine tasks like room heating management. Readings from motion, temperature or CO2 sensors can be combined to make inferences, develop behavioural profiles, and make predictions about users. There are privacy implications around how this personal data is pieced together to create models of room and building occupancy.

Synthetic data alleviates the infrastructure requirements, especially in dealing with data portability, since, by exporting just synthetic versions of sensitive data, it can automatically satisfy all sides of the triangle:

Hazy Synthetic Data for Open Innovation

Who uses it? Anyone who works with or evaluates third-party partners like apps that want to build value on top of your data.

How does synthetic data help open innovation? More and more of our work relies on partnering with external innovators.

While open banking APIs have enabled third-party developers to build apps and services around financial institutions for a couple years now, those partnerships are often not reaching their full potential.

It’s the job of innovation departments within enterprises to seek out cutting-edge tech startups and scaleups that are on the verge of disrupting the status quo. They need to quickly evaluate these new tech companies. But it’s difficult to innovate or to test these innovation partners without realistic datasets.

And it can take six months or more to jump through legal and procurement hurdles before you can give the startup access to the raw data, which still doesn’t eliminate risk. And this is all just to determine whether or not you want to partner with them.

Hazy synthetic data is leveraged by innovation teams at Nationwide and Accenture to allow these heavily regulated multinationals to quickly, securely share the value of the data, without any privacy risks. Once you onboard us, you can then spin up as many synthetic data sets as you want which you can then release to your prospects. Allow them to fail fast and get your rapid partner validation.

Hazy for Open Innovation flow chart: Your live data → Hazy for Innovation → Smart synthetic data → Test ideas and vendors

Hazy Synthetic Data for Testing

Who uses it? Often product quality assurance analysts, testers, user testing, and development. Synthetic data is an easy way to thoroughly test before you go live.

How do testers use synthetic data? Synthetic data is a perfect alternative to real data, especially in our remote-first world. Maybe you can’t share sensitive data, or you don’t want to, because creating any unnecessary copies of data increases the risk of leaks.

Also in the world of GDPR and the California Privacy Rights Act (CPRA), your commitment to privacy is intrinsically linked to the trust in your brand. Users have a right to request to be forgotten. But, frankly, how often do we just click close on our mobiles to get to where we’re trying to go? While GDPR is proven to enhance human behaviour around personal data, it’s up to organisations to hold up the intent of the law.

Synthetic data allows you to create as many artificial copies of data patterns as needed, without holding onto any of the real data. You can also generate synthetic data based on business rules.

Smart synthetic data generation allows you to create rare combinations of events, which lets you better test the resilience of your IT infrastructure. It also enables test-driven development when you don’t yet have accurate customer data but want to test a proof of concept.
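Here is a small sketch of what rule-based test data with deliberately over-represented rare events might look like. Everything in it is an assumption for illustration: the transaction fields, the business rules, the definition of the "rare" case and the function names are all made up, and the values are drawn from simple random ranges rather than learned from real data.

```python
import random


def generate_transactions(n, rare_rate=0.2, seed=0):
    """Create test transactions, forcing a rare combination of events
    to show up far more often than it would in production."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        if rng.random() < rare_rate:
            # Rare combination: very large amount, foreign currency, small hours.
            row = {
                "id": i,
                "amount": round(rng.uniform(50_000, 100_000), 2),
                "currency": "JPY",
                "hour": rng.randint(0, 4),
                "rare": True,
            }
        else:
            # Everyday case: small domestic payment during the day.
            row = {
                "id": i,
                "amount": round(rng.uniform(1, 500), 2),
                "currency": "GBP",
                "hour": rng.randint(8, 22),
                "rare": False,
            }
        rows.append(row)
    return rows


def violates_rules(row):
    """Hypothetical business rules: positive amounts, valid hour of day."""
    return row["amount"] <= 0 or not 0 <= row["hour"] <= 23


rows = generate_transactions(1000)
rare_rows = [r for r in rows if r["rare"]]
print(len(rare_rows), "of", len(rows), "test transactions exercise the rare path")
```

Because no real customer rows are involved, a test suite can dial rare_rate up or down to stress exactly the code path it cares about, even before any production data exists.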

Hazy for Testing flow chart: Your live data → Hazy for Testing → Safe, realistic test data → High quality testing

Hazy Synthetic Data for Data Analytics

Who uses it? Since much of the Hazy team has an academic and financial services background in data science, this is a favourite to not only offer to customers, but to use ourselves to check the quality of our machine learning models and our synthetic data generators. Data scientists, machine learning engineers, and anyone in a research role can take advantage of synthetic data for analytics. It’s particularly useful in analytics departments within banks, in risk management, lending, and financial crime units.

How do data scientists use synthetic data? Data scientists in highly regulated industries need high quality, highly representative data to test the algorithms they are creating. If they’ve got access to safe synthetic versions of their raw data, that massively cuts the time it takes to test their algorithms.

Hazy for Data Science and Analytics flow chart: Your live data → Hazy for Data Science → Smart synthetic data → Exploratory data science

Hazy for Cloud Synthetic Data Migration

Who uses it? Any organisation that wants to be more competitive in the flexible cloud, but is afraid of putting any sensitive data in the less trusted cloud environment.

How does synthetic data help with cloud migration? A lot of enterprises backed by legacy architecture are struggling to compete, but are wary of the cloud. Creating synthetic versions of their data gives them something safe to move up to the cloud.

Hazy for Cloud Migration flow chart: On-premise data → Hazy for Cloud Migration → Only the generators move → Hazy Hub and Client → Cloud resources

How Hazy data works

Hazy is the most advanced smart synthetic data generator on the market. We’ve attracted a world-class team of data scientists and engineers to build a product with the financial industry in mind. It’s not just because we have an exciting product — and we do — but we all share in a singular ethical focus — Privacy by design.

Hazy is unique in its use of the most advanced machine learning algorithms that are differentially private by default. Our synthetic data retains the useful patterns within a group, while withholding any identifying details within that group. Only trust synthetic data generators that can provide you with the gold standard guarantee of differential privacy.
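For readers new to differential privacy, the sketch below shows the textbook Laplace mechanism applied to a single count. It is only meant to convey the idea of adding calibrated noise so that no one individual's presence can be inferred; the source does not say which mechanisms Hazy's generators use, and the ages, the epsilon value and the function names here are assumptions.

```python
import random


def laplace_noise(scale, rng):
    """The difference of two exponential draws follows a Laplace distribution."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng):
    """Count matching records, then add noise calibrated to epsilon.
    Adding or removing one person changes a count by at most 1,
    so the noise scale is 1 / epsilon."""
    true_value = sum(1 for r in records if predicate(r))
    return true_value + laplace_noise(1 / epsilon, rng)


# Invented ages standing in for a sensitive customer table.
rng = random.Random(3)
ages = [rng.randint(18, 90) for _ in range(1000)]
true_count = sum(1 for a in ages if a >= 65)

noisy_counts = [
    private_count(ages, lambda a: a >= 65, epsilon=0.5, rng=rng)
    for _ in range(2000)
]
average_noisy = sum(noisy_counts) / len(noisy_counts)

print("true count:", true_count)
print("one private answer:", round(noisy_counts[0], 1))
print("average of many private answers:", round(average_noisy, 1))
```

Each individual answer is blurred, so the group-level pattern survives while a single person's contribution is hidden. A smaller epsilon means more noise and stronger privacy. Repeating the query, as done here purely to show the noise averages out, would use up privacy budget in a real system.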

Hazy specialises in financial services, already helping some of the world’s top banks and insurance companies reduce compliance risk and speed up data innovation by allowing them to work freely on safe, smart synthetic data.
