Data anonymisation - what, why and how?


Published on 20 Aug 2018.

Data anonymisation is a set of processes and methods applied to a dataset to ensure that the data cannot be attributed to the individual or entity it comes from.

To anonymise data efficiently and appropriately, detailed knowledge of the data is critical. For example, anonymising addresses presents a different challenge from anonymising names. This is where automation can make a tangible difference. By automatically identifying the nature of the data, it becomes possible to anonymise large datasets without manual input or human supervision.
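To make the idea of automatically identifying the nature of the data more concrete, here is a minimal sketch that guesses what kind of values a column holds by matching them against regular expressions. The patterns, the labels, the `guess_column_type` function and the 0.8 threshold are all assumptions made for illustration. They are deliberately simple and are not a description of how any particular product detects data types.

```python
import re

# Hypothetical, deliberately simple patterns; a real detector needs far more care.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "uk_postcode": re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}$", re.IGNORECASE),
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}


def guess_column_type(values, threshold=0.8):
    """Return the first label whose pattern matches at least `threshold`
    of the non-empty values, or "unknown" if none does."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return "unknown"
    for label, pattern in PATTERNS.items():
        matches = sum(1 for v in cleaned if pattern.match(v))
        if matches / len(cleaned) >= threshold:
            return label
    return "unknown"


# Made-up sample columns, for illustration only.
columns = {
    "contact": ["alice@example.com", "bob@example.org", ""],
    "postcode": ["SW1A 1AA", "EC1A 1BB", "M1 1AE"],
    "joined": ["2018-08-20", "2017-01-05"],
    "notes": ["called twice", "prefers email"],
}
detected = {name: guess_column_type(vals) for name, vals in columns.items()}
```

Once a column has a label, an appropriate anonymisation step can be chosen for it. Columns that come back as "unknown" are the ones that would still need a human to look at them.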

Good for business

In today's digital world, both individuals and companies rely heavily on the ability to access and share data. Take bank accounts: to open a new account, an individual has to share sensitive personal data with a bank, and the bank will in turn share this data with external developers, credit scoring companies and so on.

In these multiple exchanges, data is at risk of being leaked, putting the individual at personal risk and presenting a serious liability for the company. Anonymisation offers a solution to both of these concerns, as anonymised data is much safer to share whilst retaining its usability.

Anonymisation or pseudonymisation?

Pseudonymisation is a technique that can improve user privacy by replacing or removing the most identifying fields in a dataset. This may involve replacing names or other direct identifiers that can easily be attributed to individuals with, for example, a reference number. Pseudonymised data can reduce the risk of identifying data subjects and help companies meet some data protection obligations.
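As a rough illustration of replacing a direct identifier with a reference number, the sketch below swaps each name for a generated reference and keeps a lookup table of the mapping. The `pseudonymise` function, the `REF-0001` format and the sample records are assumptions invented for this example, not a prescribed method.

```python
import itertools


def pseudonymise(records, id_field="name"):
    """Replace the direct identifier in each record with a reference number.

    Returns the pseudonymised copies and the lookup table mapping each
    original value to its reference. The input records are not modified.
    """
    counter = itertools.count(1)
    lookup = {}  # original value -> reference
    output = []
    for record in records:
        original = record[id_field]
        if original not in lookup:
            lookup[original] = f"REF-{next(counter):04d}"
        replaced = dict(record)  # shallow copy so the source stays untouched
        replaced[id_field] = lookup[original]
        output.append(replaced)
    return output, lookup


# Made-up sample data, for illustration only.
customers = [
    {"name": "Alice Smith", "balance": 1200},
    {"name": "Bob Jones", "balance": 340},
    {"name": "Alice Smith", "balance": 75},
]
pseudonymised, lookup = pseudonymise(customers)
```

The same person always receives the same reference, so records can still be linked for analysis. Note that anyone holding the lookup table can reverse the process, which is why the table has to be stored separately and securely.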

In essence, pseudonymisation is just a security measure and does not change the status of the data as personal data. Sharing any data that has undergone pseudonymisation must still be done strictly in accordance with the GDPR. For instance, if a data subject requests access to their data or asks to be removed from it, the appropriate procedures have to be followed and the request fulfilled without undue delay, regardless of any attempt to pseudonymise the personally identifiable information.

However, the GDPR does not apply to personal data that has been anonymised, provided the anonymisation is performed in a correct and robust way. Anonymising personal data can therefore be a more effective method for businesses seeking to limit the risks of sharing data, whilst also protecting identifiable data.

Putting anonymisation to use

There are various anonymisation techniques but, unfortunately, there's no magic bullet. To achieve better anonymisation, you need to spend time finding an effective combination of techniques for each individual dataset and use case.

What else can I do to enhance data privacy?

There are a number of privacy-enhancing technologies on the market. Some of them are expensive and require enterprise-grade technical capacity, but others are fairly simple and affordable for SMEs and even young startups.

Selecting the right combination can be a challenge, even though the different methods are well understood and plenty of information is available. So we've included a selection to help discerning businesses.

Photo credit: Martin Adams on Unsplash

