Anonymisation is a complex area and has been the focus of a number of recent research papers, many of which have helped inform our developers and data scientists while building our automated anonymisation tool. In order to help navigate this exciting new area of research, we have conducted a review of the available literature, which you can download here.
When people think about how to avoid leaking personal data, the first place they tend to start is masking or redacting: splatting the values, so “James” becomes “Xxxxx”. This can be useful for removing direct identifiers, but you need to get much more sophisticated if you want to make any guarantees about privacy or re-identifiability. For that, you need to look beyond individual values and understand the records and linkages in the data.
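To make the masking idea concrete, here is a minimal Python sketch. The function name `mask_value`, the sample records and the choice to turn digits into "9" are our own assumptions for illustration; they are not taken from any particular tool.

```python
def mask_value(value: str) -> str:
    """Splat a value: upper-case letters become X, lower-case letters x.

    Replacing digits with 9 is an illustrative assumption; punctuation
    and spaces are left as they are so the shape of the value survives.
    """
    out = []
    for ch in value:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical records, invented for this example.
records = [
    {"name": "James", "postcode": "SW1A 1AA", "age": 34},
    {"name": "Priya", "postcode": "M1 1AE", "age": 52},
]

# Mask only the direct identifier (the name); other columns are untouched.
masked = [{**r, "name": mask_value(r["name"])} for r in records]

for row in masked:
    print(row)
```

Note that the postcode and age columns come through unchanged, so each masked record may still be unique. That is the gap the rest of this post is about.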
This is where anonymisation comes in. Anonymisation allows you not only to analyse privacy and re-identifiability but also to optimise the trade-offs between privacy and usability. So rather than splatting all the values, you can make fine judgements about how to transform or generalise them (how to blur the data or add noise to it) so that the data remains usable whilst also maintaining privacy.
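As a rough illustration of generalising values and then checking re-identifiability, the Python sketch below bands ages into ten-year ranges, truncates postcodes, and measures the size of the smallest group of records that share the same values. That smallest-group measure is the idea behind k-anonymity, which we use here as one common example rather than as a description of how any specific product works. All function names and the sample data are assumptions made up for this example.

```python
from collections import Counter


def generalise_age(age: int, width: int = 10) -> str:
    """Blur an exact age into a band, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalise_postcode(postcode: str) -> str:
    """Keep only the part of the postcode before the space."""
    return postcode.split()[0]


def smallest_group(rows, keys):
    """Size of the smallest set of rows sharing the same values for `keys`.

    A result of 1 means at least one record is unique on those columns.
    """
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return min(counts.values())


# Hypothetical data, invented for this example.
rows = [
    {"age": 34, "postcode": "SW1A 1AA"},
    {"age": 37, "postcode": "SW1A 2BB"},
    {"age": 31, "postcode": "SW1A 2AA"},
    {"age": 52, "postcode": "M1 1AE"},
    {"age": 58, "postcode": "M1 2WD"},
]

generalised = [
    {
        "age": generalise_age(r["age"]),
        "postcode": generalise_postcode(r["postcode"]),
    }
    for r in rows
]

keys = ["age", "postcode"]
print("smallest group before:", smallest_group(rows, keys))
print("smallest group after:", smallest_group(generalised, keys))
```

Widening the age bands or truncating the postcode further would make the groups larger but the data less precise, which is the privacy versus usability trade-off in miniature.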
Now, our whole product is designed so that you don’t have to be a data scientist to make those trade-offs and optimise anonymisation for your use case. However, to make that true, we’ve had to dig into and then abstract away the underlying technical details. Rather than just hide these away, we thought it would be useful to publish our review of the anonymisation literature so that other technologists can see what’s really going on under the hood.
Once again, you can download the review as a PDF here.
If you want to talk more about the potential applications of data anonymisation, or if you want to learn about the underlying technology, please do get in touch.