Retain formats and maintain privacy with ID functionality
Removing personally identifiable information from your data doesn’t mean you have to replace it with random, meaningless values.
With Hazy’s Split ID function, you can preserve the real-world context of string data types whilst protecting the privacy of your customers’ information.
The split ID feature allows the user to specify a certain modelling behaviour for sections of a string column. This is useful when particular sections of a string field have utility for downstream analytics.
An example would be the first six digits of a credit card number, which identify both the card issuer and the payment provider.
The new sampler works by splitting a string column into a number of components. Each component is then either modelled as a categorical value (i.e. its marginal and joint distributions are preserved as if it were any other categorical field) or replaced with a string generated from a regex pattern.
The user is able to specify on which characters they would like to split and what the modelling behaviour would be for that component.
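To make the idea concrete, here is a minimal Python sketch of the split-and-model approach described above. It is not Hazy’s implementation or API: the function names (`split_id`, `synthesise_ids`), the `split_at` parameter and the card numbers are all hypothetical and made up for illustration. It assumes the split is defined by a character position, that the first component is sampled from its observed frequencies, and that the remainder is replaced with random digits, which stands in for generating from a pattern such as `\d{10}`. It only preserves the marginal distribution of the prefix, not its joint distribution with other columns.

```python
import random
from collections import Counter


def split_id(value, split_at):
    """Split a string into (prefix, remainder) at a character position."""
    return value[:split_at], value[split_at:]


def synthesise_ids(values, split_at, n, seed=None):
    """Hypothetical sketch: keep the prefix distribution, randomise the rest."""
    rng = random.Random(seed)

    # Component 1: treat the prefix as a categorical value and
    # record how often each category occurs in the real data.
    counts = Counter(split_id(v, split_at)[0] for v in values)
    categories = list(counts)
    weights = [counts[c] for c in categories]

    # Component 2: replace the remainder with random digits of the same
    # length (a stand-in for generating from a regex such as \d{10}).
    # Assumes every input value has the same length as the first one.
    tail_len = len(values[0]) - split_at

    synthetic = []
    for _ in range(n):
        prefix = rng.choices(categories, weights=weights, k=1)[0]
        tail = "".join(rng.choice("0123456789") for _ in range(tail_len))
        synthetic.append(prefix + tail)
    return synthetic


# Made-up 16-digit values, not real card numbers.
real_cards = [
    "4539121234567890",
    "4539129876543210",
    "5500001111222233",
]
synthetic_cards = synthesise_ids(real_cards, split_at=6, n=5, seed=0)
```

Every value in `synthetic_cards` starts with a six-digit prefix that appears in the original data, so downstream analytics that group by issuer still work, while the remaining ten digits carry no information about any real customer.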
For more information about this feature in a customer use case, get in touch.