Compliance

Personal Data

Personal data, as defined by GDPR, is any information relating to an identified or identifiable natural person. It covers a wide range of data, also called PIIs (Personally Identifiable Information), such as names, identification numbers, location data, online identifiers, and factors specific to an individual's physical, physiological, genetic, mental, economic, cultural, or social identity. In essence, personal data is any detail that can directly or indirectly identify an individual. The GDPR imposes stringent rules and safeguards on the processing of personal data to protect individuals' privacy and fundamental rights.

Personal Data Roles

The Data Controller, as defined by GDPR, determines the purposes and means of processing personal data. This includes establishing the legal basis for data processing, defining the specific data processing activities, and assuming overall responsibility for compliance with GDPR. The Data Controller must also uphold the rights of data subjects, ensure transparency in data processing, and maintain a robust data governance framework.

All Hazy users are currently assumed to be Data Controllers.

PIIs and Hazy Configuration

Assessing which information carries PIIs, and how much each item contributes to re-identifiability risk, is challenging because that risk is nuanced and depends on context. In any case, it is vital not to misconfigure any columns containing PIIs.

Users are advised to configure all PIIs, irrespective of their potential re-identifiability risk. Columns carrying highly sensitive information, such as first and last names, passport numbers, social security numbers, and email addresses, are protected by an additional privacy layer through our custom PersonType Handler. Other PIIs, such as gender, race, age or date of birth, location, preferences, and purchases, are protected by the privacy properties of generative models and Differential Privacy.

Rules of thumb:

  • Do not label high-cardinality columns (those with many distinct values), such as names or IDs, as Category Type.
  • Do not label numerical columns containing identifying information, such as bank account numbers or social security numbers, as Integer Type.
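
The rules of thumb above can be checked with a simple heuristic before a configuration is finalised. The following is a minimal sketch in plain Python, not part of Hazy's API: the function name `flag_risky_columns`, the type labels `"category"` and `"integer"`, and the `max_unique_ratio` threshold are all hypothetical names chosen for illustration. It flags columns declared as a category whose share of distinct values is high, and columns declared as an integer in which every value is distinct, which may indicate an identifier rather than a quantity.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical type labels for illustration only; not Hazy's configuration schema.
CATEGORY = "category"
INTEGER = "integer"


def flag_risky_columns(
    columns: Dict[str, Sequence[object]],
    declared_types: Dict[str, str],
    max_unique_ratio: float = 0.5,  # assumed threshold; tune for your data
) -> List[Tuple[str, str]]:
    """Return (column name, reason) pairs for columns worth reviewing."""
    findings: List[Tuple[str, str]] = []
    for name, values in columns.items():
        non_null = [v for v in values if v is not None]
        if not non_null:
            continue  # nothing to measure on an empty column
        # Share of distinct values among the non-null entries.
        unique_ratio = len(set(non_null)) / len(non_null)
        declared = declared_types.get(name)
        if declared == CATEGORY and unique_ratio > max_unique_ratio:
            findings.append((name, "high cardinality for a category column"))
        elif declared == INTEGER and unique_ratio == 1.0:
            findings.append((name, "all values distinct; possibly an identifier"))
    return findings


# Example with made-up data.
sample_columns = {
    "last_name": ["Smith", "Jones", "Lee", "Kim"],
    "gender": ["F", "M", "F", "M"],
    "account_number": [1001, 1002, 1003, 1004],
    "purchases": [3, 3, 5, 1],
}
sample_types = {
    "last_name": CATEGORY,
    "gender": CATEGORY,
    "account_number": INTEGER,
    "purchases": INTEGER,
}
for column, reason in flag_risky_columns(sample_columns, sample_types):
    print(f"{column}: {reason}")
```

A check like this only surfaces candidates for review. On small samples a genuine quantity can also have all-distinct values, so a flagged column still needs a human decision about whether it carries PIIs.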

Resources