ID Settings

Available ID settings are listed below. These are used by ID Type, ID Handler, ID Mapper Handler and Text Category Handler.

Standard IDs

These can be used in place of the IdSettingsUnion and IdHandlerSettingsUnion types, and are used by ID Type, ID Handler, ID Mapper Handler and Text Category Handler.

Classes:

NumericalIdSettings

Integer IDs.

IncrementalIdSettings

IDs that begin at a particular number and increase with a constant increment.

RegexIdSettings

IDs based on a provided regular expression pattern.

StaticIdSettings

Uses a single static value for every row in the column.

SortCodeIdSettings

Banking sort codes for UK- and Republic of Ireland-based banks.

CPFIdSettings

The CPF (Cadastro de Pessoa Física) is a registration document for the Receita Federal (Brazilian Tax Authority).

UUIDSettings

Standard UUIDs of the form 876b8169-1eae-4741-a459-6d896d705673.

CreditCardNumberIdSettings

Credit card numbers, based on the selected card provider.

BankCountrySettings

Bank country name sampled from the specified set of locales.

BBANSettings

Basic Bank Account Number (BBAN), represents a country-specific bank account number.

CompanySettings

Randomly generated company names.

CPRSettings

The CPR number is unique to the person and is used in Denmark as an ID number.

CreditCardSecurityCodeSettings

Credit card security code (CSC) is usually a three-digit number, not part of the credit card number.

IBANSettings

International Bank Account Number (IBAN).

LicensePlateSettings

Car license plate numbers from the specified locales.

PasswordSettings

Plain text passwords, conforming to common requirements.

PassportSettings

Passport ID number.

SSNSettings

Social Security Numbers from the chosen locales.

SWIFTSettings

Banking SWIFT code, representing a bank branch.

SWIFT11Settings

Banking SWIFT code with a length of 11 characters.

SWIFT8Settings

Banking SWIFT code with a length of 8 characters.

MD5Settings

MD5 hash.

PhoneNumberSettings

Phone number, formatted by locale.

ColumnSettings

Samples values from another specified column.

OptionSettings

Sample from a list of provided values.

NameSettings

Names sampled from locales.

SplitSettings

Split a column into separate components that can either be modelled as categoricals or replaced with a regex pattern.

class hazy_configurator.settings.id_settings.normal_id_settings.NumericalIdSettings

Bases: UniqueIdSettings

Integer IDs.

Fields:
field length: int [Required]

The number of digits in the numerical ID (maximum 20).

field as_str: bool = True

If True, generated IDs are returned as strings, zero-padded so that every ID has exactly length digits.

field unique: bool = True

When set to True, the generated values will be unique. Uniqueness cannot be guaranteed, however, if the number of records generated exceeds the number of possible unique values.
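The interplay of length, as_str and unique can be illustrated with a minimal sketch. It assumes IDs are drawn uniformly from the 10**length possible values; the helper numerical_ids is hypothetical and is not Hazy's implementation. It also shows why uniqueness is not guaranteed: once more IDs are requested than there are possible values, the sketch falls back to sampling with replacement.

```python
import random
from typing import List, Optional, Union


def numerical_ids(n: int, length: int, as_str: bool = True, unique: bool = True,
                  rng: Optional[random.Random] = None) -> List[Union[int, str]]:
    """Hypothetical sketch of NumericalIdSettings-style generation."""
    rng = rng or random.Random()
    space = 10 ** length  # number of distinct IDs with at most `length` digits
    values: List[int] = []
    if unique and n <= space:
        # Draw until we have n distinct values, keeping draw order.
        seen = set()
        while len(values) < n:
            v = rng.randrange(space)
            if v not in seen:
                seen.add(v)
                values.append(v)
    else:
        # Uniqueness is impossible (or not requested): sample with replacement.
        values = [rng.randrange(space) for _ in range(n)]
    if as_str:
        # Zero-pad so every string ID has exactly `length` characters.
        return [str(v).zfill(length) for v in values]
    return values


print(numerical_ids(3, 6, rng=random.Random(7)))
```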

class hazy_configurator.settings.id_settings.normal_id_settings.IncrementalIdSettings

Bases: IdSettings

IDs that begin at a particular number and increase with a constant increment.

Fields:
field start: int = 1

The number to start the sequence on. Must be a non-negative integer.

Constraints:
  • minimum = 0

field increment: int = 1

The increase between numbers in the sequence. Must be a positive integer.

Constraints:
  • exclusiveMinimum = 0

field as_str: bool = False

If True, generated IDs are returned as strings.
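The sequence described by start, increment and as_str can be sketched with itertools.count. The helper incremental_ids is hypothetical; it only mirrors the documented defaults and constraints (start of at least 0, increment greater than 0).

```python
from itertools import count, islice
from typing import List, Union


def incremental_ids(n: int, start: int = 1, increment: int = 1,
                    as_str: bool = False) -> List[Union[int, str]]:
    """Hypothetical sketch of IncrementalIdSettings-style generation."""
    # Mirror the documented constraints on the fields.
    if start < 0:
        raise ValueError("start must be >= 0")
    if increment <= 0:
        raise ValueError("increment must be > 0")
    # count(start, increment) yields start, start + increment, ...
    values = list(islice(count(start, increment), n))
    return [str(v) for v in values] if as_str else values


print(incremental_ids(4))                                      # [1, 2, 3, 4]
print(incremental_ids(4, start=100, increment=5, as_str=True))  # ['100', '105', '110', '115']
```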

class hazy_configurator.settings.id_settings.normal_id_settings.RegexIdSettings

Bases: UniqueIdSettings

IDs based on a provided regular expression pattern.

Fields:
field pattern: str [Required]

Regex pattern that generated IDs will conform to. Escape special regex characters with a single backslash.

field weighted_sampling: bool = False

Determines the sampling strategy when the provided regex pattern contains alternations. If set to False (default), each branch is sampled with roughly equal probability, regardless of how many distinct values it can produce (its cardinality). If set to True, each branch is sampled in proportion to its cardinality, so branches with higher cardinality contribute more samples.

field unique: bool = True

When set to True, the generated values will be unique. Uniqueness cannot be guaranteed, however, if the number of records generated exceeds the number of possible unique values.
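To make weighted_sampling concrete, consider the alternation A[0-9]{3}|B: the first branch can produce 1,000 distinct strings, the second only one. The sketch below computes the branch probabilities under each strategy. The cardinalities are supplied by hand (deriving them from an arbitrary regex is out of scope), and branch_probabilities is a hypothetical helper, not part of hazy_configurator.

```python
from fractions import Fraction
from typing import List


def branch_probabilities(cardinalities: List[int], weighted: bool) -> List[Fraction]:
    """Probability of drawing from each alternation branch.

    weighted=False: every branch is equally likely.
    weighted=True: each branch is chosen in proportion to its cardinality.
    """
    if weighted:
        total = sum(cardinalities)
        return [Fraction(c, total) for c in cardinalities]
    return [Fraction(1, len(cardinalities))] * len(cardinalities)


# A[0-9]{3} has 10**3 possible values; the literal B has exactly one.
cards = [10 ** 3, 1]
print(branch_probabilities(cards, weighted=False))  # [Fraction(1, 2), Fraction(1, 2)]
print(branch_probabilities(cards, weighted=True))   # [Fraction(1000, 1001), Fraction(1, 1001)]
```

With the default (False), roughly half of all rows would receive the single value B, which also makes unique=True impossible to satisfy for more than a handful of rows.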

class hazy_configurator.settings.id_settings.normal_id_settings.StaticIdSettings

Bases: IdSettings

Uses a single static value for every row in the column.

Fields:
field value: Union[None, StrictInt, StrictFloat, bool, datetime, timedelta, str] = None

Single value to set for the entire column.

class hazy_configurator.settings.id_settings.normal_id_settings.SortCodeIdSettings

Bases: IdSettings

Banking sort codes for UK- and Republic of Ireland-based banks.

List of available banks

Fields:
field banks: List[str] = []

Set of banks to produce sort codes for.

field sep: str = ''

Separator to use between groups in the sort code.
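Sort codes are six digits conventionally written as three pairs (for example 40-12-76). A minimal sketch of how sep could be applied, assuming it is inserted between each two-digit group; the helper format_sort_code is hypothetical.

```python
def format_sort_code(digits: str, sep: str = "") -> str:
    """Hypothetical sketch: join the three two-digit groups with `sep`."""
    if len(digits) != 6 or not digits.isdigit():
        raise ValueError("a sort code has exactly six digits")
    # Split into three two-digit groups and join them with the separator.
    return sep.join(digits[i:i + 2] for i in range(0, 6, 2))


print(format_sort_code("401276"))           # 401276 (default sep='')
print(format_sort_code("401276", sep="-"))  # 40-12-76
```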

class hazy_configurator.settings.id_settings.normal_id_settings.CPFIdSettings

Bases: UniqueIdSettings

The CPF (Cadastro de Pessoa Física) is a registration document for the Receita Federal (Brazilian Tax Authority).

It allows Brazilian citizens to be registered as a tax contributor.

The format is xxx.xxx.xxx-xx; the sep parameter replaces only the “-”.

Fields:
field sep: str = '-'

Separator to use between the two sections of the CPF number.

field unique: bool = True

When set to True, the generated values will be unique. Uniqueness cannot be guaranteed, however, if the number of records generated exceeds the number of possible unique values.
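A CPF number ends in two mod-11 check digits computed from the first nine digits. The source does not say whether generated CPFs carry valid check digits, so treat that as an assumption; the sketch below (hypothetical helpers cpf_check_digits and format_cpf) shows the standard calculation and how sep replaces only the final “-”.

```python
def cpf_check_digits(base9: str) -> str:
    """Standard CPF check digits for the first nine digits."""
    digits = [int(c) for c in base9]
    for _ in range(2):
        # Weights run 10..2 for the first check digit, 11..2 for the second.
        weights = range(len(digits) + 1, 1, -1)
        remainder = sum(d * w for d, w in zip(digits, weights)) % 11
        digits.append(0 if remainder < 2 else 11 - remainder)
    return f"{digits[-2]}{digits[-1]}"


def format_cpf(base9: str, sep: str = "-") -> str:
    """Format as xxx.xxx.xxx<sep>xx; the dots are fixed, only sep varies."""
    if len(base9) != 9 or not base9.isdigit():
        raise ValueError("expected the first 9 digits of a CPF")
    check = cpf_check_digits(base9)
    return f"{base9[:3]}.{base9[3:6]}.{base9[6:9]}{sep}{check}"


print(format_cpf("111444777"))           # 111.444.777-35
print(format_cpf("111444777", sep="/"))  # 111.444.777/35
```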

class hazy_configurator.settings.id_settings.normal_id_settings.UUIDSettings

Bases: UniqueIdSettings

Standard UUIDs of the form 876b8169-1eae-4741-a459-6d896d705673.

Fields:
field unique: bool = True

When set to True, the generated values will be unique. Uniqueness cannot be guaranteed, however, if the number of records generated exceeds the number of possible unique values.

class hazy_configurator.settings.id_settings.normal_id_settings.CreditCardNumberIdSettings

Bases: UniqueIdSettings

Credit card numbers, based on the selected card provider.

Fields:
field card_types: List[SupportedCardTypes] = None

Generated credit card numbers will match these card types.

field infer_from_data: bool = True

When set to True, the card types of the credit card numbers will be detected from the data and modelled. This parameter cannot be True when card_types is set. If this parameter is True but no valid card types are found in the target data, a default of [SupportedCardTypes.visa16, SupportedCardTypes.mastercard] will be used.

field unique: bool = True

When set to True, the generated values will be unique. Uniqueness cannot be guaranteed, however, if the number of records generated exceeds the number of possible unique values.
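Card numbers from the major providers conventionally end in a Luhn check digit. Whether the generated numbers are Luhn-valid is not stated in the source, so this is an assumption; the sketch below (hypothetical helper names) shows how such a check digit is computed and verified, using the well-known Visa test number 4111111111111111.

```python
def luhn_checksum(number: str) -> int:
    """Luhn sum modulo 10; a valid number gives 0."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9      # same as summing the two digits of d
        total += d
    return total % 10


def luhn_check_digit(partial: str) -> str:
    """Digit to append so that partial + digit passes the Luhn check."""
    return str((10 - luhn_checksum(partial + "0")) % 10)


def is_luhn_valid(number: str) -> bool:
    return luhn_checksum(number) == 0


print(luhn_check_digit("411111111111111"))  # 1
print(is_luhn_valid("4111111111111111"))    # True
```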

class hazy_configurator.settings.id_settings.normal_id_settings.BankCountrySettings

Bases: IdSettings

Bank country name sampled from the specified set of locales.

Fields:
field locales: List[str] = ['en_GB']

The locales to sample data from, options are: [‘ar_AA’, ‘ar_AE’, ‘ar_BH’, ‘ar_EG’, ‘ar_JO’, ‘ar_PS’, ‘ar_SA’, ‘az_AZ’, ‘bg_BG’, ‘bn_BD’, ‘bs_BA’, ‘cs_CZ’, ‘da_DK’, ‘de’, ‘de_AT’, ‘de_CH’, ‘de_DE’, ‘dk_DK’, ‘el_CY’, ‘el_GR’, ‘en’, ‘en_AU’, ‘en_CA’, ‘en_GB’, ‘en_IE’, ‘en_IN’, ‘en_NZ’, ‘en_PH’, ‘en_TH’, ‘en_US’, ‘es’, ‘es_CA’, ‘es_CL’, ‘es_CO’, ‘es_ES’, ‘es_MX’, ‘et_EE’, ‘fa_IR’, ‘fi_FI’, ‘fil_PH’, ‘fr_CA’, ‘fr_CH’, ‘fr_FR’, ‘fr_QC’, ‘ga_IE’, ‘he_IL’, ‘hi_IN’, ‘hr_HR’, ‘hu_HU’, ‘hy_AM’, ‘id_ID’, ‘it_CH’, ‘it_IT’, ‘ja_JP’, ‘ka_GE’, ‘ko_KR’, ‘la’, ‘lb_LU’, ‘lt_LT’, ‘lv_LV’, ‘mt_MT’, ‘ne_NP’, ‘nl_BE’, ‘nl_NL’, ‘no_NO’, ‘or_IN’, ‘pl_PL’, ‘pt_BR’, ‘pt_PT’, ‘ro_RO’, ‘ru_RU’, ‘sk_SK’, ‘sl_SI’, ‘sq_AL’, ‘sv_SE’, ‘ta_IN’, ‘th’, ‘th_TH’, ‘tl_PH’, ‘tr_TR’, ‘tw_GH’, ‘uk_UA’, ‘vi_VN’, ‘zh_CN’, ‘zh_TW’]

class hazy_configurator.settings.id_settings.normal_id_settings.BBANSettings

Bases: UniqueIdSettings

Basic Bank Account Number (BBAN), represents a country-specific bank account number.

Fields:
field locales: List[IbanLocales] = [<IbanLocales.GB: 'GB'>]

The locales to sample data from, options are: [‘AD’, ‘AE’, ‘AL’, ‘AT’, ‘AZ’, ‘BA’, ‘BE’, ‘BG’, ‘BH’, ‘BI’, ‘BR’, ‘BY’, ‘CH’, ‘CR’, ‘CY’, ‘CZ’, ‘DE’, ‘DJ’, ‘DK’, ‘DO’, ‘EE’, ‘EG’, ‘ES’, ‘FI’, ‘FO’, ‘FR’, ‘GB’, ‘GE’, ‘GI’, ‘GL’, ‘GR’, ‘GT’, ‘HR’, ‘HU’, ‘IE’, ‘IL’, ‘IQ’, ‘IS’, ‘IT’, ‘JO’, ‘KW’, ‘KZ’, ‘LB’, ‘LC’, ‘LI’, ‘LT’, ‘LU’, ‘LV’, ‘LY’, ‘MC’, ‘MD’, ‘ME’, ‘MK’, ‘MR’, ‘MT’, ‘MU’, ‘NL’, ‘NO’, ‘PK’, ‘PL’, ‘PS’, ‘PT’, ‘QA’, ‘RO’, ‘RS’, ‘RU’, ‘SA’, ‘SC’, ‘SD’, ‘SE’, ‘SI’, ‘SK’, ‘SM’, ‘SO’, ‘ST’, ‘SV’, ‘TL’, ‘TN’, ‘TR’, ‘UA’, ‘VA’, ‘VG’, ‘XK’]

field unique: bool = True

When set to True, the generated values will be unique. Uniqueness cannot be guaranteed, however, if the number of records generated exceeds the number of possible unique values.

class hazy_configurator.settings.id_settings.normal_id_settings.CompanySettings

Bases: IdSettings

Randomly generated company names.

Fields:
field locales: List[str] = ['en_GB']

The locales to sample data from, options are: [‘ar_AA’, ‘ar_PS’, ‘ar_SA’, ‘az_AZ’, ‘bg_BG’, ‘bn_BD’, ‘cs_CZ’, ‘da_DK’, ‘de_AT’, ‘de_CH’, ‘de_DE’, ‘el_GR’, ‘en_GB’, ‘en_IE’, ‘en_NZ’, ‘en_US’, ‘es_CA’, ‘es_CL’, ‘es_CO’, ‘es_ES’, ‘et_EE’, ‘fa_IR’, ‘fi_FI’, ‘fr_CA’, ‘fr_CH’, ‘fr_FR’, ‘fr_QC’, ‘ga_IE’, ‘he_IL’, ‘hr_HR’, ‘hu_HU’, ‘hy_AM’, ‘id_ID’, ‘it_IT’, ‘ja_JP’, ‘ka_GE’, ‘ko_KR’, ‘ne_NP’, ‘nl_NL’, ‘no_NO’, ‘or_IN’, ‘pl_PL’, ‘pt_BR’, ‘pt_PT’, ‘ro_RO’, ‘ru_RU’, ‘sl_SI’, ‘sv_SE’, ‘ta_IN’, ‘th_TH’, ‘tr_TR’, ‘tw_GH’, ‘uk_UA’, ‘zh_CN’, ‘zh_TW’]

class hazy_configurator.settings.id_settings.normal_id_settings.CPRSettings

Bases: UniqueIdSettings

The CPR number is unique to the person and is used in Denmark as an ID number.

The format is DDMMYY-SSSS; the sep parameter replaces only the “-”. The first section is the individual's date of birth. The second section is a 4-digit number whose last digit is odd for males and even for females.

Standard Example

from hazy_configurator import CPRSettings

CPRSettings(
    dob_column="dob",
    gender_column="gender"
)

Cross-table Example

In the following example, the dob_column and gender_column exist in a separate table from the target column.

from hazy_configurator import CPRSettings, ColId

CPRSettings(
    dob_column=ColId(col="dob", table="table2"),
    gender_column=ColId(col="gender", table="table2"),
)

Fields:
field dob_column: Union[None, str, ColId] = None

Date-of-birth column.

field gender_column: Union[None, str, ColId] = None

Gender column.

field gender_map: Optional[Dict[str, Literal['m', 'f', 'o']]] = None

Mapping of gender categories. Each gender category should be a key in the dictionary with the values being one from a selection of ‘m’, ‘f’, ‘o’, which correspond to male, female and other.

field sep: str = '-'

Separator to use between the two sections of the CPR number.

field unique: bool = True

When set to True, the generated values will be unique. Uniqueness cannot be guaranteed, however, if the number of records generated exceeds the number of possible unique values.
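The DDMMYY-SSSS layout and the odd/even gender rule described above can be sketched as follows. This is a simplification under stated assumptions: make_cpr and the gender_map values are hypothetical, the real CPR system's century encoding and legacy modulus-11 check are ignored, and the handling of ‘o’ (any final digit) is a guess the source does not confirm.

```python
import random
from datetime import date
from typing import Optional


def make_cpr(dob: date, gender: str, sep: str = "-",
             rng: Optional[random.Random] = None) -> str:
    """Hypothetical sketch: DDMMYY + sep + 4 digits with a gender-coded last digit."""
    rng = rng or random.Random()
    serial = rng.randint(0, 999)  # first three digits of the second section
    if gender == "m":
        last = rng.choice([1, 3, 5, 7, 9])  # odd for males
    elif gender == "f":
        last = rng.choice([0, 2, 4, 6, 8])  # even for females
    else:
        last = rng.randint(0, 9)  # 'o': assumption, the source does not say
    return f"{dob:%d%m%y}{sep}{serial:03d}{last}"


# Hypothetical gender_map translating the source column's categories.
gender_map = {"Male": "m", "Female": "f", "Unknown": "o"}
print(make_cpr(date(1985, 3, 7), gender_map["Female"]))
```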

class hazy_configurator.settings.id_settings.normal_id_settings.CreditCardSecurityCodeSettings

Bases: IdSettings

Credit card security code (CSC) is usually a three-digit number, not part of the credit card number.

Unique generation is not an option since the length is typically small and repeats are likely.

Fields:
field card_types: List[SupportedCardTypes] = [<SupportedCardTypes.visa16: 'visa16'>, <SupportedCardTypes.mastercard: 'mastercard'>]

Generated security codes will follow the format of these card types.

class hazy_configurator.settings.id_settings.normal_id_settings.IBANSettings

Bases: UniqueIdSettings

International Bank Account Number (IBAN).

Fields:
field locales: List[IbanLocales] = [<IbanLocales.GB: 'GB'>]

The locales to sample data from, options are: [‘AD’, ‘AE’, ‘AL’, ‘AT’, ‘AZ’, ‘BA’, ‘BE’, ‘BG’, ‘BH’, ‘BI’, ‘BR’, ‘BY’, ‘CH’, ‘CR’, ‘CY’, ‘CZ’, ‘DE’, ‘DJ’, ‘DK’, ‘DO’, ‘EE’, ‘EG’, ‘ES’, ‘FI’, ‘FO’, ‘FR’, ‘GB’, ‘GE’, ‘GI’, ‘GL’, ‘GR’, ‘GT’, ‘HR’, ‘HU’, ‘IE’, ‘IL’, ‘IQ’, ‘IS’, ‘IT’, ‘JO’, ‘KW’, ‘KZ’, ‘LB’, ‘LC’, ‘LI’, ‘LT’, ‘LU’, ‘LV’, ‘LY’, ‘MC’, ‘MD’, ‘ME’, ‘MK’, ‘MR’, ‘MT’, ‘MU’, ‘NL’, ‘NO’, ‘PK’, ‘PL’, ‘PS’, ‘PT’, ‘QA’, ‘RO’, ‘RS’, ‘RU’, ‘SA’, ‘SC’, ‘SD’, ‘SE’, ‘SI’, ‘SK’, ‘SM’, ‘SO’, ‘ST’, ‘SV’, ‘TL’, ‘TN’, ‘TR’, ‘UA’, ‘VA’, ‘VG’, ‘XK’]

field unique: bool = True

When set to True, the generated values will be unique. Uniqueness cannot be guaranteed, however, if the number of records generated exceeds the number of possible unique values.
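An IBAN consists of a two-letter country code, two check digits and the country-specific BBAN (see BBANSettings above). The check digits follow the ISO 13616 mod-97 rule; the sketch below (hypothetical helper names, not Hazy's code) computes and verifies them using the widely published example GB82 WEST 1234 5698 7654 32.

```python
def _to_digits(s: str) -> str:
    """Map 0-9 to themselves and letters to 10..35 (A=10, ..., Z=35)."""
    return "".join(str(int(ch, 36)) for ch in s)


def iban_check_digits(country: str, bban: str) -> str:
    """Check digits for country + BBAN: 98 - (BBAN + country + '00') mod 97."""
    remainder = int(_to_digits(bban + country + "00")) % 97
    return f"{98 - remainder:02d}"


def is_valid_iban(iban: str) -> bool:
    """Move the first four characters to the end; the result mod 97 must be 1."""
    iban = iban.replace(" ", "").upper()
    rearranged = iban[4:] + iban[:4]
    return int(_to_digits(rearranged)) % 97 == 1


bban = "WEST12345698765432"
print("GB" + iban_check_digits("GB", bban) + bban)  # GB82WEST12345698765432
print(is_valid_iban("GB82 WEST 1234 5698 7654 32"))  # True
```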

class hazy_configurator.settings.id_settings.normal_id_settings.LicensePlateSettings

Bases: UniqueIdSettings

Car license plate numbers from the specified locales.

Fields:
field locales: List[str] = ['en_GB']

The locales to sample data from, options are: [‘ar_AA’, ‘ar_AE’, ‘ar_BH’, ‘ar_EG’, ‘ar_JO’, ‘ar_PS’, ‘ar_SA’, ‘az_AZ’, ‘bg_BG’, ‘bn_BD’, ‘bs_BA’, ‘cs_CZ’, ‘da_DK’, ‘de’, ‘de_AT’, ‘de_CH’, ‘de_DE’, ‘dk_DK’, ‘el_CY’, ‘el_GR’, ‘en’, ‘en_AU’, ‘en_CA’, ‘en_GB’, ‘en_IE’, ‘en_IN’, ‘en_NZ’, ‘en_PH’, ‘en_TH’, ‘en_US’, ‘es’, ‘es_CA’, ‘es_CL’, ‘es_CO’, ‘es_ES’, ‘es_MX’, ‘et_EE’, ‘fa_IR’, ‘fi_FI’, ‘fil_PH’, ‘fr_CA’, ‘fr_CH’, ‘fr_FR’, ‘fr_QC’, ‘ga_IE’, ‘he_IL’, ‘hi_IN’, ‘hr_HR’, ‘hu_HU’, ‘hy_AM’, ‘id_ID’, ‘it_CH’, ‘it_IT’, ‘ja_JP’, ‘ka_GE’, ‘ko_KR’, ‘la’, ‘lb_LU’, ‘lt_LT’, ‘lv_LV’, ‘mt_MT’, ‘ne_NP’, ‘nl_BE’, ‘nl_NL’, ‘no_NO’, ‘or_IN’, ‘pl_PL’, ‘pt_BR’, ‘pt_PT’, ‘ro_RO’, ‘ru_RU’, ‘sk_SK’, ‘sl_SI’, ‘sq_AL’, ‘sv_SE’, ‘ta_IN’, ‘th’, ‘th_TH’, ‘tl_PH’, ‘tr_TR’, ‘tw_GH’, ‘uk_UA’, ‘vi_VN’, ‘zh_CN’, ‘zh_TW’]

field unique: bool = True

When set to True, the generated values will be unique. Uniqueness cannot be guaranteed, however, if the number of records generated exceeds the number of possible unique values.

class hazy_configurator.settings.id_settings.normal_id_settings.PasswordSettings

Bases: IdSettings

Plain text passwords, conforming to common requirements.

Fields:
field length: int = 10

Password length.

field special_chars: bool = True

When set to True, ensures that special characters are included in generated passwords.

field digits: bool = True

When set to True, ensures that digits are included in generated passwords.

field upper_case: bool = True

When set to True, ensures that upper case letters are included in generated passwords.

field lower_case: bool = True

When set to True, ensures that lower case letters are included in generated passwords.
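One common way to honour "ensures that X is included" flags is to draw one character from each enabled class, fill the rest from their union, and shuffle. The sketch below does this with Python's secrets module. make_password is hypothetical, and it assumes that a disabled class is excluded entirely, whereas the source only says enabled classes are guaranteed to appear.

```python
import secrets
import string


def make_password(length: int = 10, special_chars: bool = True, digits: bool = True,
                  upper_case: bool = True, lower_case: bool = True) -> str:
    """Hypothetical sketch of PasswordSettings-style generation."""
    # Enabled character classes; disabled classes are excluded (an assumption).
    pools = [pool for enabled, pool in [
        (special_chars, string.punctuation),
        (digits, string.digits),
        (upper_case, string.ascii_uppercase),
        (lower_case, string.ascii_lowercase),
    ] if enabled]
    if not pools:
        raise ValueError("enable at least one character class")
    if length < len(pools):
        raise ValueError("length is too short to include every required class")
    # One character from each required class guarantees coverage...
    chars = [secrets.choice(pool) for pool in pools]
    # ...and the remaining characters come from the union of enabled classes.
    alphabet = "".join(pools)
    chars += [secrets.choice(alphabet) for _ in range(length - len(chars))]
    # Shuffle so the guaranteed characters are not always at the front.
    secrets.SystemRandom().shuffle(chars)
    return "".join(chars)


print(make_password())
```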

class hazy_configurator.settings.id_settings.normal_id_settings.PassportSettings

Bases: UniqueIdSettings

Passport ID number.

Standard Example

from hazy_configurator import PassportSettings

PassportSettings(
    country_column="country",
)

Cross-table Example

In the following example, the country_column exists in a separate table from the target column.

from hazy_configurator import PassportSettings, ColId

PassportSettings(
    country_column=ColId(col="country", table="table2")
)

Fields:
field country_column: Union[ColId, str] = None

Country column.

field country_map: Dict[str, str] = None

Dictionary mapping each value within the country_column to a 2-letter country code.

field countries: List[PassportCountries] = [<PassportCountries.GB: 'GB'>]

The list of countries from which to sample when generating passport numbers.

field unique: bool = True

When set to True, the generated values will be unique. Uniqueness cannot be guaranteed, however, if the number of records generated exceeds the number of possible unique values.

class hazy_configurator.settings.id_settings.normal_id_settings.SSNSettings

Bases: UniqueIdSettings

Social Security Numbers from the chosen locales.

Returns US Social Security numbers if the locale is “en_US”, National Insurance numbers if the locale is “en_GB”, and VAT identification numbers if the locale is “de_DE”.

For other social security number formats, contact Hazy to discuss the available options.

Fields:
field locales: List[SSNLocales] = [<SSNLocales.en_GB: 'en_GB'>]

The locales to sample data from, options are: [‘ar_AA’, ‘ar_AE’, ‘ar_BH’, ‘ar_EG’, ‘ar_JO’, ‘ar_PS’, ‘ar_SA’, ‘az_AZ’, ‘bg_BG’, ‘bn_BD’, ‘bs_BA’, ‘cs_CZ’, ‘da_DK’, ‘de’, ‘de_AT’, ‘de_CH’, ‘de_DE’, ‘dk_DK’, ‘el_CY’, ‘el_GR’, ‘en’, ‘en_AU’, ‘en_CA’, ‘en_GB’, ‘en_IE’, ‘en_IN’, ‘en_NZ’, ‘en_PH’, ‘en_TH’, ‘en_US’, ‘es’, ‘es_CA’, ‘es_CL’, ‘es_CO’, ‘es_ES’, ‘es_MX’, ‘et_EE’, ‘fa_IR’, ‘fi_FI’, ‘fil_PH’, ‘fr_CA’, ‘fr_CH’, ‘fr_FR’, ‘fr_QC’, ‘ga_IE’, ‘he_IL’, ‘hi_IN’, ‘hr_HR’, ‘hu_HU’, ‘hy_AM’, ‘id_ID’, ‘it_CH’, ‘it_IT’, ‘ja_JP’, ‘ka_GE’, ‘ko_KR’, ‘la’, ‘lb_LU’, ‘lt_LT’, ‘lv_LV’, ‘mt_MT’, ‘ne_NP’, ‘nl_BE’, ‘nl_NL’, ‘no_NO’, ‘or_IN’, ‘pl_PL’, ‘pt_BR’, ‘pt_PT’, ‘ro_RO’, ‘ru_RU’, ‘sk_SK’, ‘sl_SI’, ‘sq_AL’, ‘sv_SE’, ‘ta_IN’, ‘th’, ‘th_TH’, ‘tl_PH’, ‘tr_TR’, ‘tw_GH’, ‘uk_UA’, ‘vi_VN’, ‘zh_CN’, ‘zh_TW’]

field dob_column: Union[None, str, ColId] = None

Date of birth column. Used for LT and ET locales only.

field start_dob: str = '1900-01-01'

Start date of birth for sampling when no DoB column has been provided. Used for LT and ET locales only. Must be in ‘%Y-%m-%d’ format.

field end_dob: str = '2024-01-01'

End date of birth for sampling when no DoB column has been provided. Used for LT and ET locales only. Must be in ‘%Y-%m-%d’ format.

field gender_column: Union[None, str, ColId] = None

Column containing gender values. Used for LT and ET locales only.

field unique: bool = True

When set to True, the generated values will be unique. Uniqueness cannot be guaranteed, however, if the number of records generated exceeds the number of possible unique values.

class hazy_configurator.settings.id_settings.normal_id_settings.SWIFTSettings

Bases: UniqueIdSettings

Banking SWIFT code, representing a bank branch.

Fields:
field primary: bool = False

When set to True, generated 11-character SWIFT codes will always end in XXX to denote that they belong to primary branches/offices.

field infer_from_data: bool = True

When set to True, the SWIFT code types and locales will be detected from the data and modelled. This parameter cannot be True when both swift_types and locales are set. If this parameter is True but no valid SWIFT codes are found in the target data, a default locale of GB and default SWIFT types of [SwiftTypes.SWIFT8, SwiftTypes.SWIFT11] will be used.

field swift_types: Optional[List[SwiftTypes]] = None

The types of SWIFT code to generate: [‘SWIFT8’, ‘SWIFT11’]

field locales: Optional[List[SwiftLocales]] = None

The locales to sample data from, options are: [‘AD’, ‘AE’, ‘AG’, ‘AI’, ‘AL’, ‘AM’, ‘AO’, ‘AR’, ‘AS’, ‘AT’, ‘AU’, ‘AW’, ‘AZ’, ‘BA’, ‘BB’, ‘BD’, ‘BE’, ‘BF’, ‘BG’, ‘BH’, ‘BJ’, ‘BM’, ‘BN’, ‘BO’, ‘BQ’, ‘BR’, ‘BS’, ‘BT’, ‘BW’, ‘BY’, ‘BZ’, ‘CA’, ‘CH’, ‘CI’, ‘CK’, ‘CL’, ‘CM’, ‘CN’, ‘CO’, ‘CR’, ‘CV’, ‘CW’, ‘CY’, ‘CZ’, ‘DE’, ‘DJ’, ‘DK’, ‘DM’, ‘DO’, ‘DZ’, ‘EC’, ‘EE’, ‘EG’, ‘EH’, ‘ES’, ‘ET’, ‘FI’, ‘FJ’, ‘FK’, ‘FO’, ‘FR’, ‘GA’, ‘GB’, ‘GD’, ‘GE’, ‘GG’, ‘GH’, ‘GI’, ‘GL’, ‘GM’, ‘GN’, ‘GP’, ‘GQ’, ‘GR’, ‘GT’, ‘GU’, ‘GW’, ‘GY’, ‘HK’, ‘HN’, ‘HR’, ‘HT’, ‘HU’, ‘ID’, ‘IE’, ‘IL’, ‘IM’, ‘IN’, ‘IS’, ‘IT’, ‘JE’, ‘JM’, ‘JO’, ‘JP’, ‘KE’, ‘KG’, ‘KH’, ‘KI’, ‘KM’, ‘KN’, ‘KR’, ‘KW’, ‘KY’, ‘KZ’, ‘LA’, ‘LB’, ‘LC’, ‘LI’, ‘LK’, ‘LR’, ‘LS’, ‘LT’, ‘LU’, ‘LV’, ‘MA’, ‘MC’, ‘MD’, ‘ME’, ‘MG’, ‘MH’, ‘MK’, ‘ML’, ‘MM’, ‘MN’, ‘MO’, ‘MQ’, ‘MR’, ‘MS’, ‘MT’, ‘MU’, ‘MV’, ‘MW’, ‘MX’, ‘MY’, ‘MZ’, ‘NA’, ‘NC’, ‘NE’, ‘NG’, ‘NI’, ‘NL’, ‘NO’, ‘NP’, ‘NZ’, ‘OM’, ‘PA’, ‘PE’, ‘PF’, ‘PG’, ‘PH’, ‘PK’, ‘PL’, ‘PR’, ‘PS’, ‘PT’, ‘PW’, ‘PY’, ‘QA’, ‘RE’, ‘RO’, ‘RS’, ‘RU’, ‘RW’, ‘SA’, ‘SB’, ‘SC’, ‘SD’, ‘SE’, ‘SG’, ‘SH’, ‘SI’, ‘SK’, ‘SL’, ‘SM’, ‘SN’, ‘SO’, ‘SR’, ‘SS’, ‘ST’, ‘SV’, ‘SX’, ‘SY’, ‘TC’, ‘TG’, ‘TH’, ‘TJ’, ‘TK’, ‘TL’, ‘TM’, ‘TN’, ‘TO’, ‘TR’, ‘TT’, ‘TV’, ‘TW’, ‘TZ’, ‘UA’, ‘UG’, ‘UM’, ‘US’, ‘UY’, ‘UZ’, ‘VE’, ‘VG’, ‘VI’, ‘VN’, ‘VU’, ‘WF’, ‘WS’, ‘XK’, ‘YE’, ‘YT’, ‘ZA’, ‘ZM’, ‘ZW’]

field unique: bool = True

When set to True, the generated values will be unique. Uniqueness cannot be guaranteed, however, if the number of records generated exceeds the number of possible unique values.
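Under ISO 9362, a SWIFT (BIC) code is a 4-letter bank code, a 2-letter country code, a 2-character location code and an optional 3-character branch code; an 8-character code, or an 11-character code ending in XXX, refers to the primary office. The sketch below parses that structure with a regular expression. The helper names are hypothetical, and the regex is a simplification of the full standard.

```python
import re
from typing import Dict, Optional

# Bank code (4 letters), country (2 letters), location (2 alphanumerics),
# optional branch code (3 alphanumerics).
SWIFT_PATTERN = re.compile(r"([A-Z]{4})([A-Z]{2})([A-Z0-9]{2})([A-Z0-9]{3})?")


def parse_swift(code: str) -> Dict[str, Optional[str]]:
    match = SWIFT_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a SWIFT8/SWIFT11 code: {code!r}")
    bank, country, location, branch = match.groups()
    return {"bank": bank, "country": country, "location": location, "branch": branch}


def is_primary_office(code: str) -> bool:
    # SWIFT8 codes, and SWIFT11 codes ending in "XXX", denote the primary office.
    return parse_swift(code)["branch"] in (None, "XXX")


print(parse_swift("DEUTDEFF500"))
print(is_primary_office("DEUTDEFFXXX"))  # True
```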

class hazy_configurator.settings.id_settings.normal_id_settings.SWIFT11Settings

Bases: UniqueIdSettings

Banking SWIFT code with a length of 11 characters.

Fields:
field primary: bool = False

When set to True, generated 11-character SWIFT codes will always end in XXX to denote that they belong to primary branches/offices.

field infer_from_data: bool = True

When set to True the locales will be detected from the data and modelled. This parameter cannot be True when locales is set. If this parameter is True but no valid swift codes are found in the target data, a default locale of GB will be used.

field unique: bool = True

When set to True, the generated values will be unique. Uniqueness cannot be guaranteed, however, if the number of records generated exceeds the number of possible unique values.

field locales: Optional[List[SwiftLocales]] = None

The locales to sample data from, options are: [‘AD’, ‘AE’, ‘AG’, ‘AI’, ‘AL’, ‘AM’, ‘AO’, ‘AR’, ‘AS’, ‘AT’, ‘AU’, ‘AW’, ‘AZ’, ‘BA’, ‘BB’, ‘BD’, ‘BE’, ‘BF’, ‘BG’, ‘BH’, ‘BJ’, ‘BM’, ‘BN’, ‘BO’, ‘BQ’, ‘BR’, ‘BS’, ‘BT’, ‘BW’, ‘BY’, ‘BZ’, ‘CA’, ‘CH’, ‘CI’, ‘CK’, ‘CL’, ‘CM’, ‘CN’, ‘CO’, ‘CR’, ‘CV’, ‘CW’, ‘CY’, ‘CZ’, ‘DE’, ‘DJ’, ‘DK’, ‘DM’, ‘DO’, ‘DZ’, ‘EC’, ‘EE’, ‘EG’, ‘EH’, ‘ES’, ‘ET’, ‘FI’, ‘FJ’, ‘FK’, ‘FO’, ‘FR’, ‘GA’, ‘GB’, ‘GD’, ‘GE’, ‘GG’, ‘GH’, ‘GI’, ‘GL’, ‘GM’, ‘GN’, ‘GP’, ‘GQ’, ‘GR’, ‘GT’, ‘GU’, ‘GW’, ‘GY’, ‘HK’, ‘HN’, ‘HR’, ‘HT’, ‘HU’, ‘ID’, ‘IE’, ‘IL’, ‘IM’, ‘IN’, ‘IS’, ‘IT’, ‘JE’, ‘JM’, ‘JO’, ‘JP’, ‘KE’, ‘KG’, ‘KH’, ‘KI’, ‘KM’, ‘KN’, ‘KR’, ‘KW’, ‘KY’, ‘KZ’, ‘LA’, ‘LB’, ‘LC’, ‘LI’, ‘LK’, ‘LR’, ‘LS’, ‘LT’, ‘LU’, ‘LV’, ‘MA’, ‘MC’, ‘MD’, ‘ME’, ‘MG’, ‘MH’, ‘MK’, ‘ML’, ‘MM’, ‘MN’, ‘MO’, ‘MQ’, ‘MR’, ‘MS’, ‘MT’, ‘MU’, ‘MV’, ‘MW’, ‘MX’, ‘MY’, ‘MZ’, ‘NA’, ‘NC’, ‘NE’, ‘NG’, ‘NI’, ‘NL’, ‘NO’, ‘NP’, ‘NZ’, ‘OM’, ‘PA’, ‘PE’, ‘PF’, ‘PG’, ‘PH’, ‘PK’, ‘PL’, ‘PR’, ‘PS’, ‘PT’, ‘PW’, ‘PY’, ‘QA’, ‘RE’, ‘RO’, ‘RS’, ‘RU’, ‘RW’, ‘SA’, ‘SB’, ‘SC’, ‘SD’, ‘SE’, ‘SG’, ‘SH’, ‘SI’, ‘SK’, ‘SL’, ‘SM’, ‘SN’, ‘SO’, ‘SR’, ‘SS’, ‘ST’, ‘SV’, ‘SX’, ‘SY’, ‘TC’, ‘TG’, ‘TH’, ‘TJ’, ‘TK’, ‘TL’, ‘TM’, ‘TN’, ‘TO’, ‘TR’, ‘TT’, ‘TV’, ‘TW’, ‘TZ’, ‘UA’, ‘UG’, ‘UM’, ‘US’, ‘UY’, ‘UZ’, ‘VE’, ‘VG’, ‘VI’, ‘VN’, ‘VU’, ‘WF’, ‘WS’, ‘XK’, ‘YE’, ‘YT’, ‘ZA’, ‘ZM’, ‘ZW’]

class hazy_configurator.settings.id_settings.normal_id_settings.SWIFT8Settings

Bases: UniqueIdSettings

Banking SWIFT code with a length of 8 characters.

Fields:
field unique: bool = True

When set to True, the generated values will be unique. Uniqueness cannot be guaranteed, however, if the number of records generated exceeds the number of possible unique values.

field infer_from_data: bool = True

When set to True, the locales will be detected from the data and modelled. This parameter cannot be True when locales is set. If this parameter is True but no valid SWIFT codes are found in the target data, a default locale of GB will be used.

field locales: Optional[List[SwiftLocales]] = None

The locales to sample data from, options are: [‘AD’, ‘AE’, ‘AG’, ‘AI’, ‘AL’, ‘AM’, ‘AO’, ‘AR’, ‘AS’, ‘AT’, ‘AU’, ‘AW’, ‘AZ’, ‘BA’, ‘BB’, ‘BD’, ‘BE’, ‘BF’, ‘BG’, ‘BH’, ‘BJ’, ‘BM’, ‘BN’, ‘BO’, ‘BQ’, ‘BR’, ‘BS’, ‘BT’, ‘BW’, ‘BY’, ‘BZ’, ‘CA’, ‘CH’, ‘CI’, ‘CK’, ‘CL’, ‘CM’, ‘CN’, ‘CO’, ‘CR’, ‘CV’, ‘CW’, ‘CY’, ‘CZ’, ‘DE’, ‘DJ’, ‘DK’, ‘DM’, ‘DO’, ‘DZ’, ‘EC’, ‘EE’, ‘EG’, ‘EH’, ‘ES’, ‘ET’, ‘FI’, ‘FJ’, ‘FK’, ‘FO’, ‘FR’, ‘GA’, ‘GB’, ‘GD’, ‘GE’, ‘GG’, ‘GH’, ‘GI’, ‘GL’, ‘GM’, ‘GN’, ‘GP’, ‘GQ’, ‘GR’, ‘GT’, ‘GU’, ‘GW’, ‘GY’, ‘HK’, ‘HN’, ‘HR’, ‘HT’, ‘HU’, ‘ID’, ‘IE’, ‘IL’, ‘IM’, ‘IN’, ‘IS’, ‘IT’, ‘JE’, ‘JM’, ‘JO’, ‘JP’, ‘KE’, ‘KG’, ‘KH’, ‘KI’, ‘KM’, ‘KN’, ‘KR’, ‘KW’, ‘KY’, ‘KZ’, ‘LA’, ‘LB’, ‘LC’, ‘LI’, ‘LK’, ‘LR’, ‘LS’, ‘LT’, ‘LU’, ‘LV’, ‘MA’, ‘MC’, ‘MD’, ‘ME’, ‘MG’, ‘MH’, ‘MK’, ‘ML’, ‘MM’, ‘MN’, ‘MO’, ‘MQ’, ‘MR’, ‘MS’, ‘MT’, ‘MU’, ‘MV’, ‘MW’, ‘MX’, ‘MY’, ‘MZ’, ‘NA’, ‘NC’, ‘NE’, ‘NG’, ‘NI’, ‘NL’, ‘NO’, ‘NP’, ‘NZ’, ‘OM’, ‘PA’, ‘PE’, ‘PF’, ‘PG’, ‘PH’, ‘PK’, ‘PL’, ‘PR’, ‘PS’, ‘PT’, ‘PW’, ‘PY’, ‘QA’, ‘RE’, ‘RO’, ‘RS’, ‘RU’, ‘RW’, ‘SA’, ‘SB’, ‘SC’, ‘SD’, ‘SE’, ‘SG’, ‘SH’, ‘SI’, ‘SK’, ‘SL’, ‘SM’, ‘SN’, ‘SO’, ‘SR’, ‘SS’, ‘ST’, ‘SV’, ‘SX’, ‘SY’, ‘TC’, ‘TG’, ‘TH’, ‘TJ’, ‘TK’, ‘TL’, ‘TM’, ‘TN’, ‘TO’, ‘TR’, ‘TT’, ‘TV’, ‘TW’, ‘TZ’, ‘UA’, ‘UG’, ‘UM’, ‘US’, ‘UY’, ‘UZ’, ‘VE’, ‘VG’, ‘VI’, ‘VN’, ‘VU’, ‘WF’, ‘WS’, ‘XK’, ‘YE’, ‘YT’, ‘ZA’, ‘ZM’, ‘ZW’]

class hazy_configurator.settings.id_settings.normal_id_settings.MD5Settings

Bases: UniqueIdSettings

MD5 hash.

Fields:
field unique: bool = True

When set to True, the generated values will be unique. Uniqueness cannot be guaranteed, however, if the number of records generated exceeds the number of possible unique values.

class hazy_configurator.settings.id_settings.normal_id_settings.PhoneNumberSettings

Bases: UniqueIdSettings

Phone number, formatted by locale.

Fields:
field unique: bool = True

When set to True, the generated values will be unique. Uniqueness cannot be guaranteed, however, if the number of records generated exceeds the number of possible unique values.

field locales: List[str] [Required]

The locales to sample data from, options are: [‘ar_AA’, ‘ar_AE’, ‘ar_BH’, ‘ar_EG’, ‘ar_JO’, ‘ar_PS’, ‘ar_SA’, ‘az_AZ’, ‘bg_BG’, ‘bn_BD’, ‘bs_BA’, ‘cs_CZ’, ‘da_DK’, ‘de’, ‘de_AT’, ‘de_CH’, ‘de_DE’, ‘dk_DK’, ‘el_CY’, ‘el_GR’, ‘en’, ‘en_AU’, ‘en_CA’, ‘en_GB’, ‘en_IE’, ‘en_IN’, ‘en_NZ’, ‘en_TH’, ‘en_US’, ‘es’, ‘es_CA’, ‘es_CL’, ‘es_CO’, ‘es_ES’, ‘es_MX’, ‘et_EE’, ‘fa_IR’, ‘fi_FI’, ‘fr_CA’, ‘fr_CH’, ‘fr_FR’, ‘fr_QC’, ‘ga_IE’, ‘he_IL’, ‘hi_IN’, ‘hr_HR’, ‘hu_HU’, ‘hy_AM’, ‘id_ID’, ‘it_CH’, ‘it_IT’, ‘ja_JP’, ‘ka_GE’, ‘ko_KR’, ‘la’, ‘lb_LU’, ‘lt_LT’, ‘lv_LV’, ‘mt_MT’, ‘ne_NP’, ‘nl_BE’, ‘nl_NL’, ‘no_NO’, ‘or_IN’, ‘pl_PL’, ‘pt_BR’, ‘pt_PT’, ‘ro_RO’, ‘ru_RU’, ‘sk_SK’, ‘sl_SI’, ‘sq_AL’, ‘sv_SE’, ‘ta_IN’, ‘th’, ‘th_TH’, ‘tr_TR’, ‘tw_GH’, ‘uk_UA’, ‘vi_VN’, ‘zh_CN’, ‘zh_TW’]

class hazy_configurator.settings.id_settings.normal_id_settings.ColumnSettings

Bases: IdSettings

Samples values from another specified column.

Fields:
field column: Union[ColId, str] [Required]

Column to sample from.

field random_sample: bool = False

When True, values from the column will be randomly sampled.

class hazy_configurator.settings.id_settings.normal_id_settings.OptionSettings

Bases: IdSettings

Sample from a list of provided values. It is usually used within a composite pattern.

Fields:
field options: List[Union[str, datetime, timedelta, bool, int, float]] [Required]

Options to sample from.

field dist: Optional[List[float]] = None

Probabilities to use when sampling from options. Its length must match the length of options, and its values must sum to 1.0.

Constraints:
  • minimum = 0

class hazy_configurator.settings.id_settings.normal_id_settings.NameSettings

Bases: UniqueIdSettings

Names sampled from locales.

Fields:
field unique: bool = True

When set to True, the generated values will be unique. Uniqueness cannot be guaranteed, however, if the number of records generated exceeds the number of possible unique values.

field name_type: NameIDTypes = NameIDTypes.FIRST_NAME

Type of name to be sampled. The options are [‘full_name’, ‘full_name_female’, ‘full_name_male’, ‘first_name’, ‘first_name_female’, ‘first_name_male’, ‘last_name’]

field locales: List[PersonLocales] = [<PersonLocales.en_GB: 'en_GB'>]

The locales to sample data from, options are: [‘ar_AA’, ‘ar_PS’, ‘ar_SA’, ‘az_AZ’, ‘bg_BG’, ‘bn_BD’, ‘cs_CZ’, ‘da_DK’, ‘de_AT’, ‘de_CH’, ‘de_DE’, ‘el_GR’, ‘en_GB’, ‘en_IE’, ‘en_NZ’, ‘en_US’, ‘es_CA’, ‘es_CL’, ‘es_CO’, ‘es_ES’, ‘et_EE’, ‘fa_IR’, ‘fi_FI’, ‘fr_CA’, ‘fr_CH’, ‘fr_FR’, ‘fr_QC’, ‘ga_IE’, ‘he_IL’, ‘hr_HR’, ‘hu_HU’, ‘hy_AM’, ‘id_ID’, ‘it_IT’, ‘ja_JP’, ‘ka_GE’, ‘ko_KR’, ‘ne_NP’, ‘nl_NL’, ‘no_NO’, ‘or_IN’, ‘pl_PL’, ‘pt_BR’, ‘pt_PT’, ‘ro_RO’, ‘ru_RU’, ‘sl_SI’, ‘sv_SE’, ‘ta_IN’, ‘th_TH’, ‘tr_TR’, ‘tw_GH’, ‘uk_UA’, ‘zh_CN’, ‘zh_TW’]

class hazy_configurator.settings.id_settings.normal_id_settings.SplitSettings

Bases: IdSettings

Split a column into separate components that can either be modelled as categoricals or replaced with a regex pattern.

Example

from hazy_configurator import SplitSettings

SplitSettings(
    split_map={
        6: None,
        12: "[0-9]{6}",
        18: None,
    }
)

In the example above, each key in split_map marks the last character of a component. Characters 1-6 and 13-18 are modelled as categorical values, while characters 7-12 are replaced with characters matching the regex pattern [0-9]{6}.
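The way split_map carves a value into components can be sketched as below, assuming (as the example suggests) that each key is the 1-based position of a component's last character. split_components is a hypothetical helper; it only splits the value and pairs each piece with its None or regex entry.

```python
from typing import Dict, List, Optional, Tuple


def split_components(value: str,
                     split_map: Dict[int, Optional[str]]) -> List[Tuple[str, Optional[str]]]:
    """Pair each component of `value` with its split_map entry."""
    components = []
    start = 0
    for end in sorted(split_map):
        # value[start:end] holds characters start+1 .. end in 1-based counting.
        components.append((value[start:end], split_map[end]))
        start = end
    return components


split_map = {6: None, 12: "[0-9]{6}", 18: None}
print(split_components("ABCDEF123456GHIJKL", split_map))
# [('ABCDEF', None), ('123456', '[0-9]{6}'), ('GHIJKL', None)]
```

Components paired with None would be modelled as categoricals; components paired with a pattern would be regenerated from that pattern.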

Fields:
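field split_map: Dict[int, Optional[str]] [Required]

Mapping from the position of the last character of each component to either None, to model that component as a categorical, or a regex pattern, to replace it with matching characters.

field unique: bool = False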

When set to True, the generated values will be unique. Uniqueness cannot be guaranteed, however, if the number of records generated exceeds the number of possible unique values.

field weighted_sampling: bool = False

Determines the sampling strategy when the provided regex patterns contain alternations. If set to False (default), each branch is sampled with roughly equal probability, regardless of how many distinct values it can produce (its cardinality). If set to True, each branch is sampled in proportion to its cardinality, so branches with higher cardinality contribute more samples.

Real ID

This can be used only by the IdHandlerSettingsUnion type, which is used in ID Handler and ID Mapper Handler.

class hazy_configurator.settings.id_settings.real_id_settings.RealIdSettings

Bases: IdSettings

Reproduce the real IDs from the source data.

This should be used with caution and is easier to use via the Real Type.

Fields:
  • id_type (Literal['real'])

Compound ID

This can be used by the IdSettingsUnion, IdHandlerSettingsUnion and TextCategorySettingsUnion types, and therefore by ID Type, ID Handler, ID Mapper Handler and Text Category Handler.

class hazy_configurator.settings.id_settings.compound_settings.CompoundSettings

Bases: UniqueIdSettings

Generate complex ID patterns by providing a pattern alongside a dictionary of sampling behaviours for each component within the pattern.

Examples

from hazy_configurator import CompoundSettings, NumericalIdSettings, NameSettings

settings = CompoundSettings(
    pattern="{IDTYPE1}/{IDTYPE2}",
    id_settings={
        # name_type accepts the values listed under NameSettings, e.g. 'first_name_male'
        "IDTYPE1": NameSettings(name_type="first_name_male"),
        "IDTYPE2": NumericalIdSettings(length=4),
    },
    unique=True,
)

Fields:
field pattern: str [Required]

The pattern the generated combination of IDs will follow. For example, “{IDTYPE1}/{IDTYPE2}” will generate two IDs separated by a forward slash.

field id_settings: Dict[str, NormalIdSettingsUnion] [Required]

Dictionary mapping each sampler identifier used in the pattern to that sampler's settings. Each key must match a placeholder in the pattern, and each value is a NormalIdSettingsUnion, which can be any of the options in Standard IDs.

field fill_char: str = ''

String to replace null values with.
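Conceptually, the pattern is a template whose {PLACEHOLDER} slots are filled by the corresponding samplers. The sketch below renders one value with Python's str.format_map, assuming that fill_char stands in for a component that comes out null; the source describes fill_char only as "String to replace null values with", so that reading is an assumption, and render_compound is hypothetical.

```python
from typing import Dict, Optional


def render_compound(pattern: str, values: Dict[str, Optional[str]],
                    fill_char: str = "") -> str:
    """Substitute sampled component values into the compound pattern."""
    # Replace missing (None) component values with fill_char first.
    filled = {key: fill_char if value is None else str(value)
              for key, value in values.items()}
    # Each {KEY} in the pattern is replaced by filled[KEY].
    return pattern.format_map(filled)


print(render_compound("{IDTYPE1}/{IDTYPE2}", {"IDTYPE1": "James", "IDTYPE2": "0042"}))
# James/0042
print(render_compound("{IDTYPE1}/{IDTYPE2}", {"IDTYPE1": None, "IDTYPE2": "0042"}, fill_char="?"))
# ?/0042
```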

Mixture ID

This can be used by the IdSettingsUnion and IdHandlerSettingsUnion types, and therefore by ID Type, ID Handler and ID Mapper Handler.

class hazy_configurator.settings.id_settings.id_mixture_settings.IdMixturePatternConfig

Bases: HazyBaseModel

Used to provide a single regular expression and the corresponding ID sampler settings to the IdMixtureSettings configuration.

Fields:
field match: str [Required]

Regular expression to match the text field.

field label: Optional[str] = None

[DEPRECATED] Categorical label for matches to be transformed into.

field case: bool = False

If True, pattern matching is case sensitive.

field sampler: Union[NormalIdSettingsUnion, CompoundSettings] [Required]

ID sampler settings used to generate values for text matching this pattern.

class hazy_configurator.settings.id_settings.id_mixture_settings.IdMixtureSettings

Bases: UniqueIdSettings

Used to model multiple ID formats in a single column.

The example below uses this feature to model two ID formats in one column, one beginning with ‘A’ and one with ‘B’, each followed by three digits.

Examples

from hazy_configurator import (
    IdMixtureSettings,
    IdMixturePatternConfig,
    RegexIdSettings,
)

settings = IdMixtureSettings(
    patterns=[
        IdMixturePatternConfig(
            match="A[0-9]{3}",
            sampler=RegexIdSettings(pattern="A[0-9]{3}")
        ),
        IdMixturePatternConfig(
            match="B[0-9]{3}",
            sampler=RegexIdSettings(pattern="B[0-9]{3}")
        )
    ]
)

Fields:
field patterns: List[IdMixturePatternConfig] [Required]

Parameters for matching text and sampling.

field mismatch: IdMismatchBehaviour = IdMismatchBehaviour.REPLACE

Behaviour when there are values that do not match any of the specified patterns. ‘replace’ will replace unmatched values with values generated by the specified samplers. ‘preserve’ will leave any unmatched values as they are and treat them as categories.
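The matching step behind IdMixtureSettings can be sketched with Python's re module: each source value is assigned to the first pattern it matches, and values matching none are the "mismatches" handled by mismatch. Two assumptions here: the whole value must match (re.fullmatch rather than a substring search), and a single case flag applies to every pattern, whereas the real configuration sets case per IdMixturePatternConfig. matching_pattern is hypothetical.

```python
import re
from typing import List, Optional


def matching_pattern(value: str, patterns: List[str], case: bool = False) -> Optional[int]:
    """Index of the first pattern that matches `value`, or None."""
    # case=False mirrors the documented default: matching ignores letter case.
    flags = 0 if case else re.IGNORECASE
    for index, pattern in enumerate(patterns):
        # Assumption: the whole value must match, not just a substring.
        if re.fullmatch(pattern, value, flags):
            return index
    return None


patterns = ["A[0-9]{3}", "B[0-9]{3}"]
print([matching_pattern(v, patterns) for v in ["A123", "b456", "C789"]])  # [0, 1, None]
```

Under mismatch='preserve', a value such as C789 would be kept as a category; under 'replace', it would be regenerated by one of the samplers.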

Conditioned ID

This can be used by the IdSettingsUnion and IdHandlerSettingsUnion types, and therefore by ID Type, ID Handler and ID Mapper Handler.

class hazy_configurator.settings.id_settings.conditioned_settings.ConditionedIdCondition

Bases: HazyBaseModel

Fields:
  • dependencies (List[Union[hazy_configurator.base.col_identifier.ColId, str]])

  • query (str)

  • sampler (Union[NormalIdSettingsUnion, hazy_configurator.settings.id_settings.compound_settings.CompoundSettings, hazy_configurator.settings.id_settings.id_mixture_settings.IdMixtureSettings])

class hazy_configurator.settings.id_settings.conditioned_settings.ConditionedIdSettings

Bases: UniqueIdSettings

Generate complex ID patterns by providing a list of queries, each paired with the ID sampler to use for rows that match it.

Examples

from hazy_configurator import (
    ConditionedIdSettings,
    ConditionedIdCondition,
    NumericalIdSettings,
    ColId
)

ConditionedIdSettings(
    mismatch='replace',
    unique=True,
    conditions=[
        ConditionedIdCondition(
            query="`('table1', 'col1')` == 'A'",
            dependencies=[ColId(col="col1", table="table1")],
            sampler=NumericalIdSettings(length=5)
        ),
        ConditionedIdCondition(
            query="`('table1', 'col1')` == 'B'",
            dependencies=[ColId(col="col1", table="table1")],
            sampler=NumericalIdSettings(length=6)
        )
    ]
)

Fields:
field id_type: Literal['conditioned'] = 'conditioned'
field conditions: List[ConditionedIdCondition] [Required]

List of ConditionedIdCondition objects, each specifying the type of ID to generate when its query matches.

field mismatch: IdMismatchBehaviour = IdMismatchBehaviour.REPLACE

Behaviour when there are values that do not match any of the specified conditions. ‘replace’ will replace unmatched values with values generated by the samplers of the specified conditions. ‘preserve’ will leave any unmatched values as they are and treat them as categories.
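The routing performed by ConditionedIdSettings can be sketched in plain Python. This is a deliberate simplification: the real query strings (such as "`('table1', 'col1')` == 'A'") are replaced by Python predicates over a row dictionary, and three behaviours are assumptions the source does not confirm: the first matching condition wins, ‘preserve’ keeps the source ID, and ‘replace’ draws from a randomly chosen condition's sampler. All names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]
Sampler = Callable[[random.Random], str]
Condition = Tuple[Callable[[Row], bool], Sampler]


def conditioned_ids(rows: Sequence[Row], original_ids: Sequence[str],
                    conditions: List[Condition], mismatch: str = "replace",
                    rng: Optional[random.Random] = None) -> List[str]:
    rng = rng or random.Random()
    out: List[str] = []
    for row, original in zip(rows, original_ids):
        for predicate, sampler in conditions:
            if predicate(row):  # assumption: first matching condition wins
                out.append(sampler(rng))
                break
        else:  # no condition matched this row
            if mismatch == "preserve":
                out.append(original)  # keep the source value as-is
            else:
                # 'replace' (assumption): use one of the conditions' samplers.
                _, sampler = rng.choice(conditions)
                out.append(sampler(rng))
    return out


def numeric(length: int) -> Sampler:
    # Zero-padded numerical ID, loosely like NumericalIdSettings(length=...).
    return lambda rng: f"{rng.randrange(10 ** length):0{length}d}"


conditions = [
    (lambda row: row["col1"] == "A", numeric(5)),
    (lambda row: row["col1"] == "B", numeric(6)),
]
rows = [{"col1": "A"}, {"col1": "B"}, {"col1": "C"}]
print(conditioned_ids(rows, ["x1", "x2", "x3"], conditions,
                      mismatch="preserve", rng=random.Random(0)))
```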