ID Settings¶
Available ID settings are listed below. These are used by ID Type, ID Handler, ID Mapper Handler and Text Category Handler.
Standard IDs¶
These can be used in place of the IdSettingsUnion and IdHandlerSettingsUnion types. They are used by ID Type, ID Handler, ID Mapper Handler and Text Category Handler.
Classes:
NumericalIdSettings | Integer IDs. |
IncrementalIdSettings | IDs that begin at a particular number and increase with a constant increment. |
RegexIdSettings | IDs based on a provided regular expression pattern. |
StaticIdSettings | Uses a single static value for every row in the column. |
SortCodeIdSettings | Banking sort codes for UK/Republic of Ireland-based banks. |
CPFIdSettings | The CPF (Cadastro de Pessoa Física) is a registration document for the Receita Federal (Brazilian Tax Authority). |
UUIDSettings | Standard UUIDs of the form 876b8169-1eae-4741-a459-6d896d705673. |
CreditCardNumberIdSettings | Credit Card Number, based on selected card provider. |
BankCountrySettings | Bank country name sampled from the specified set of locales. |
BBANSettings | Basic Bank Account Number (BBAN), represents a country-specific bank account number. |
CompanySettings | Generates randomly created company names. |
CPRSettings | The CPR number is unique to the person and is used in Denmark as an ID number. |
CreditCardSecurityCodeSettings | Credit card security code (CSC) is usually a three-digit number, not part of the credit card number. |
IBANSettings | International Bank Account Number (IBAN). |
LicensePlateSettings | Car License plate number from the specified locales. |
PasswordSettings | Plain text passwords, conforming to common requirements. |
PassportSettings | Passport ID number. |
SSNSettings | Social Security Numbers from the chosen locales. |
SWIFTSettings | Banking SWIFT code, representing a bank branch. |
SWIFT11Settings | Banking SWIFT code with a length of 11 characters. |
SWIFT8Settings | Banking SWIFT code with a length of 8 characters. |
MD5Settings | MD5 hash. |
PhoneNumberSettings | Phone number, formatted by locale. |
ColumnSettings | Samples values from another specified column. |
OptionSettings | Sample from a list of provided values. |
NameSettings | Names sampled from locales. |
SplitSettings | Split a column into separate components that can either be modelled as categoricals or replaced with a regex pattern. |
- class hazy_configurator.settings.id_settings.normal_id_settings.NumericalIdSettings¶
Bases:
UniqueIdSettings
Integer IDs.
- Fields:
id_type (Literal['numerical'])
- class hazy_configurator.settings.id_settings.normal_id_settings.IncrementalIdSettings¶
Bases:
IdSettings
IDs that begin at a particular number and increase with a constant increment.
- Fields:
id_type (Literal['incremental'])
- field start: int = 1¶
The number to start the sequence on. Must be a non-negative integer.
- Constraints:
minimum = 0
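As an illustration of what "begin at a particular number and increase with a constant increment" means, here is a minimal standalone sketch. It does not use hazy_configurator; the `incremental_ids` helper and its `step` and `n` parameters are hypothetical names chosen for this example only.

```python
from itertools import count, islice
from typing import List


def incremental_ids(start: int = 1, step: int = 1, n: int = 5) -> List[int]:
    """Return n IDs beginning at `start` and rising by a constant `step`."""
    # Mirrors the documented constraint on `start` (minimum = 0).
    if start < 0:
        raise ValueError("start must be >= 0")
    return list(islice(count(start, step), n))


ids = incremental_ids(start=100, step=5, n=4)
print(ids)  # [100, 105, 110, 115]
```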
- class hazy_configurator.settings.id_settings.normal_id_settings.RegexIdSettings¶
Bases:
UniqueIdSettings
IDs based on a provided regular expression pattern.
- Fields:
id_type (Literal['regex'])
- field pattern: str [Required]¶
Regex pattern the id will conform to on generation. Escape special regex characters with a single backslash.
- field weighted_sampling: bool = False¶
Determines the sampling strategy when encountering alternations in provided regex patterns. If set to False (default), values will be randomly sampled from all branches without considering their likelihood due to cardinality. This leads to nearly equal sampling across each branch. If switched to True, the function will sample from each branch based on their cardinality, ensuring that choices with higher cardinality contribute more samples.
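The difference between the two sampling strategies can be sketched with a hypothetical alternation such as "A[0-9]|B[0-9]{3}", whose branches have cardinalities 10 and 1000. The code below is a simplified standalone illustration, not the library's implementation; the `BRANCHES` table, `branch_weights` and `sample_id` are assumptions made for the example.

```python
import random
from typing import List

# Hypothetical alternation "A[0-9]|B[0-9]{3}", written out as
# (prefix, number_of_digits) pairs so the cardinality is easy to compute.
BRANCHES = [("A", 1), ("B", 3)]


def branch_weights(weighted_sampling: bool) -> List[float]:
    """Probability of choosing each branch under the two strategies."""
    if not weighted_sampling:
        # Every branch is equally likely, regardless of its size.
        return [1.0 / len(BRANCHES)] * len(BRANCHES)
    # Each branch is chosen in proportion to how many values it can produce.
    cardinalities = [10 ** digits for _, digits in BRANCHES]
    total = sum(cardinalities)
    return [c / total for c in cardinalities]


def sample_id(weighted_sampling: bool, rng: random.Random) -> str:
    """Pick a branch, then fill in its digits at random."""
    prefix, digits = rng.choices(BRANCHES, weights=branch_weights(weighted_sampling))[0]
    return prefix + "".join(rng.choice("0123456789") for _ in range(digits))


print(branch_weights(False))  # [0.5, 0.5]
print(sample_id(True, random.Random(0)))
```

With weighting switched on, the "A" branch is chosen with probability 10/1010, so nearly all samples come from the larger "B" branch.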
- class hazy_configurator.settings.id_settings.normal_id_settings.StaticIdSettings¶
Bases:
IdSettings
Uses a single static value for every row in the column.
- Fields:
- class hazy_configurator.settings.id_settings.normal_id_settings.SortCodeIdSettings¶
Bases:
IdSettings
Banking sort codes for UK/Republic of Ireland-based banks.
List of Available banks
- Fields:
id_type (Literal['sortcode'])
- class hazy_configurator.settings.id_settings.normal_id_settings.CPFIdSettings¶
Bases:
UniqueIdSettings
The CPF (Cadastro de Pessoa Física) is a registration document for the Receita Federal (Brazilian Tax Authority).
It allows Brazilian citizens to be registered as a tax contributor.
The format is xxx.xxx.xxx-xx and the separator param only replaces the “-”.
- Fields:
id_type (Literal['cpf'])
- class hazy_configurator.settings.id_settings.normal_id_settings.UUIDSettings¶
Bases:
UniqueIdSettings
Standard UUIDs of the form 876b8169-1eae-4741-a459-6d896d705673.
- Fields:
id_type (Literal['uuid'])
- class hazy_configurator.settings.id_settings.normal_id_settings.CreditCardNumberIdSettings¶
Bases:
UniqueIdSettings
Credit Card Number, based on selected card provider.
- Fields:
card_types (List[hazy_configurator.base.enums.SupportedCardTypes])
id_type (Literal['credit_card_number'])
- field card_types: List[SupportedCardTypes] = None¶
Credit Card numbers will match these card types
- field infer_from_data: bool = True¶
When set to True the card types of the credit card numbers will be detected from the data and modelled. This parameter cannot be True when card_types are set. If this parameter is True but no valid card types are found in the target data, a default of [SupportedCardTypes.visa16, SupportedCardTypes.mastercard] will be used.
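The interaction between card_types and infer_from_data described above can be sketched as a small resolution function. This is a standalone illustration rather than the library's code: `resolve_card_types` and its `detected` argument are hypothetical, card types are shown as plain strings, and the fallback used when neither option is given is an assumption.

```python
from typing import List, Optional

# Default taken from the field description above, as plain string values.
DEFAULT_CARD_TYPES = ["visa16", "mastercard"]


def resolve_card_types(
    card_types: Optional[List[str]],
    infer_from_data: bool,
    detected: List[str],
) -> List[str]:
    """Decide which card types to generate for."""
    if infer_from_data and card_types:
        # The two options are documented as mutually exclusive.
        raise ValueError("infer_from_data cannot be True when card_types are set")
    if infer_from_data:
        # Use whatever was found in the data, else the documented default.
        return detected or DEFAULT_CARD_TYPES
    # Assumption: fall back to the same default if nothing was configured.
    return card_types or DEFAULT_CARD_TYPES


print(resolve_card_types(None, True, detected=[]))  # ['visa16', 'mastercard']
```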
- class hazy_configurator.settings.id_settings.normal_id_settings.BankCountrySettings¶
Bases:
IdSettings
Bank country name sampled from the specified set of locales.
- Fields:
id_type (Literal['bank_country'])
- field locales: List[str] = ['en_GB']¶
The locales to sample data from, options are: [‘ar_AA’, ‘ar_AE’, ‘ar_BH’, ‘ar_EG’, ‘ar_JO’, ‘ar_PS’, ‘ar_SA’, ‘az_AZ’, ‘bg_BG’, ‘bn_BD’, ‘bs_BA’, ‘cs_CZ’, ‘da_DK’, ‘de’, ‘de_AT’, ‘de_CH’, ‘de_DE’, ‘dk_DK’, ‘el_CY’, ‘el_GR’, ‘en’, ‘en_AU’, ‘en_CA’, ‘en_GB’, ‘en_IE’, ‘en_IN’, ‘en_NZ’, ‘en_PH’, ‘en_TH’, ‘en_US’, ‘es’, ‘es_CA’, ‘es_CL’, ‘es_CO’, ‘es_ES’, ‘es_MX’, ‘et_EE’, ‘fa_IR’, ‘fi_FI’, ‘fil_PH’, ‘fr_CA’, ‘fr_CH’, ‘fr_FR’, ‘fr_QC’, ‘ga_IE’, ‘he_IL’, ‘hi_IN’, ‘hr_HR’, ‘hu_HU’, ‘hy_AM’, ‘id_ID’, ‘it_CH’, ‘it_IT’, ‘ja_JP’, ‘ka_GE’, ‘ko_KR’, ‘la’, ‘lb_LU’, ‘lt_LT’, ‘lv_LV’, ‘mt_MT’, ‘ne_NP’, ‘nl_BE’, ‘nl_NL’, ‘no_NO’, ‘or_IN’, ‘pl_PL’, ‘pt_BR’, ‘pt_PT’, ‘ro_RO’, ‘ru_RU’, ‘sk_SK’, ‘sl_SI’, ‘sq_AL’, ‘sv_SE’, ‘ta_IN’, ‘th’, ‘th_TH’, ‘tl_PH’, ‘tr_TR’, ‘tw_GH’, ‘uk_UA’, ‘vi_VN’, ‘zh_CN’, ‘zh_TW’]
- class hazy_configurator.settings.id_settings.normal_id_settings.BBANSettings¶
Bases:
UniqueIdSettings
Basic Bank Account Number (BBAN), represents a country-specific bank account number.
- Fields:
id_type (Literal['bban'])
- field locales: List[IbanLocales] = [<IbanLocales.GB: 'GB'>]¶
The locales to sample data from, options are: [‘AD’, ‘AE’, ‘AL’, ‘AT’, ‘AZ’, ‘BA’, ‘BE’, ‘BG’, ‘BH’, ‘BI’, ‘BR’, ‘BY’, ‘CH’, ‘CR’, ‘CY’, ‘CZ’, ‘DE’, ‘DJ’, ‘DK’, ‘DO’, ‘EE’, ‘EG’, ‘ES’, ‘FI’, ‘FO’, ‘FR’, ‘GB’, ‘GE’, ‘GI’, ‘GL’, ‘GR’, ‘GT’, ‘HR’, ‘HU’, ‘IE’, ‘IL’, ‘IQ’, ‘IS’, ‘IT’, ‘JO’, ‘KW’, ‘KZ’, ‘LB’, ‘LC’, ‘LI’, ‘LT’, ‘LU’, ‘LV’, ‘LY’, ‘MC’, ‘MD’, ‘ME’, ‘MK’, ‘MR’, ‘MT’, ‘MU’, ‘NL’, ‘NO’, ‘PK’, ‘PL’, ‘PS’, ‘PT’, ‘QA’, ‘RO’, ‘RS’, ‘RU’, ‘SA’, ‘SC’, ‘SD’, ‘SE’, ‘SI’, ‘SK’, ‘SM’, ‘SO’, ‘ST’, ‘SV’, ‘TL’, ‘TN’, ‘TR’, ‘UA’, ‘VA’, ‘VG’, ‘XK’]
- class hazy_configurator.settings.id_settings.normal_id_settings.CompanySettings¶
Bases:
IdSettings
Generates randomly created company names.
- Fields:
id_type (Literal['company'])
- field locales: List[str] = ['en_GB']¶
The locales to sample data from, options are: [‘ar_AA’, ‘ar_PS’, ‘ar_SA’, ‘az_AZ’, ‘bg_BG’, ‘bn_BD’, ‘cs_CZ’, ‘da_DK’, ‘de_AT’, ‘de_CH’, ‘de_DE’, ‘el_GR’, ‘en_GB’, ‘en_IE’, ‘en_NZ’, ‘en_US’, ‘es_CA’, ‘es_CL’, ‘es_CO’, ‘es_ES’, ‘et_EE’, ‘fa_IR’, ‘fi_FI’, ‘fr_CA’, ‘fr_CH’, ‘fr_FR’, ‘fr_QC’, ‘ga_IE’, ‘he_IL’, ‘hr_HR’, ‘hu_HU’, ‘hy_AM’, ‘id_ID’, ‘it_IT’, ‘ja_JP’, ‘ka_GE’, ‘ko_KR’, ‘ne_NP’, ‘nl_NL’, ‘no_NO’, ‘or_IN’, ‘pl_PL’, ‘pt_BR’, ‘pt_PT’, ‘ro_RO’, ‘ru_RU’, ‘sl_SI’, ‘sv_SE’, ‘ta_IN’, ‘th_TH’, ‘tr_TR’, ‘tw_GH’, ‘uk_UA’, ‘zh_CN’, ‘zh_TW’]
- class hazy_configurator.settings.id_settings.normal_id_settings.CPRSettings¶
Bases:
UniqueIdSettings
The CPR number is unique to the person and is used in Denmark as an ID number.
The format is DDMMYY-SSSS and the separator param only replaces the “-”. The first section is the individual's date of birth. The second section is a 4-digit number whose last digit is odd for males and even for females.
Standard Examples
    from hazy_configurator import CPRSettings

    CPRSettings(
        dob_column="dob",
        gender_column="gender",
    )

    {
        "id_type": "cpr",
        "dob_column": "dob",
        "gender_column": "gender"
    }
Cross-table Example
In the following example, the dob_column and gender_column exist in a separate table to that of the target column.
    from hazy_configurator import CPRSettings, ColId

    CPRSettings(
        dob_column=ColId(col="dob", table="table2"),
        gender_column=ColId(col="gender", table="table2"),
    )
- Fields:
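To make the DDMMYY-SSSS layout concrete, the following standalone sketch builds a CPR-shaped string from a date of birth and a gender. It is only an illustration of the format described above, not the library's generator; `make_cpr` and the gender labels "male" and "female" are assumptions for this example.

```python
import random
from datetime import date


def make_cpr(dob: date, gender: str, rng: random.Random, separator: str = "-") -> str:
    """Build a CPR-shaped string: DDMMYY, a separator, then four digits."""
    # First three digits of the SSSS section are drawn at random.
    serial = rng.randrange(0, 1000)
    # The final digit is odd for males and even for females.
    last = rng.choice([1, 3, 5, 7, 9] if gender == "male" else [0, 2, 4, 6, 8])
    return f"{dob:%d%m%y}{separator}{serial:03d}{last}"


cpr = make_cpr(date(1990, 3, 7), "male", random.Random(0))
print(cpr)  # starts with "070390-" and ends in an odd digit
```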
- class hazy_configurator.settings.id_settings.normal_id_settings.CreditCardSecurityCodeSettings¶
Bases:
IdSettings
Credit card security code (CSC) is usually a three-digit number, not part of the credit card number.
Unique generation is not an option since the length is typically small and repeats are likely.
- Fields:
card_types (List[hazy_configurator.base.enums.SupportedCardTypes])
id_type (Literal['credit_card_security_code'])
- field card_types: List[SupportedCardTypes] = [<SupportedCardTypes.visa16: 'visa16'>, <SupportedCardTypes.mastercard: 'mastercard'>]¶
Credit Card numbers will match these card types
- class hazy_configurator.settings.id_settings.normal_id_settings.IBANSettings¶
Bases:
UniqueIdSettings
International Bank Account Number (IBAN).
- Fields:
id_type (Literal['iban'])
- field locales: List[IbanLocales] = [<IbanLocales.GB: 'GB'>]¶
The locales to sample data from, options are: [‘AD’, ‘AE’, ‘AL’, ‘AT’, ‘AZ’, ‘BA’, ‘BE’, ‘BG’, ‘BH’, ‘BI’, ‘BR’, ‘BY’, ‘CH’, ‘CR’, ‘CY’, ‘CZ’, ‘DE’, ‘DJ’, ‘DK’, ‘DO’, ‘EE’, ‘EG’, ‘ES’, ‘FI’, ‘FO’, ‘FR’, ‘GB’, ‘GE’, ‘GI’, ‘GL’, ‘GR’, ‘GT’, ‘HR’, ‘HU’, ‘IE’, ‘IL’, ‘IQ’, ‘IS’, ‘IT’, ‘JO’, ‘KW’, ‘KZ’, ‘LB’, ‘LC’, ‘LI’, ‘LT’, ‘LU’, ‘LV’, ‘LY’, ‘MC’, ‘MD’, ‘ME’, ‘MK’, ‘MR’, ‘MT’, ‘MU’, ‘NL’, ‘NO’, ‘PK’, ‘PL’, ‘PS’, ‘PT’, ‘QA’, ‘RO’, ‘RS’, ‘RU’, ‘SA’, ‘SC’, ‘SD’, ‘SE’, ‘SI’, ‘SK’, ‘SM’, ‘SO’, ‘ST’, ‘SV’, ‘TL’, ‘TN’, ‘TR’, ‘UA’, ‘VA’, ‘VG’, ‘XK’]
- class hazy_configurator.settings.id_settings.normal_id_settings.LicensePlateSettings¶
Bases:
UniqueIdSettings
Car License plate number from the specified locales.
- Fields:
id_type (Literal['license_plate'])
- field locales: List[str] = ['en_GB']¶
The locales to sample data from, options are: [‘ar_AA’, ‘ar_AE’, ‘ar_BH’, ‘ar_EG’, ‘ar_JO’, ‘ar_PS’, ‘ar_SA’, ‘az_AZ’, ‘bg_BG’, ‘bn_BD’, ‘bs_BA’, ‘cs_CZ’, ‘da_DK’, ‘de’, ‘de_AT’, ‘de_CH’, ‘de_DE’, ‘dk_DK’, ‘el_CY’, ‘el_GR’, ‘en’, ‘en_AU’, ‘en_CA’, ‘en_GB’, ‘en_IE’, ‘en_IN’, ‘en_NZ’, ‘en_PH’, ‘en_TH’, ‘en_US’, ‘es’, ‘es_CA’, ‘es_CL’, ‘es_CO’, ‘es_ES’, ‘es_MX’, ‘et_EE’, ‘fa_IR’, ‘fi_FI’, ‘fil_PH’, ‘fr_CA’, ‘fr_CH’, ‘fr_FR’, ‘fr_QC’, ‘ga_IE’, ‘he_IL’, ‘hi_IN’, ‘hr_HR’, ‘hu_HU’, ‘hy_AM’, ‘id_ID’, ‘it_CH’, ‘it_IT’, ‘ja_JP’, ‘ka_GE’, ‘ko_KR’, ‘la’, ‘lb_LU’, ‘lt_LT’, ‘lv_LV’, ‘mt_MT’, ‘ne_NP’, ‘nl_BE’, ‘nl_NL’, ‘no_NO’, ‘or_IN’, ‘pl_PL’, ‘pt_BR’, ‘pt_PT’, ‘ro_RO’, ‘ru_RU’, ‘sk_SK’, ‘sl_SI’, ‘sq_AL’, ‘sv_SE’, ‘ta_IN’, ‘th’, ‘th_TH’, ‘tl_PH’, ‘tr_TR’, ‘tw_GH’, ‘uk_UA’, ‘vi_VN’, ‘zh_CN’, ‘zh_TW’]
- class hazy_configurator.settings.id_settings.normal_id_settings.PasswordSettings¶
Bases:
IdSettings
Plain text passwords, conforming to common requirements.
- Fields:
id_type (Literal['password'])
- field special_chars: bool = True¶
When set to True ensures that special characters are included in generated passwords.
- field digits: bool = True¶
When set to True ensures that digits are included in generated passwords.
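A minimal standalone sketch of how the two flags could shape a generated password is shown below. It is not the library's implementation: `make_password`, the `SPECIALS` character set and the `length` parameter are hypothetical, and excluding digits or special characters when a flag is False is an assumption (the description only states what True guarantees).

```python
import random
import string

# Hypothetical set of special characters for this example.
SPECIALS = "!@#$%^&*"


def make_password(
    rng: random.Random,
    length: int = 12,
    special_chars: bool = True,
    digits: bool = True,
) -> str:
    """Generate a password that contains each enabled character class."""
    if length < 4:
        raise ValueError("length must be at least 4")
    # Always include one lower-case and one upper-case letter.
    required = [rng.choice(string.ascii_lowercase), rng.choice(string.ascii_uppercase)]
    pool = string.ascii_letters
    if digits:
        required.append(rng.choice(string.digits))
        pool += string.digits
    if special_chars:
        required.append(rng.choice(SPECIALS))
        pool += SPECIALS
    # Fill the remainder from the allowed pool, then shuffle the positions.
    chars = required + [rng.choice(pool) for _ in range(length - len(required))]
    rng.shuffle(chars)
    return "".join(chars)


print(make_password(random.Random(1)))
```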
- class hazy_configurator.settings.id_settings.normal_id_settings.PassportSettings¶
Bases:
UniqueIdSettings
Passport ID number.
Standard Example
    from hazy_configurator import PassportSettings

    PassportSettings(
        country_column="country",
    )

    {
        "id_type": "passport",
        "country_column": "country"
    }
Cross-table Example
In the following example, the country_column exists in a separate table to that of the target column.
    from hazy_configurator import PassportSettings, ColId

    PassportSettings(
        country_column=ColId(col="country", table="table2")
    )
- Fields:
countries (List[hazy_configurator.base.enums.PassportCountries])
country_column (Union[hazy_configurator.base.col_identifier.ColId, str])
id_type (Literal['passport'])
max_iter (int)
- field country_map: Dict[str, str] = None¶
Dictionary mapping each value within the country_column to a 2-letter country code.
- field countries: List[PassportCountries] = [<PassportCountries.GB: 'GB'>]¶
The list of countries from which to sample when generating passport numbers.
- class hazy_configurator.settings.id_settings.normal_id_settings.SSNSettings¶
Bases:
UniqueIdSettings
Social Security Numbers from the chosen locales.
Returns US Social Security numbers if locale is “en_US”.
Returns National Insurance numbers if locale is “en_GB”.
Returns VAT identification numbers if locale is “de_DE”.
For other social security numbers, talk to Hazy about other options.
- Fields:
- field locales: List[SSNLocales] = [<SSNLocales.en_GB: 'en_GB'>]¶
The locales to sample data from, options are: [‘ar_AA’, ‘ar_AE’, ‘ar_BH’, ‘ar_EG’, ‘ar_JO’, ‘ar_PS’, ‘ar_SA’, ‘az_AZ’, ‘bg_BG’, ‘bn_BD’, ‘bs_BA’, ‘cs_CZ’, ‘da_DK’, ‘de’, ‘de_AT’, ‘de_CH’, ‘de_DE’, ‘dk_DK’, ‘el_CY’, ‘el_GR’, ‘en’, ‘en_AU’, ‘en_CA’, ‘en_GB’, ‘en_IE’, ‘en_IN’, ‘en_NZ’, ‘en_PH’, ‘en_TH’, ‘en_US’, ‘es’, ‘es_CA’, ‘es_CL’, ‘es_CO’, ‘es_ES’, ‘es_MX’, ‘et_EE’, ‘fa_IR’, ‘fi_FI’, ‘fil_PH’, ‘fr_CA’, ‘fr_CH’, ‘fr_FR’, ‘fr_QC’, ‘ga_IE’, ‘he_IL’, ‘hi_IN’, ‘hr_HR’, ‘hu_HU’, ‘hy_AM’, ‘id_ID’, ‘it_CH’, ‘it_IT’, ‘ja_JP’, ‘ka_GE’, ‘ko_KR’, ‘la’, ‘lb_LU’, ‘lt_LT’, ‘lv_LV’, ‘mt_MT’, ‘ne_NP’, ‘nl_BE’, ‘nl_NL’, ‘no_NO’, ‘or_IN’, ‘pl_PL’, ‘pt_BR’, ‘pt_PT’, ‘ro_RO’, ‘ru_RU’, ‘sk_SK’, ‘sl_SI’, ‘sq_AL’, ‘sv_SE’, ‘ta_IN’, ‘th’, ‘th_TH’, ‘tl_PH’, ‘tr_TR’, ‘tw_GH’, ‘uk_UA’, ‘vi_VN’, ‘zh_CN’, ‘zh_TW’]
- field dob_column: Union[None, str, ColId] = None¶
Date of birth column. Used for LT and ET locales only.
- field start_dob: str = '1900-01-01'¶
Start date of birth for sampling when no DoB column has been provided. Used for LT and ET locales only. Must be in ‘%Y-%m-%d’ format.
- field end_dob: str = '2024-01-01'¶
End date of birth for sampling when no DoB column has been provided. Used for LT and ET locales only. Must be in ‘%Y-%m-%d’ format.
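When no date of birth column is supplied, a date is sampled between start_dob and end_dob. The sketch below shows one way such sampling could work with the ‘%Y-%m-%d’ format; it is a standalone illustration, `sample_dob` is a hypothetical helper, and treating both bounds as inclusive is an assumption.

```python
import random
from datetime import datetime, timedelta
from typing import Optional

FMT = "%Y-%m-%d"


def sample_dob(
    start_dob: str = "1900-01-01",
    end_dob: str = "2024-01-01",
    rng: Optional[random.Random] = None,
) -> str:
    """Sample a date uniformly between two '%Y-%m-%d' strings."""
    rng = rng or random.Random()
    # strptime raises ValueError if a bound is not in the expected format.
    start = datetime.strptime(start_dob, FMT).date()
    end = datetime.strptime(end_dob, FMT).date()
    if end < start:
        raise ValueError("end_dob must not be earlier than start_dob")
    offset = rng.randint(0, (end - start).days)
    return (start + timedelta(days=offset)).strftime(FMT)


print(sample_dob(rng=random.Random(0)))
```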
- class hazy_configurator.settings.id_settings.normal_id_settings.SWIFTSettings¶
Bases:
UniqueIdSettings
Banking SWIFT code, representing a bank branch.
- Fields:
- field primary: bool = False¶
When set to True the 11-character SWIFT codes generated will always end in XXX to denote that they belong to primary branches/offices.
- field infer_from_data: bool = True¶
When set to True the type of SWIFT codes and locales will be detected from the data and modelled. This parameter cannot be True when both swift_types and locales are set. If this parameter is True but no valid SWIFT codes are found in the target data, a default locale of GB and default SWIFT types of [SwiftTypes.SWIFT8, SwiftTypes.SWIFT11] will be used.
- field swift_types: Optional[List[SwiftTypes]] = None¶
The types of SWIFT code to generate: [‘SWIFT8’, ‘SWIFT11’]
- field locales: Optional[List[SwiftLocales]] = None¶
The locales to sample data from, options are: [‘AD’, ‘AE’, ‘AG’, ‘AI’, ‘AL’, ‘AM’, ‘AO’, ‘AR’, ‘AS’, ‘AT’, ‘AU’, ‘AW’, ‘AZ’, ‘BA’, ‘BB’, ‘BD’, ‘BE’, ‘BF’, ‘BG’, ‘BH’, ‘BJ’, ‘BM’, ‘BN’, ‘BO’, ‘BQ’, ‘BR’, ‘BS’, ‘BT’, ‘BW’, ‘BY’, ‘BZ’, ‘CA’, ‘CH’, ‘CI’, ‘CK’, ‘CL’, ‘CM’, ‘CN’, ‘CO’, ‘CR’, ‘CV’, ‘CW’, ‘CY’, ‘CZ’, ‘DE’, ‘DJ’, ‘DK’, ‘DM’, ‘DO’, ‘DZ’, ‘EC’, ‘EE’, ‘EG’, ‘EH’, ‘ES’, ‘ET’, ‘FI’, ‘FJ’, ‘FK’, ‘FO’, ‘FR’, ‘GA’, ‘GB’, ‘GD’, ‘GE’, ‘GG’, ‘GH’, ‘GI’, ‘GL’, ‘GM’, ‘GN’, ‘GP’, ‘GQ’, ‘GR’, ‘GT’, ‘GU’, ‘GW’, ‘GY’, ‘HK’, ‘HN’, ‘HR’, ‘HT’, ‘HU’, ‘ID’, ‘IE’, ‘IL’, ‘IM’, ‘IN’, ‘IS’, ‘IT’, ‘JE’, ‘JM’, ‘JO’, ‘JP’, ‘KE’, ‘KG’, ‘KH’, ‘KI’, ‘KM’, ‘KN’, ‘KR’, ‘KW’, ‘KY’, ‘KZ’, ‘LA’, ‘LB’, ‘LC’, ‘LI’, ‘LK’, ‘LR’, ‘LS’, ‘LT’, ‘LU’, ‘LV’, ‘MA’, ‘MC’, ‘MD’, ‘ME’, ‘MG’, ‘MH’, ‘MK’, ‘ML’, ‘MM’, ‘MN’, ‘MO’, ‘MQ’, ‘MR’, ‘MS’, ‘MT’, ‘MU’, ‘MV’, ‘MW’, ‘MX’, ‘MY’, ‘MZ’, ‘NA’, ‘NC’, ‘NE’, ‘NG’, ‘NI’, ‘NL’, ‘NO’, ‘NP’, ‘NZ’, ‘OM’, ‘PA’, ‘PE’, ‘PF’, ‘PG’, ‘PH’, ‘PK’, ‘PL’, ‘PR’, ‘PS’, ‘PT’, ‘PW’, ‘PY’, ‘QA’, ‘RE’, ‘RO’, ‘RS’, ‘RU’, ‘RW’, ‘SA’, ‘SB’, ‘SC’, ‘SD’, ‘SE’, ‘SG’, ‘SH’, ‘SI’, ‘SK’, ‘SL’, ‘SM’, ‘SN’, ‘SO’, ‘SR’, ‘SS’, ‘ST’, ‘SV’, ‘SX’, ‘SY’, ‘TC’, ‘TG’, ‘TH’, ‘TJ’, ‘TK’, ‘TL’, ‘TM’, ‘TN’, ‘TO’, ‘TR’, ‘TT’, ‘TV’, ‘TW’, ‘TZ’, ‘UA’, ‘UG’, ‘UM’, ‘US’, ‘UY’, ‘UZ’, ‘VE’, ‘VG’, ‘VI’, ‘VN’, ‘VU’, ‘WF’, ‘WS’, ‘XK’, ‘YE’, ‘YT’, ‘ZA’, ‘ZM’, ‘ZW’]
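The effect of swift_types and primary can be illustrated with the standard SWIFT/BIC layout: a 4-letter bank code, a 2-letter country code, a 2-character location code, and for 11-character codes a 3-character branch code. The sketch below is a standalone approximation, not the library's generator; `make_swift` and its random bank and location codes are assumptions.

```python
import random
import string

ALNUM = string.ascii_uppercase + string.digits


def make_swift(
    rng: random.Random,
    country: str = "GB",
    swift_type: str = "SWIFT11",
    primary: bool = False,
) -> str:
    """Build a SWIFT-shaped code of 8 or 11 characters."""
    bank = "".join(rng.choice(string.ascii_uppercase) for _ in range(4))
    location = "".join(rng.choice(ALNUM) for _ in range(2))
    code = bank + country + location
    if swift_type == "SWIFT8":
        return code
    # Primary offices use the branch code XXX; otherwise draw one at random.
    branch = "XXX" if primary else "".join(rng.choice(ALNUM) for _ in range(3))
    return code + branch


print(make_swift(random.Random(0), primary=True))
```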
- class hazy_configurator.settings.id_settings.normal_id_settings.SWIFT11Settings¶
Bases:
UniqueIdSettings
Banking SWIFT code with a length of 11 characters.
- Fields:
id_type (Literal['swift11'])
locales (Optional[List[hazy_configurator.base.enums.SwiftLocales]])
- field primary: bool = False¶
When set to True the 11-character SWIFT codes generated will always end in XXX to denote that they belong to primary branches/offices.
- field infer_from_data: bool = True¶
When set to True the locales will be detected from the data and modelled. This parameter cannot be True when locales is set. If this parameter is True but no valid swift codes are found in the target data, a default locale of GB will be used.
- field unique: bool = True¶
When set to True the generated values will be unique. This is not guaranteed, as the number of records generated could be greater than the number of possible unique values.
- field locales: Optional[List[SwiftLocales]] = None¶
The locales to sample data from, options are: [‘AD’, ‘AE’, ‘AG’, ‘AI’, ‘AL’, ‘AM’, ‘AO’, ‘AR’, ‘AS’, ‘AT’, ‘AU’, ‘AW’, ‘AZ’, ‘BA’, ‘BB’, ‘BD’, ‘BE’, ‘BF’, ‘BG’, ‘BH’, ‘BJ’, ‘BM’, ‘BN’, ‘BO’, ‘BQ’, ‘BR’, ‘BS’, ‘BT’, ‘BW’, ‘BY’, ‘BZ’, ‘CA’, ‘CH’, ‘CI’, ‘CK’, ‘CL’, ‘CM’, ‘CN’, ‘CO’, ‘CR’, ‘CV’, ‘CW’, ‘CY’, ‘CZ’, ‘DE’, ‘DJ’, ‘DK’, ‘DM’, ‘DO’, ‘DZ’, ‘EC’, ‘EE’, ‘EG’, ‘EH’, ‘ES’, ‘ET’, ‘FI’, ‘FJ’, ‘FK’, ‘FO’, ‘FR’, ‘GA’, ‘GB’, ‘GD’, ‘GE’, ‘GG’, ‘GH’, ‘GI’, ‘GL’, ‘GM’, ‘GN’, ‘GP’, ‘GQ’, ‘GR’, ‘GT’, ‘GU’, ‘GW’, ‘GY’, ‘HK’, ‘HN’, ‘HR’, ‘HT’, ‘HU’, ‘ID’, ‘IE’, ‘IL’, ‘IM’, ‘IN’, ‘IS’, ‘IT’, ‘JE’, ‘JM’, ‘JO’, ‘JP’, ‘KE’, ‘KG’, ‘KH’, ‘KI’, ‘KM’, ‘KN’, ‘KR’, ‘KW’, ‘KY’, ‘KZ’, ‘LA’, ‘LB’, ‘LC’, ‘LI’, ‘LK’, ‘LR’, ‘LS’, ‘LT’, ‘LU’, ‘LV’, ‘MA’, ‘MC’, ‘MD’, ‘ME’, ‘MG’, ‘MH’, ‘MK’, ‘ML’, ‘MM’, ‘MN’, ‘MO’, ‘MQ’, ‘MR’, ‘MS’, ‘MT’, ‘MU’, ‘MV’, ‘MW’, ‘MX’, ‘MY’, ‘MZ’, ‘NA’, ‘NC’, ‘NE’, ‘NG’, ‘NI’, ‘NL’, ‘NO’, ‘NP’, ‘NZ’, ‘OM’, ‘PA’, ‘PE’, ‘PF’, ‘PG’, ‘PH’, ‘PK’, ‘PL’, ‘PR’, ‘PS’, ‘PT’, ‘PW’, ‘PY’, ‘QA’, ‘RE’, ‘RO’, ‘RS’, ‘RU’, ‘RW’, ‘SA’, ‘SB’, ‘SC’, ‘SD’, ‘SE’, ‘SG’, ‘SH’, ‘SI’, ‘SK’, ‘SL’, ‘SM’, ‘SN’, ‘SO’, ‘SR’, ‘SS’, ‘ST’, ‘SV’, ‘SX’, ‘SY’, ‘TC’, ‘TG’, ‘TH’, ‘TJ’, ‘TK’, ‘TL’, ‘TM’, ‘TN’, ‘TO’, ‘TR’, ‘TT’, ‘TV’, ‘TW’, ‘TZ’, ‘UA’, ‘UG’, ‘UM’, ‘US’, ‘UY’, ‘UZ’, ‘VE’, ‘VG’, ‘VI’, ‘VN’, ‘VU’, ‘WF’, ‘WS’, ‘XK’, ‘YE’, ‘YT’, ‘ZA’, ‘ZM’, ‘ZW’]
- class hazy_configurator.settings.id_settings.normal_id_settings.SWIFT8Settings¶
Bases:
UniqueIdSettings
Banking SWIFT code with a length of 8 characters.
- Fields:
id_type (Literal['swift8'])
locales (Optional[List[hazy_configurator.base.enums.SwiftLocales]])
- field unique: bool = True¶
When set to True the generated values will be unique. This is not guaranteed, as the number of records generated could be greater than the number of possible unique values.
- field infer_from_data: bool = True¶
When set to True the locales will be detected from the data and modelled. This parameter cannot be True when locales is set. If this parameter is True but no valid swift codes are found in the target data, a default locale of GB will be used.
- field locales: Optional[List[SwiftLocales]] = None¶
The locales to sample data from, options are: [‘AD’, ‘AE’, ‘AG’, ‘AI’, ‘AL’, ‘AM’, ‘AO’, ‘AR’, ‘AS’, ‘AT’, ‘AU’, ‘AW’, ‘AZ’, ‘BA’, ‘BB’, ‘BD’, ‘BE’, ‘BF’, ‘BG’, ‘BH’, ‘BJ’, ‘BM’, ‘BN’, ‘BO’, ‘BQ’, ‘BR’, ‘BS’, ‘BT’, ‘BW’, ‘BY’, ‘BZ’, ‘CA’, ‘CH’, ‘CI’, ‘CK’, ‘CL’, ‘CM’, ‘CN’, ‘CO’, ‘CR’, ‘CV’, ‘CW’, ‘CY’, ‘CZ’, ‘DE’, ‘DJ’, ‘DK’, ‘DM’, ‘DO’, ‘DZ’, ‘EC’, ‘EE’, ‘EG’, ‘EH’, ‘ES’, ‘ET’, ‘FI’, ‘FJ’, ‘FK’, ‘FO’, ‘FR’, ‘GA’, ‘GB’, ‘GD’, ‘GE’, ‘GG’, ‘GH’, ‘GI’, ‘GL’, ‘GM’, ‘GN’, ‘GP’, ‘GQ’, ‘GR’, ‘GT’, ‘GU’, ‘GW’, ‘GY’, ‘HK’, ‘HN’, ‘HR’, ‘HT’, ‘HU’, ‘ID’, ‘IE’, ‘IL’, ‘IM’, ‘IN’, ‘IS’, ‘IT’, ‘JE’, ‘JM’, ‘JO’, ‘JP’, ‘KE’, ‘KG’, ‘KH’, ‘KI’, ‘KM’, ‘KN’, ‘KR’, ‘KW’, ‘KY’, ‘KZ’, ‘LA’, ‘LB’, ‘LC’, ‘LI’, ‘LK’, ‘LR’, ‘LS’, ‘LT’, ‘LU’, ‘LV’, ‘MA’, ‘MC’, ‘MD’, ‘ME’, ‘MG’, ‘MH’, ‘MK’, ‘ML’, ‘MM’, ‘MN’, ‘MO’, ‘MQ’, ‘MR’, ‘MS’, ‘MT’, ‘MU’, ‘MV’, ‘MW’, ‘MX’, ‘MY’, ‘MZ’, ‘NA’, ‘NC’, ‘NE’, ‘NG’, ‘NI’, ‘NL’, ‘NO’, ‘NP’, ‘NZ’, ‘OM’, ‘PA’, ‘PE’, ‘PF’, ‘PG’, ‘PH’, ‘PK’, ‘PL’, ‘PR’, ‘PS’, ‘PT’, ‘PW’, ‘PY’, ‘QA’, ‘RE’, ‘RO’, ‘RS’, ‘RU’, ‘RW’, ‘SA’, ‘SB’, ‘SC’, ‘SD’, ‘SE’, ‘SG’, ‘SH’, ‘SI’, ‘SK’, ‘SL’, ‘SM’, ‘SN’, ‘SO’, ‘SR’, ‘SS’, ‘ST’, ‘SV’, ‘SX’, ‘SY’, ‘TC’, ‘TG’, ‘TH’, ‘TJ’, ‘TK’, ‘TL’, ‘TM’, ‘TN’, ‘TO’, ‘TR’, ‘TT’, ‘TV’, ‘TW’, ‘TZ’, ‘UA’, ‘UG’, ‘UM’, ‘US’, ‘UY’, ‘UZ’, ‘VE’, ‘VG’, ‘VI’, ‘VN’, ‘VU’, ‘WF’, ‘WS’, ‘XK’, ‘YE’, ‘YT’, ‘ZA’, ‘ZM’, ‘ZW’]
- class hazy_configurator.settings.id_settings.normal_id_settings.MD5Settings¶
Bases:
UniqueIdSettings
MD5 hash.
- Fields:
id_type (Literal['md5'])
- class hazy_configurator.settings.id_settings.normal_id_settings.PhoneNumberSettings¶
Bases:
UniqueIdSettings
Phone number, formatted by locale.
- Fields:
id_type (Literal['phone_number'])
- field unique: bool = True¶
When set to True the generated values will be unique. This is not guaranteed, as the number of records generated could be greater than the number of possible unique values.
- field locales: List[str] [Required]¶
The locales to sample data from, options are: [‘ar_AA’, ‘ar_AE’, ‘ar_BH’, ‘ar_EG’, ‘ar_JO’, ‘ar_PS’, ‘ar_SA’, ‘az_AZ’, ‘bg_BG’, ‘bn_BD’, ‘bs_BA’, ‘cs_CZ’, ‘da_DK’, ‘de’, ‘de_AT’, ‘de_CH’, ‘de_DE’, ‘dk_DK’, ‘el_CY’, ‘el_GR’, ‘en’, ‘en_AU’, ‘en_CA’, ‘en_GB’, ‘en_IE’, ‘en_IN’, ‘en_NZ’, ‘en_TH’, ‘en_US’, ‘es’, ‘es_CA’, ‘es_CL’, ‘es_CO’, ‘es_ES’, ‘es_MX’, ‘et_EE’, ‘fa_IR’, ‘fi_FI’, ‘fr_CA’, ‘fr_CH’, ‘fr_FR’, ‘fr_QC’, ‘ga_IE’, ‘he_IL’, ‘hi_IN’, ‘hr_HR’, ‘hu_HU’, ‘hy_AM’, ‘id_ID’, ‘it_CH’, ‘it_IT’, ‘ja_JP’, ‘ka_GE’, ‘ko_KR’, ‘la’, ‘lb_LU’, ‘lt_LT’, ‘lv_LV’, ‘mt_MT’, ‘ne_NP’, ‘nl_BE’, ‘nl_NL’, ‘no_NO’, ‘or_IN’, ‘pl_PL’, ‘pt_BR’, ‘pt_PT’, ‘ro_RO’, ‘ru_RU’, ‘sk_SK’, ‘sl_SI’, ‘sq_AL’, ‘sv_SE’, ‘ta_IN’, ‘th’, ‘th_TH’, ‘tr_TR’, ‘tw_GH’, ‘uk_UA’, ‘vi_VN’, ‘zh_CN’, ‘zh_TW’]
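The caveat on unique, that uniqueness cannot be guaranteed once more records are requested than distinct values exist, can be seen in a small standalone sketch. This is not how the library enforces uniqueness; `sample_unique` and its `max_retries` cap are hypothetical.

```python
import random
from typing import Callable, List


def sample_unique(draw: Callable[[], int], n: int, max_retries: int = 100) -> List[int]:
    """Draw n values, retrying on duplicates up to max_retries times each."""
    values: List[int] = []
    seen = set()
    for _ in range(n):
        value = draw()
        tries = 0
        while value in seen and tries < max_retries:
            value = draw()
            tries += 1
        seen.add(value)
        # If retries ran out, the value appended here is a duplicate.
        values.append(value)
    return values


rng = random.Random(0)
# Only 3 distinct values exist, so 5 draws must contain repeats.
small_space = sample_unique(lambda: rng.randrange(3), 5, max_retries=10)
print(len(small_space), len(set(small_space)))
```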
- class hazy_configurator.settings.id_settings.normal_id_settings.ColumnSettings¶
Bases:
IdSettings
Samples values from another specified column.
- Fields:
column (Union[hazy_configurator.base.col_identifier.ColId, str])
id_type (Literal['column'])
- class hazy_configurator.settings.id_settings.normal_id_settings.OptionSettings¶
Bases:
IdSettings
Sample from a list of provided values. It is usually used within a composite pattern.
- Fields:
id_type (Literal['option'])
options (List[Union[str, datetime.datetime, datetime.timedelta, bool, int, float]])
- class hazy_configurator.settings.id_settings.normal_id_settings.NameSettings¶
Bases:
UniqueIdSettings
Names sampled from locales.
- Fields:
- field unique: bool = True¶
When set to True the generated values will be unique. This is not guaranteed, as the number of records generated could be greater than the number of possible unique values.
- field name_type: NameIDTypes = NameIDTypes.FIRST_NAME¶
Type of name to be sampled. The options are [‘full_name’, ‘full_name_female’, ‘full_name_male’, ‘first_name’, ‘first_name_female’, ‘first_name_male’, ‘last_name’]
- field locales: List[PersonLocales] = [<PersonLocales.en_GB: 'en_GB'>]¶
The locales to sample data from, options are: [‘ar_AA’, ‘ar_PS’, ‘ar_SA’, ‘az_AZ’, ‘bg_BG’, ‘bn_BD’, ‘cs_CZ’, ‘da_DK’, ‘de_AT’, ‘de_CH’, ‘de_DE’, ‘el_GR’, ‘en_GB’, ‘en_IE’, ‘en_NZ’, ‘en_US’, ‘es_CA’, ‘es_CL’, ‘es_CO’, ‘es_ES’, ‘et_EE’, ‘fa_IR’, ‘fi_FI’, ‘fr_CA’, ‘fr_CH’, ‘fr_FR’, ‘fr_QC’, ‘ga_IE’, ‘he_IL’, ‘hr_HR’, ‘hu_HU’, ‘hy_AM’, ‘id_ID’, ‘it_IT’, ‘ja_JP’, ‘ka_GE’, ‘ko_KR’, ‘ne_NP’, ‘nl_NL’, ‘no_NO’, ‘or_IN’, ‘pl_PL’, ‘pt_BR’, ‘pt_PT’, ‘ro_RO’, ‘ru_RU’, ‘sl_SI’, ‘sv_SE’, ‘ta_IN’, ‘th_TH’, ‘tr_TR’, ‘tw_GH’, ‘uk_UA’, ‘zh_CN’, ‘zh_TW’]
- class hazy_configurator.settings.id_settings.normal_id_settings.SplitSettings¶
Bases:
IdSettings
Split a column into separate components that can either be modelled as categoricals or replaced with a regex pattern.
Example
    from hazy_configurator import SplitSettings

    SplitSettings(
        split_map={
            6: None,
            12: "[0-9]{6}",
            18: None,
        }
    )

    {
        "id_type": "split",
        "split_map": {
            "6": null,
            "12": "[0-9]{6}",
            "18": null
        }
    }
The example above will treat the target column in the following way. Characters 1-6 and 13-18 will be modelled as categorical values. Characters 7-12 will be replaced with a selection of characters that match the corresponding regex pattern.
- Fields:
id_type (Literal['split'])
- field unique: bool = False¶
When set to True the generated values will be unique. This is not guaranteed, as the number of records generated could be greater than the number of possible unique values.
- field weighted_sampling: bool = False¶
Determines the sampling strategy when encountering alternations in provided regex patterns. If set to False (default), values will be randomly sampled from all branches without considering their likelihood due to cardinality. This leads to nearly equal sampling across each branch. If switched to True, the function will sample from each branch based on their cardinality, ensuring that choices with higher cardinality contribute more samples.
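How the keys of split_map carve up a value can be sketched as follows. This standalone example is a simplification and not the library's behaviour: `apply_split` is hypothetical, the regex "[0-9]{6}" is stood in for by a `six_digits` callable, and segments mapped to None are passed through unchanged rather than being modelled as categoricals.

```python
import random
from typing import Callable, Dict, Optional

Generator = Callable[[random.Random], str]


def six_digits(rng: random.Random) -> str:
    """Stand-in for sampling from the regex [0-9]{6}."""
    return "".join(rng.choice("0123456789") for _ in range(6))


def apply_split(value: str, split_map: Dict[int, Optional[Generator]], rng: random.Random) -> str:
    """Rebuild a value segment by segment; each key is a segment's end position."""
    parts = []
    start = 0
    for end in sorted(split_map):
        generator = split_map[end]
        segment = value[start:end]
        # None keeps the segment; a generator replaces it.
        parts.append(segment if generator is None else generator(rng))
        start = end
    return "".join(parts)


result = apply_split(
    "ABCDEF123456UVWXYZ",
    {6: None, 12: six_digits, 18: None},
    random.Random(0),
)
print(result)  # characters 1-6 and 13-18 are kept, 7-12 are regenerated
```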
Real ID¶
This can be used by the IdHandlerSettingsUnion type only, which is used in ID Handler and ID Mapper Handler.
Compound ID¶
This can be used by the IdSettingsUnion, IdHandlerSettingsUnion and TextCategorySettingsUnion types. This can be used by ID Type, ID Handler, ID Mapper Handler and Text Category Handler.
- class hazy_configurator.settings.id_settings.compound_settings.CompoundSettings¶
Bases:
UniqueIdSettings
Generate complex ID patterns by providing a pattern alongside a dictionary of sampling behaviours for each component within the pattern.
Examples
    from hazy_configurator import CompoundSettings, NumericalIdSettings, NameSettings

    settings = CompoundSettings(
        pattern="{IDTYPE1}/{IDTYPE2}",
        id_settings={
            "IDTYPE1": NameSettings(name_type="first_name_male"),
            "IDTYPE2": NumericalIdSettings(length=4),
        },
        unique=True,
    )

    {
        "id_type": "compound",
        "pattern": "{IDTYPE1}/{IDTYPE2}",
        "id_settings": {
            "IDTYPE1": {
                "id_type": "name",
                "name_type": "first_name_male"
            },
            "IDTYPE2": {
                "id_type": "numerical",
                "length": 4
            }
        },
        "unique": true
    }
- Fields:
id_type (Literal['compound'])
max_iter (int)
- field pattern: str [Required]¶
The pattern the generated combination of IDs will follow, i.e. “{IDTYPE1}/{IDTYPE2}” will generate two IDs separated by a forward slash.
- field id_settings: Dict[str, NormalIdSettingsUnion] [Required]¶
Dictionary containing the mapping from pattern to samplers and the sampler settings themselves. Each key should correspond to a sampler identifier used in the pattern, with each value being a NormalIdSettingsUnion, which can be any of the options in Standard IDs.
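The relationship between pattern and id_settings can be illustrated with plain string formatting: each placeholder in the pattern is filled by the sampler registered under the same key. The sketch below is standalone and does not reflect the library's internals; `fill_pattern`, the example names and the zero-padded number sampler are assumptions.

```python
import random
from typing import Callable, Dict

Sampler = Callable[[random.Random], str]

# Hypothetical stand-ins for a name sampler and a 4-digit numerical sampler.
NAMES = ["Oliver", "Harry"]
samplers: Dict[str, Sampler] = {
    "IDTYPE1": lambda rng: rng.choice(NAMES),
    "IDTYPE2": lambda rng: f"{rng.randrange(10 ** 4):04d}",
}


def fill_pattern(pattern: str, samplers: Dict[str, Sampler], rng: random.Random) -> str:
    """Replace each {KEY} in the pattern with a value from its sampler."""
    return pattern.format(**{key: sampler(rng) for key, sampler in samplers.items()})


value = fill_pattern("{IDTYPE1}/{IDTYPE2}", samplers, random.Random(0))
print(value)  # a name, a forward slash, then four digits
```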
Mixture ID¶
This can be used by the IdSettingsUnion and IdHandlerSettingsUnion types. This can be used by ID Type, ID Handler and ID Mapper Handler.
- class hazy_configurator.settings.id_settings.id_mixture_settings.IdMixturePatternConfig¶
Bases:
HazyBaseModel
Used to provide a single regular expression and the corresponding ID sampler settings to the IdMixtureSettings configuration.
- Fields:
- field label: Optional[str] = None¶
[DEPRECATED] Categorical label for matches to be transformed into.
- field sampler: Union[NormalIdSettingsUnion, CompoundSettings] [Required]¶
- class hazy_configurator.settings.id_settings.id_mixture_settings.IdMixtureSettings¶
Bases:
UniqueIdSettings
Used to model multiple ID formats in a single column.
The examples below show how this feature can be used to model two different formats of IDs, one beginning with ‘A’ and one with ‘B’, both followed by 3 digits.
Examples
    from hazy_configurator import (
        IdMixtureSettings,
        IdMixturePatternConfig,
        RegexIdSettings,
    )

    settings = IdMixtureSettings(
        patterns=[
            IdMixturePatternConfig(
                match="A[0-9]{3}",
                sampler=RegexIdSettings(pattern="A[0-9]{3}")
            ),
            IdMixturePatternConfig(
                match="B[0-9]{3}",
                sampler=RegexIdSettings(pattern="B[0-9]{3}")
            )
        ]
    )

    {
        "id_type": "id_mixture",
        "patterns": [
            {
                "match": "A[0-9]{3}",
                "sampler": {"id_type": "regex", "pattern": "A[0-9]{3}"}
            },
            {
                "match": "B[0-9]{3}",
                "sampler": {"id_type": "regex", "pattern": "B[0-9]{3}"}
            }
        ]
    }
- Fields:
id_type (Literal['id_mixture'])
max_iter (int)
patterns (List[hazy_configurator.settings.id_settings.id_mixture_settings.IdMixturePatternConfig])
- field patterns: List[IdMixturePatternConfig] [Required]¶
Parameters for matching text and sampling.
- field mismatch: IdMismatchBehaviour = IdMismatchBehaviour.REPLACE¶
Behaviour when there are values that do not match any of the specified conditions. ‘replace’ will replace unmatched values with other conditions. ‘preserve’ will leave any unmatched values as they are and treat them as categories.
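The matching and mismatch behaviour can be sketched in plain Python for the two formats used in the example above. This is a standalone illustration under assumptions: `synthesise` is hypothetical, full-string matching is assumed, and ‘replace’ is modelled as picking one of the configured samplers at random.

```python
import random
import re


def letter_plus_digits(letter: str):
    """Stand-in sampler for patterns such as A[0-9]{3}."""
    return lambda rng: letter + f"{rng.randrange(1000):03d}"


# (match regex, sampler) pairs mirroring the two patterns in the example.
PATTERNS = [
    (re.compile(r"A[0-9]{3}"), letter_plus_digits("A")),
    (re.compile(r"B[0-9]{3}"), letter_plus_digits("B")),
]


def synthesise(value: str, rng: random.Random, mismatch: str = "replace") -> str:
    """Generate a replacement using the sampler of the first matching pattern."""
    for match, sampler in PATTERNS:
        if match.fullmatch(value):
            return sampler(rng)
    if mismatch == "preserve":
        # Unmatched values are left as they are.
        return value
    # Assumption: 'replace' draws from one of the configured samplers.
    _, sampler = rng.choice(PATTERNS)
    return sampler(rng)


rng = random.Random(0)
print(synthesise("A123", rng), synthesise("zzz", rng, mismatch="preserve"))
```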
Conditioned ID¶
This can be used by the IdSettingsUnion and IdHandlerSettingsUnion types. This can be used by ID Type, ID Handler and ID Mapper Handler.
- class hazy_configurator.settings.id_settings.conditioned_settings.ConditionedIdCondition¶
Bases:
HazyBaseModel
- Fields:
dependencies (List[Union[hazy_configurator.base.col_identifier.ColId, str]])
query (str)
sampler (Union[NormalIdSettingsUnion, hazy_configurator.settings.id_settings.compound_settings.CompoundSettings, hazy_configurator.settings.id_settings.id_mixture_settings.IdMixtureSettings])
- class hazy_configurator.settings.id_settings.conditioned_settings.ConditionedIdSettings¶
Bases:
UniqueIdSettings
Generate complex ID patterns by providing a list of queries alongside behaviours for each component within the pattern.
Examples
    from hazy_configurator import (
        ConditionedIdSettings,
        ConditionedIdCondition,
        NumericalIdSettings,
        ColId
    )

    ConditionedIdSettings(
        mismatch='replace',
        unique=True,
        conditions=[
            ConditionedIdCondition(
                query="`('table1', 'col1')` == 'A'",
                dependencies=[ColId(col="col1", table="table1")],
                sampler=NumericalIdSettings(length=5)
            ),
            ConditionedIdCondition(
                query="`('table1', 'col1')` == 'B'",
                dependencies=[ColId(col="col1", table="table1")],
                sampler=NumericalIdSettings(length=6)
            )
        ]
    )

    {
        "id_type": "conditioned",
        "conditions": [
            {
                "query": "`('table1', 'col1')` == 'A'",
                "dependencies": [{"col": "col1", "table": "table1"}],
                "sampler": {"id_type": "numerical", "length": 5}
            },
            {
                "query": "`('table1', 'col1')` == 'B'",
                "dependencies": [{"col": "col1", "table": "table1"}],
                "sampler": {"id_type": "numerical", "length": 6}
            }
        ],
        "unique": true,
        "mismatch": "replace"
    }
- Fields:
- field conditions: List[ConditionedIdCondition] [Required]¶
List of `ConditionedIdCondition`s, used to specify the type of ID to be generated when a particular query matches.
- field mismatch: IdMismatchBehaviour = IdMismatchBehaviour.REPLACE¶
Behaviour when there are values that do not match any of the specified conditions. ‘replace’ will replace unmatched values with other conditions. ‘preserve’ will leave any unmatched values as they are and treat them as categories.
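The idea of choosing a sampler per row according to which query matches can be sketched without the library. In this standalone illustration the queries from the example are written as Python predicates over a row keyed by (table, column); `generate`, `numeric_id` and the handling of ‘replace’ are assumptions rather than the actual implementation.

```python
import random
from typing import Callable, Dict, List, Tuple

Row = Dict[Tuple[str, str], str]
Sampler = Callable[[random.Random], str]


def numeric_id(length: int) -> Sampler:
    """Stand-in for a numerical ID sampler of a fixed length."""
    def sample(rng: random.Random) -> str:
        return "".join(rng.choice("0123456789") for _ in range(length))
    return sample


# Each condition pairs a predicate (the query) with the sampler to use.
CONDITIONS: List[Tuple[Callable[[Row], bool], Sampler]] = [
    (lambda row: row[("table1", "col1")] == "A", numeric_id(5)),
    (lambda row: row[("table1", "col1")] == "B", numeric_id(6)),
]


def generate(row: Row, current: str, rng: random.Random, mismatch: str = "replace") -> str:
    """Return an ID from the first condition whose query matches the row."""
    for predicate, sampler in CONDITIONS:
        if predicate(row):
            return sampler(rng)
    if mismatch == "preserve":
        # No query matched: keep the existing value.
        return current
    # Assumption: 'replace' falls back to one of the configured samplers.
    return rng.choice(CONDITIONS)[1](rng)


rng = random.Random(0)
print(generate({("table1", "col1"): "A"}, "old", rng))  # five digits
```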