Entities
Entities are used to define entity level parameters that are shared across all columns belonging to the entity.
Combination Entity
- class hazy_configurator.data_schema.combination_entity.CombinationEntity
Bases:
BaseEntity
A combination entity is used to define entity level parameters that are shared across all columns belonging to the entity.
Location Entity
- class hazy_configurator.data_schema.location_entity.LocationEntity
Bases:
BaseEntity
A location entity is used to define entity level parameters that are shared across all columns belonging to the entity.
- Fields:
- field locales: List[GeoLocales] = [<GeoLocales.en_GB: 'en_GB'>, <GeoLocales.en_US: 'en_US'>]
Locales used for generating location components.
- field mismatch: LocationTypeMismatch = LocationTypeMismatch.RANDOM
When synthesizing data, the algorithm reproduces the geographic distribution of the source data. To learn that distribution, it has to group records in the source data into predetermined clusters. Some records will not match a cluster, either because they contain a new postcode or because they were mistyped. This setting decides how to handle those mismatched addresses. The options are: “drop”, i.e. ignore this address; “approximate”, i.e. find the closest matching address in the public database; “random”, i.e. pick a random cluster.
- field num_clusters: int = 500
When synthesizing data, the algorithm reproduces the geographic distribution of the source data. It does this by grouping addresses in the source data into clusters and learning the distribution of addresses between the different clusters. The synthesized records reproduce the distribution of addresses between the clusters. When assigning an address to a synthesized record, the address is chosen at random from the publicly available addresses within that cluster. This setting controls the number of clusters that the addresses within a locale are grouped into. Note: the clustering algorithm is trained on public data and not on the data provided to the pipeline.
- Constraints:
exclusiveMinimum = 0
- field territory_modelling: LocationTerritoryModellingType = LocationTerritoryModellingType.ASSET_SAMPLING
How locations of lower specificity than post/zip code (i.e. country, state, district, city) are modelled. ‘combination’ means sample from combinations of the source country/state/district provided. This preserves source distributions and allows locations outside of Hazy’s known locales. ‘asset_sampling’ means sample from Hazy location assets using the provided locales.
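The interaction between mismatch and the cluster-based sampling described for num_clusters can be illustrated with a small standalone sketch. It does not use hazy_configurator and is not Hazy's implementation: the cluster table, the postcodes, the helper names, and the prefix-based "approximate" lookup are all hypothetical stand-ins chosen only to show the idea.

```python
import random
from collections import Counter

# Hypothetical public address data: cluster id -> postcodes known in that cluster.
PUBLIC_CLUSTERS = {
    0: ["AB1 1AA", "AB1 1AB"],
    1: ["CD2 2AA", "CD2 2AB", "CD2 2AC"],
    2: ["EF3 3AA"],
}
POSTCODE_TO_CLUSTER = {
    pc: cid for cid, pcs in PUBLIC_CLUSTERS.items() for pc in pcs
}


def shared_prefix_len(a, b):
    """Number of leading characters that two strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def assign_cluster(postcode, mismatch, rng):
    """Map a source postcode to a cluster, applying the mismatch policy if needed."""
    if postcode in POSTCODE_TO_CLUSTER:
        return POSTCODE_TO_CLUSTER[postcode]
    if mismatch == "drop":
        return None  # ignore this address
    if mismatch == "approximate":
        # Assumption: "closest" is approximated by the longest shared prefix.
        nearest = max(
            POSTCODE_TO_CLUSTER, key=lambda known: shared_prefix_len(known, postcode)
        )
        return POSTCODE_TO_CLUSTER[nearest]
    if mismatch == "random":
        return rng.choice(sorted(PUBLIC_CLUSTERS))
    raise ValueError(f"unknown mismatch policy: {mismatch!r}")


def learn_cluster_counts(source_postcodes, mismatch, rng):
    """Count how many source records fall into each cluster."""
    counts = Counter()
    for pc in source_postcodes:
        cid = assign_cluster(pc, mismatch, rng)
        if cid is not None:
            counts[cid] += 1
    return counts


def synthesize(counts, n, rng):
    """Draw clusters in proportion to the learned counts, then pick a
    random public postcode inside each drawn cluster."""
    clusters = sorted(counts)
    weights = [counts[c] for c in clusters]
    drawn = rng.choices(clusters, weights=weights, k=n)
    return [rng.choice(PUBLIC_CLUSTERS[c]) for c in drawn]


if __name__ == "__main__":
    rng = random.Random(0)
    source = ["AB1 1AA", "CD2 2AA", "CD2 2AB", "ZZ9 9ZZ"]  # last one matches nothing
    counts = learn_cluster_counts(source, "drop", rng)
    print(counts)
    print(synthesize(counts, 5, rng))
```

In this sketch the synthesized postcodes always come from the public table, never from the source records, which mirrors the note that addresses are assigned from publicly available addresses within the chosen cluster.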
Person Entity
- class hazy_configurator.data_schema.person_entity.PersonEntity
Bases:
BaseEntity
A person entity is used to define entity level parameters that are shared across all columns belonging to the entity.
- Fields:
- field locales: List[PersonLocales] = [<PersonLocales.en_GB: 'en_GB'>, <PersonLocales.en_US: 'en_US'>]
Locales used for generating person components.