Data Schema¶
The schema defines the data structure and column features.
- class hazy_configurator.data_schema.data_schema.DataSchema¶
Bases:
HazyBaseModel
Describes the data schema.
Tables are defined through the tables field. Any special rules which cannot be defined by the table data types should be defined using custom_handlers.
- Fields:
- field custom_handlers: List[CustomHandlerConfig] = []¶
Extra processing which can be carried out. This should only be required for advanced features. For most use cases defining data types is enough. See Custom Handlers. If inputting custom handlers in the UI, please use their JSON formats e.g.
    [
        {
            "type": "age",
            "age_column": "Age",
            "dob_column": "DoB",
            "ref_date": "2022-12-25"
        }
    ]
- field tables: List[HazyTable] [Required]¶
List of tables which make up the data schema. See Data Table.
- field automatic_handlers: Optional[AutomaticHandlerExtractorConfig] = None¶
Types of rules to check for in the data which will then be enforced in the generated data.
- field entities: List[Union[PersonEntity, LocationEntity, CombinationEntity]] = []¶
Entity level settings.
- property multi_config: MultiTableConfig¶
Builds structured object of links between tables and runs validation to check for configuration errors.
- Returns:
Representation of structural links between tables.
- Return type:
MultiTableConfig
- get_table(name: str) → Optional[HazyTable]¶
Get a specific table from the schema. Search by table name.
- Parameters:
name (TableName) – Table name to search for.
- Returns:
Table from the data schema. Returns None if not found.
- Return type:
Optional[HazyTable]
- get_dtype(table: str, col: str) → Optional[HazyDataTypeUnion]¶
Get the dtype config from the schema. Search by table and column.
- Parameters:
table (TableName) – Name of the table.
col (ColumnName) – Name of the Column.
- Returns:
If column/table not found, None will be returned.
- Return type:
Optional[HazyDataTypeUnion]
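Since both get_table and get_dtype return None rather than raising when nothing is found, callers usually need an explicit check. The following is a minimal sketch of that pattern. The helper `require_dtype` and the `StubSchema` class are hypothetical stand-ins written for this example so that it runs without hazy_configurator; only the `get_dtype(table, col)` call shape is taken from the signature above.

```python
from typing import Any, Dict, Optional, Tuple


def require_dtype(schema: Any, table: str, col: str) -> Any:
    """Hypothetical helper: return the dtype config or raise if it is missing."""
    dtype: Optional[Any] = schema.get_dtype(table, col)
    if dtype is None:
        # None is returned when either the table or the column is not found.
        raise KeyError(f"no dtype for column {col!r} in table {table!r}")
    return dtype


class StubSchema:
    """Stand-in for DataSchema (assumption: only get_dtype is mimicked)."""

    def __init__(self, dtypes: Dict[Tuple[str, str], Any]) -> None:
        self._dtypes = dtypes

    def get_dtype(self, table: str, col: str) -> Optional[Any]:
        return self._dtypes.get((table, col))


schema = StubSchema({("customers", "Age"): "int"})
found = require_dtype(schema, "customers", "Age")
```

Failing early with a descriptive error is one design choice; returning a default or skipping the column may suit other callers better.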
- normalise() → Tuple[DataSchema, List[Union[DenormalItem, SingleColumnNormaliserConfig]]]¶
If normalisations have been defined in the configuration, this function returns a new DataSchema object with the tables normalised and a list of the normalisation configs.
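Because custom handlers entered through the UI must be valid JSON, it can help to check a payload before pasting it in. The following is a minimal sketch using only the Python standard library. The helper name `check_age_handler` and the set of required keys are assumptions inferred from the single custom_handlers example above; they are not part of hazy_configurator, and the library's own validation may differ.

```python
import json
from datetime import date

# Payload copied from the custom_handlers example above.
payload = """
[
    {
        "type": "age",
        "age_column": "Age",
        "dob_column": "DoB",
        "ref_date": "2022-12-25"
    }
]
"""

# Assumption: these keys are inferred from the one example in this document.
REQUIRED_KEYS = {"type", "age_column", "dob_column", "ref_date"}


def check_age_handler(handler: dict) -> date:
    """Hypothetical helper: check the keys and parse ref_date as an ISO date."""
    missing = REQUIRED_KEYS - handler.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return date.fromisoformat(handler["ref_date"])


handlers = json.loads(payload)  # raises json.JSONDecodeError on invalid JSON
ref_dates = [check_age_handler(h) for h in handlers]
```

JSON requires straight double quotes, so a payload copied from a word processor with curly quotes will fail at the `json.loads` step.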
Automatic Handlers¶
- class hazy_configurator.data_schema.automatic_handlers.ConditionedExtractorConfig¶
Bases:
BaseExtractorConfig
Finds cases where only certain values in one column are allowed as a result of values in other columns.
Either min_n or min_prop must be provided.
- Fields:
- field min_n: Optional[int] = 10¶
Minimum number of examples to determine something as conditioned.
- Constraints:
exclusiveMinimum = 0
- field min_prop: Optional[float] = None¶
Minimum proportion of examples in a column to determine something as conditioned.
- Constraints:
minimum = 0.0
maximum = 1.0
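To make the idea of a conditioned rule more concrete, here is a rough sketch of how support thresholds such as min_n and min_prop could gate which rules are kept. It is an illustration only, not the library's algorithm: the function `allowed_values` is hypothetical, it handles a single conditioning column, and treating the two thresholds as both required when both are set is an assumption.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Set, Tuple


def allowed_values(
    rows: Sequence[Tuple[Hashable, Hashable]],
    min_n: Optional[int] = 10,
    min_prop: Optional[float] = None,
) -> Dict[Hashable, Set[Hashable]]:
    """For each condition value with enough support, collect the target values seen."""
    if min_n is None and min_prop is None:
        raise ValueError("either min_n or min_prop must be provided")

    # Group target values by the value of the conditioning column.
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for condition, target in rows:
        groups[condition].append(target)

    total = len(rows)
    rules: Dict[Hashable, Set[Hashable]] = {}
    for condition, targets in groups.items():
        # Assumption: when both thresholds are set, both must be met.
        enough_n = min_n is None or len(targets) >= min_n
        enough_prop = min_prop is None or len(targets) / total >= min_prop
        if enough_n and enough_prop:
            rules[condition] = set(targets)
    return rules


rows = [("UK", "GBP")] * 3 + [("FR", "EUR")] * 3 + [("US", "USD")]
rules = allowed_values(rows, min_n=3)
```

Here the "US" group has only one example, so no rule is recorded for it, while "UK" and "FR" each yield a single allowed value.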
- class hazy_configurator.data_schema.automatic_handlers.DeterminedExtractorConfig¶
Bases:
BaseExtractorConfig
Finds cases where one column is entirely determined by another column.
When a column is found to be determined by another, it does not need to be statistically modelled and can instead be generated entirely from the other column.
- field max_error_proportion: float = 0.0¶
The maximum proportion of values that may violate the determined rule while the column is still considered determined. The default of 0.0 means the target column must be entirely determined.
- Constraints:
minimum = 0.0
maximum = 1.0
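The effect of max_error_proportion can be sketched as follows. This is a simplified illustration under stated assumptions, not the extractor's actual implementation: the function `is_determined` is hypothetical, and counting an "error" as any row that disagrees with the most common target value for its source value is one plausible reading of the description above.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def is_determined(
    pairs: Sequence[Tuple[Hashable, Hashable]],
    max_error_proportion: float = 0.0,
) -> bool:
    """Return True if the target is (almost) a function of the source."""
    if not 0.0 <= max_error_proportion <= 1.0:
        raise ValueError("max_error_proportion must be between 0.0 and 1.0")
    if not pairs:
        return True

    # Count target values seen for each source value.
    by_source: Dict[Hashable, Counter] = defaultdict(Counter)
    for source, target in pairs:
        by_source[source][target] += 1

    # Rows that disagree with the most common target for their source value.
    errors = sum(
        sum(counts.values()) - max(counts.values())
        for counts in by_source.values()
    )
    return errors / len(pairs) <= max_error_proportion


pairs = (
    [("SW1", "London")] * 9
    + [("SW1", "Londn")]  # one inconsistent row out of twenty
    + [("M1", "Manchester")] * 10
)
strict = is_determined(pairs)  # default 0.0 tolerates no exceptions
lenient = is_determined(pairs, max_error_proportion=0.1)
```

With the default of 0.0 the single misspelt row is enough to reject the rule; raising the threshold to 0.1 accepts it, since only 1 in 20 rows disagrees.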
- class hazy_configurator.data_schema.automatic_handlers.AutomaticHandlerExtractorConfig¶
Bases:
HazyBaseModel
- Fields:
- field extractors: List[Union[ConditionedExtractorConfig, DeterminedExtractorConfig]] [Required]¶
Types of rules to check for in the data which will then be enforced in the generated data.
- field ignore: Dict[str, List[str]] = {}¶
Mapping from table names to lists of columns to ignore for rule extraction.
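The shape of the ignore mapping can be illustrated with a short sketch. The function `columns_to_scan` and the table and column names are hypothetical and exist only for this example; how the library applies the mapping internally is not described here.

```python
from typing import Dict, List


def columns_to_scan(
    table_columns: Dict[str, List[str]],
    ignore: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Drop ignored columns per table; tables absent from `ignore` are kept whole."""
    result: Dict[str, List[str]] = {}
    for table, columns in table_columns.items():
        skipped = set(ignore.get(table, []))
        result[table] = [col for col in columns if col not in skipped]
    return result


table_columns = {
    "customers": ["id", "Age", "DoB"],
    "orders": ["id", "total"],
}
# Same shape as the `ignore` field: table name -> list of column names.
ignore = {"customers": ["id"]}
remaining = columns_to_scan(table_columns, ignore)
```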