Python Client

The Python client wraps the Hub API and the use of Hazy synthesisers in Python classes. The minimum recommended Python version for the client library is 3.6.10.

Version: 0.3.5

hazy_client.configurator module

class hazy_client.configurator.HazyConfigurator

Bases: object

check_dtypes(table, df)

Checks dtypes by attempting to cast each column of the table to its configured dtype.
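The cast-and-check idea behind check_dtypes() can be sketched without pandas. This is not the library's implementation: the helper name failing_columns, the dict-of-lists table, and the use of plain Python callables such as int and float as dtypes are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List


def failing_columns(
    table: Dict[str, List[Any]],
    dtypes: Dict[str, Callable[[Any], Any]],
) -> List[str]:
    """Return the names of columns whose values cannot all be cast to the configured type.

    Hypothetical stand-in for HazyConfigurator.check_dtypes: `table` maps column
    name -> list of values, and each dtype is a callable such as int or float.
    A column named in `dtypes` but missing from `table` raises KeyError.
    """
    failures = []
    for column, cast in dtypes.items():
        try:
            for value in table[column]:
                cast(value)  # raises ValueError/TypeError if the cast is impossible
        except (ValueError, TypeError):
            failures.append(column)
    return failures
```

For example, failing_columns({"age": ["34"], "income": ["n/a"]}, {"age": int, "income": float}) reports only "income", because "n/a" cannot be parsed as a float.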

composite_keys()
classmethod create_template(input_data_path, output_config_file)
custom_handlers()

Returns handlers grouped by table and type.

dtypes()

Returns a dict of dicts that can be passed as the dtypes parameter to train() on the synthesiser.

evaluation_exclude_columns()

Returns a list that can be passed as the evaluation_exclude_columns parameter to train() on the synthesiser.
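The outputs of dtypes() and evaluation_exclude_columns() are meant to be fed into train(). A minimal sketch, assuming `configurator` and `synth` are already-constructed HazyConfigurator and HazySynthesiser objects (they are passed in, so nothing here imports hazy_client). The helper train_from_config is hypothetical; dtypes travels through train()'s **kwargs because it is not one of train()'s named parameters.

```python
from typing import Any


def train_from_config(configurator: Any, synth: Any, source_data: Any, **train_kwargs: Any) -> Any:
    """Train a model using dtypes and evaluation exclusions taken from a configurator.

    `configurator` is expected to behave like hazy_client.configurator.HazyConfigurator
    and `synth` like hazy_client.synth.HazySynthesiser. Returns whatever train()
    returns (a HazyModel for the real client).
    """
    return synth.train(
        source_data=source_data,
        dtypes=configurator.dtypes(),  # forwarded through train()'s **kwargs
        evaluation_exclude_columns=configurator.evaluation_exclude_columns(),
        **train_kwargs
    )
```

Any extra keyword arguments, such as epsilon, are passed straight through to train().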

foreign_keys()
load(table, config_file)

Expects tab-delimited files.

multi_table_config()
primary_keys()
single_table_handlers(table)
hazy_client.configurator.age_handler(data)
hazy_client.configurator.date_handler(data)
hazy_client.configurator.id_handler(data, target_key='target')
hazy_client.configurator.interleave(a, b)
hazy_client.configurator.location_handler(data)
hazy_client.configurator.parse_formula(formula)
hazy_client.configurator.person_handler(data)
hazy_client.configurator.rule_handler(target, rule)

hazy_client.hub module

class hazy_client.hub.HazyGenerator(gen_uri, host, api_key)

Bases: object

Mediates interaction with generators on the hub.

Parameters
  • gen_uri (str) – Uniform Resource Identifier for the generator: the organisation name and the generator name separated by a forward slash

  • host (str) – URL of Hazy hub host

  • api_key (str) – Personal API key (can be found on the dashboard page after logging into the hub)
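Because gen_uri is just "organisation/generator", it can be checked locally before any request is made. A minimal sketch; split_gen_uri is a hypothetical helper, not part of hazy_client, and "acme/customers" below is a placeholder URI.

```python
from typing import Tuple


def split_gen_uri(gen_uri: str) -> Tuple[str, str]:
    """Split a generator URI of the form 'organisation/generator' into its two parts."""
    organisation, sep, generator = gen_uri.partition("/")
    # Reject URIs with no slash, an empty half, or more than one slash.
    if not sep or not organisation or not generator or "/" in generator:
        raise ValueError("expected 'organisation/generator', got {!r}".format(gen_uri))
    return organisation, generator
```

For example, split_gen_uri("acme/customers") gives ("acme", "customers"), while "acme" alone raises ValueError.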

host

API host address

Type

str

gen_uri

Uniform Resource Identifier for the generator: the organisation name and the generator name separated by a forward slash.

Type

str

auth

API authentication details

Type

dict

latest_model(work_dir, query=None, model_file=None)

Downloads the latest model that meets the query criteria.

Parameters
  • work_dir (pathlib.PosixPath) – Path to working directory

  • query (dict) – Dict with the optional keys ‘epsilon’, ‘similarity’ and ‘utility’. Each value is a number, None, or a dict with the required keys ‘comparison’ and ‘value’, where ‘value’ is a number and ‘comparison’ is one of =, <, >, <=, >=.

  • model_file (str) – Name of the model file to be stored locally. If none is given, the model is downloaded as ‘model_{timestamp}.hmf’

Returns

Hazy model object

Return type

hazy_client.synth.HazyModel
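To make the query format concrete, here is a local sketch of how a query dict could be evaluated against one model's metadata. The hub performs the real filtering; model_matches is a hypothetical helper, and it assumes that a bare number means equality, that None means no constraint on that key, and that the metadata uses the same key names as the query.

```python
import operator
from typing import Any, Dict, Optional

# Comparison strings allowed by the documented query format.
_COMPARISONS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}
_QUERY_KEYS = {"epsilon", "similarity", "utility"}


def model_matches(meta: Dict[str, float], query: Optional[Dict[str, Any]]) -> bool:
    """Return True if a model's metadata satisfies every condition in `query`."""
    for key, condition in (query or {}).items():
        if key not in _QUERY_KEYS:
            raise ValueError("unsupported query key: {!r}".format(key))
        if condition is None:
            continue  # assumed: None places no constraint on this key
        if isinstance(condition, dict):
            compare = _COMPARISONS[condition["comparison"]]
            target = condition["value"]
        else:
            compare, target = operator.eq, condition  # assumed: bare number means equality
        if not compare(meta[key], target):
            return False
    return True
```

For instance, {"similarity": {"comparison": ">=", "value": 80}} matches any model whose similarity score is at least 80.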

model(work_dir, model_id, model_file=None)

Downloads a model using its model ID.

Parameters
  • work_dir (pathlib.PosixPath) – Path to working directory

  • model_id (str) – Model ID as shown when listing the versions or models of a generator

  • model_file (str) – Name of the model file to be stored locally. If none is given, the model is downloaded as ‘model_{timestamp}.hmf’

Returns

Hazy model object

Return type

hazy_client.synth.HazyModel

query_models(version=None, query={})

Returns metadata on the model in a given generator version that matches the query. If no generator version is provided, the latest generator version is searched. If no query is provided, metadata on the best-performing model (based on similarity and utility) is returned.

Parameters
  • version (str) – Generator version

  • query (dict) – Dictionary containing parameter and value (e.g. {‘epsilon’: 0.01})

Returns

Dictionary containing metadata on the model that matches the query

Return type

dict

versions()

Shows the available generator versions and their corresponding models.

Returns

DataFrame containing all versions of a generator and the models within them

Return type

DataFrame

class hazy_client.hub.HazyHub(host, api_key)

Bases: object

Encapsulates Hazy Hub API.

Parameters
  • host (str) – URL of Hazy hub host

  • api_key (str) – Personal API key (can be found on the dashboard page after logging into the hub)

host

URL of Hazy hub host

Type

str

auth

API authentication details

Type

dict

create_version(gen_uri, model_list=[], activate=True)

Creates a new generator version. If a model list is provided, the models are uploaded to the new generator version.

Parameters
  • gen_uri (str) – Uniform Resource Identifier for the generator: the organisation name and the generator name separated by a forward slash

  • model_list (list) – List of model objects. Defaults to an empty list

  • activate (bool) – Boolean value that determines whether or not the new generator version is activated on the hub. Defaults to True

Returns

Name of the new generator version created

Return type

str

download_image(image_name, docker_url=None)

Downloads a synthesiser from the Hazy hub.

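Parameters
  • image_name (str) – Name of synthesiser as shown on the Hazy hub (e.g. project/tabular:TAG)

  • docker_url (optional str) – URL of the socket exposed by the Docker daemon or Podman API service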

Returns

Full name of synthesiser Docker image (including the repository prefix)

Return type

str

generator(gen_uri)

Instantiates the HazyGenerator class and returns a generator object.

Parameters

gen_uri (str) – Uniform Resource Identifier for the generator: the organisation name and the generator name separated by a forward slash.

Returns

Hazy generator object

Return type

hazy_client.hub.HazyGenerator
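A typical flow is to obtain a HazyGenerator from the hub and download the latest model that satisfies a privacy bound. A minimal sketch: `hub` is assumed to be a HazyHub instance (passed in, so the helper can be exercised without network access), and the helper name fetch_private_model and the local file name latest_private.hmf are hypothetical.

```python
from pathlib import Path
from typing import Any


def fetch_private_model(hub: Any, gen_uri: str, work_dir: Path, max_epsilon: float = 1.0) -> Any:
    """Download the latest model of `gen_uri` with epsilon <= max_epsilon.

    `hub` is expected to behave like hazy_client.hub.HazyHub; the result is
    whatever HazyGenerator.latest_model() returns (a HazyModel).
    """
    generator = hub.generator(gen_uri)  # HazyGenerator for 'organisation/generator'
    query = {"epsilon": {"comparison": "<=", "value": max_epsilon}}
    return generator.latest_model(work_dir, query=query, model_file="latest_private.hmf")


# Real usage (host, API key and generator name are placeholders):
# from hazy_client.hub import HazyHub
# hub = HazyHub("https://hub.example.com", "YOUR_API_KEY")
# model = fetch_private_model(hub, "my-org/my-generator", Path("work"))
```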

generators()

Shows the generators available to the user.

Returns

Dictionary of all generators available to the user

Return type

dict

identity()
synthesiser(image_name, work_dir, docker_url=None)

Instantiates the HazySynthesiser class and returns a synthesiser object.

Parameters
  • image_name (str) – Name of synthesiser as shown on the Hazy hub (e.g. project/tabular:TAG)

  • work_dir (str) – Working directory for exchanging data with the synthesiser

  • docker_url (optional str) – URL of the socket exposed by the Docker daemon or Podman API service

Returns

Hazy synthesiser object

Return type

hazy_client.synth.HazySynthesiser

synthesisers()
upload_models(gen_uri, version, model_list=[], activate=True)

Uploads models to an existing generator version.

Parameters
  • gen_uri (str) – Uniform Resource Identifier for the generator: the organisation name and the generator name separated by a forward slash

  • version (str) – Name of the generator version that models will be uploaded to

  • model_list (list) – List of model objects. Defaults to an empty list

  • activate (bool) – Boolean value that determines whether or not the generator version is activated on the hub. Defaults to True

Returns

Requests response object

Return type

requests.models.Response
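create_version() and upload_models() cover the two ways of publishing trained models. A minimal sketch, assuming `hub` is a HazyHub instance and `models` holds HazyModel objects; the helper name publish is hypothetical. raise_for_status() is the standard requests.Response method for turning HTTP error codes into exceptions.

```python
from typing import Any, List, Optional


def publish(hub: Any, gen_uri: str, models: List[Any], version: Optional[str] = None) -> str:
    """Publish trained models to the hub and return the generator version name.

    With no `version`, a new generator version is created with the models
    (create_version returns its name). Otherwise the models are added to the
    existing `version` via upload_models, which returns a requests Response.
    """
    if version is None:
        return hub.create_version(gen_uri, model_list=models, activate=True)
    response = hub.upload_models(gen_uri, version, model_list=models, activate=True)
    response.raise_for_status()  # surface HTTP errors from the upload
    return version
```

Passing model_list explicitly also avoids relying on the shared mutable [] default in the documented signatures.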

hazy_client.hub.log_api_error(name, description, response)

Logs API errors.

Parameters
  • name (str) – Name of function

  • response (requests.models.Response) – Requests response object

hazy_client.hub.requests_get(url, headers, stream=False)
hazy_client.hub.requests_post(url, headers, files=None, data={})

hazy_client.synth module

class hazy_client.synth.HazyModel(model_file, work_dir)

Bases: object

Encapsulates Hazy generator model file archives.

Parameters
  • model_file (str) – Name of Hazy model file

  • work_dir (pathlib.PosixPath) – Path to working directory

file

Path to model file

Type

pathlib.PosixPath

work_dir

Path to working directory

Type

pathlib.PosixPath

meta

Dictionary containing all metadata for model

Type

dict

custom_metadata()

Returns the custom metadata for this model

Returns

Dictionary containing custom metadata

Return type

dict

dtypes()

Returns the feature data types of the source dataset.

Returns

Dictionary with the features as keys and the data types as values

Return type

dict

generate(output=None, return_df=False, **kwargs)

Generates synthetic data from a Hazy model and saves it as a csv file in the working directory.

Parameters
  • output (str) – Name of the csv file to be saved. If left as the default, the synthetic data is saved as ‘synth_{timestamp}.csv’

  • return_df (bool) – Boolean value that determines whether or not a DataFrame is returned. Defaults to False

Keyword Arguments
  • num_rows (int) – Number of rows to be generated. Valid for Tabular synthesiser

  • num_accounts (int) – Number of accounts to be generated. Valid for Banking Transactions synthesiser

  • num_records (int) – Number of records to be generated. Valid for Stacked Tabular synthesiser

Returns

Path to the csv file when return_df is False, or a DataFrame when return_df is True

Return type

str or DataFrame
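When both the file on disk and a typed DataFrame are wanted, generate() can write the csv and read_csv() can load it back with the source dtypes. A minimal sketch, assuming `model` is a HazyModel trained with the Tabular synthesiser (hence num_rows); the helper name generate_rows is hypothetical.

```python
from typing import Any, Optional, Tuple


def generate_rows(model: Any, num_rows: int, output: Optional[str] = None) -> Tuple[str, Any]:
    """Generate `num_rows` rows and return (csv_path, DataFrame).

    The csv is written to the model's working directory first, then read back
    with read_csv(), which applies the source data's dtypes.
    """
    csv_path = model.generate(output=output, return_df=False, num_rows=num_rows)
    return csv_path, model.read_csv(csv_path)
```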

image()

Returns the name of the Docker image used to train the model.

Returns

Name of synthesiser’s Docker image

Return type

str

read_csv(csv)

Reads the csv file output by the generation stage, applying the data types of the source data.

Parameters

csv (str) – Path to synthetic data csv file

Returns

DataFrame of synthetic data

Return type

DataFrame

sample_data()

Returns sample data that was generated and stored in the model file during the training stage.

Returns

DataFrame of sample data

Return type

DataFrame

score(score_type, score_name, table_name=None)

Returns a specific score.

Parameters
  • score_type (str) – Type of score (e.g. similarity, utility, privacy)

  • score_name (str) – Name of specific score

  • table_name (str) – Name of table (for multi table models only). Defaults to None.

Returns

Score in the range 0 to 100. A higher score indicates better performance.

Return type

float
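Because every score sits on the same 0 to 100 scale, score() lends itself to simple quality gates before publishing a model. A minimal sketch, assuming `model` is an evaluated HazyModel; the helper below_threshold and any score names passed to it are hypothetical, since this reference does not list the available score names.

```python
from typing import Any, Dict, List, Optional, Tuple


def below_threshold(
    model: Any,
    thresholds: Dict[Tuple[str, str], float],
    table_name: Optional[str] = None,
) -> List[Tuple[str, str, float]]:
    """Return (score_type, score_name, score) for every score under its minimum.

    `thresholds` maps (score_type, score_name) to a minimum on the 0-100 scale.
    """
    failures = []
    for (score_type, score_name), minimum in thresholds.items():
        value = model.score(score_type, score_name, table_name=table_name)
        if value < minimum:
            failures.append((score_type, score_name, value))
    return failures
```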

scores()

Returns all model scores.

Returns

Dictionary containing all of the model’s metrics

Return type

dict

class hazy_client.synth.HazySynthesiser(image, work_dir, docker_url=None)

Bases: object

Mediates interactions with a Hazy Docker container for generating synthetic data and training generator models.

Parameters
  • image (str) – Full name of synthesiser Docker image

  • work_dir (pathlib.PosixPath) – Path to working directory

  • docker_url (optional str) – URL of the socket exposed by the Docker daemon or Podman API service

image

Docker image

Type

docker.models.images.Image

work_dir

Path to working directory

Type

pathlib.PosixPath

generate(output=None, return_df=False, **kwargs)

Generates synthetic data from a Hazy model and saves it as a csv file in the working directory.

Parameters
  • output (str) – Name of the csv file to be saved. If left as the default, the synthetic data is saved as ‘synth_{timestamp}.csv’

  • return_df (bool) – Boolean value that determines whether or not a DataFrame is returned. DataFrame is returned when True, path to csv file is returned when False. Must be set to False for multi-table generation

Keyword Arguments
  • num_accounts (int) – Number of accounts to be generated. Valid for Banking Transactions synthesiser

  • num_records (int) – Number of records to be generated. Valid for Tabular+ and Stacked Tabular synthesisers

Returns

Path to the csv file when return_df is False, or a DataFrame when return_df is True

Return type

str or DataFrame

help(endpoint)

Help function that shows what parameters are required during the training or generation stage for the synthesiser.

Parameters

endpoint (str) – Either “train” or “generate”

Returns

String showing the name and description of the parameters required for the synthesiser

Return type

str

remove_containers()

Removes all Hazy synthesiser containers. Performs a forced removal, so any running containers are stopped and removed.

train(evaluate=False, train_test_split=False, label_columns=[], predictors=[], sample_generate_params=None, evaluation_generate_params=None, evaluation_exclude_columns=[], model_name=None, description=None, **kwargs)

Trains a Hazy model on the input data and saves the model to the working directory.

Parameters
  • evaluate (bool) – Boolean value that determines whether or not evaluation is carried out. Defaults to False

  • train_test_split (bool) – Boolean value that determines whether or not a train/test split is used for evaluation. Defaults to False

  • label_columns (list) – List of columns that predictors will act on to assess utility performance. Defaults to []

  • predictors (list) – List of predictors to be used on label columns. Defaults to []

  • sample_generate_params (dict) – Parameters used to generate the sample data during training. Defaults to None

  • evaluation_generate_params (dict) – Parameters used for the evaluation process. Defaults to None

  • evaluation_exclude_columns (list) – List of columns to exclude from evaluation. Defaults to []

  • model_name (str) – Name of the model file to be saved. If left as the default, the model is saved as ‘model_{timestamp}.hmf’

  • description (str) – Description of the model. Defaults to None

Keyword Arguments
  • epsilon (float) – Privacy parameter. The smaller the value of epsilon, the higher the degree of privacy in the synthetic data; epsilon typically lies within the range 0.0001 < epsilon < 1000. Valid for Tabular and Stacked Tabular synthesisers

  • n_bins (int) – Number of bins that continuous features will be discretised into.

  • n_parents (int) – Number of parents that each node will have in the network model.

  • noise_level (float) – Privacy parameter. Valid for Banking Transactions synthesiser

  • source_data (DataFrame) – DataFrame of source data

Returns

Hazy model object

Return type

hazy_client.synth.HazyModel
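Putting the pieces together, a Tabular model can be trained from a DataFrame and sampled immediately. A minimal sketch, assuming `hub` is a HazyHub instance and `source_data` a pandas DataFrame; the helper train_and_sample, the epsilon value and the model file name are illustrative choices, not recommendations.

```python
from typing import Any


def train_and_sample(hub: Any, image_name: str, work_dir: str, source_data: Any, num_rows: int) -> Any:
    """Train a Tabular model on `source_data`, then generate `num_rows` synthetic rows.

    `hub` is expected to behave like hazy_client.hub.HazyHub; `image_name` is the
    synthesiser name shown on the hub (e.g. project/tabular:TAG).
    """
    synth = hub.synthesiser(image_name, work_dir)  # HazySynthesiser
    model = synth.train(
        source_data=source_data,
        epsilon=1.0,  # privacy parameter: smaller values give more privacy
        model_name="tabular_eps1.hmf",  # hypothetical file name
    )
    return model.generate(return_df=True, num_rows=num_rows)  # DataFrame
```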

train_docker_params(evaluate=False, train_test_split=False, label_columns=[], predictors=[], sample_generate_params=None, evaluation_generate_params=None, evaluation_exclude_columns=[], model_name=None, description=None, **kwargs)