Synthesiser

SynthDocker(image[, work_dir, docker_url, ...])

Synth driver for running training and generation by executing local Docker containers.

SynthAPI(host[, api_key])

API client that can be used to interact with the Hazy REST API.

SynthDocker

class hazy_client2.drivers.synth_docker.SynthDocker(image: str, work_dir: Optional[Union[str, Path]] = None, docker_url: Optional[str] = None, container_user_default: Optional[str] = None, container_user_from_context_override: Optional[bool] = None, copy_io: bool = False, cleanup: bool = True, features_file: Optional[Union[str, Path]] = None, features_sig: Optional[Union[str, Path]] = None)

Bases: Synthesiser

Synth driver for running training and generation by executing local Docker containers.

Example usage:

from hazy_client2 import SynthDocker
from hazy_configurator import TrainingConfig, GenerationConfig

synth = SynthDocker(
    image="hazy/multi-table:4.0.0",
    features_file="features.json",
    features_sig="features.sig.json",
)
synth.train(cfg=TrainingConfig(...))
synth.generate(cfg=GenerationConfig(...))
Parameters:
  • work_dir (Union[Path, str, None]) – Location for storing working data and logs. If not specified, will default to a location based on XDG conventions.

  • docker_url (Optional[str]) – URL of the local Docker daemon.

  • container_user_default (Optional[str]) – Default container user.

  • container_user_from_context_override (Optional[bool]) – Ignore container_user_default and set the container user from the calling context (the current user).

  • copy_io (bool, optional) – Whether to copy input and output files via a temporary directory. This may be required if mounting existing paths into Docker causes permission problems.

  • cleanup (bool, optional) – Whether to clean up containers and temporary state files after execution. Disable for debugging. Defaults to True.

  • features_file (Union[str, Path, None]) – License file provided by Hazy indicating which features are available for use in the synthesiser.

  • features_sig (Union[str, Path, None]) – Signature file verifying that the features file is authentic.

remove_containers() None

Removes all Hazy synthesiser containers. Conducts a force remove so any running containers will be stopped and removed.
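
Example

A minimal sketch of clearing down containers after a run, assuming synth is the SynthDocker instance constructed in the class example above:

# force-remove any Hazy synthesiser containers, including running ones
synth.remove_containers()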

generate(cfg: GenerationConfig, env: Optional[Dict[str, str]] = None, gen_schema_version: Optional[SchemaVersion] = None) None

Generates synthetic data from a Hazy model and saves it as a CSV file in the working directory.

Parameters:
  • cfg (GenerationConfig) – The configuration for the generation process.

  • env (Dict[str, str], optional) – The map of environment variables that should be passed to the generation instance.

  • gen_schema_version (Optional[SchemaVersion], optional) – In most cases the generation schema version is inferred automatically from the model file and does not need to be provided. In rare cases it may need to be overridden here when working with old models.
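
Example

A minimal sketch, assuming synth is the SynthDocker instance constructed in the class example above and GenerationConfig(...) is filled in with a valid configuration; the environment variable shown is purely illustrative.

from hazy_configurator import GenerationConfig

# generate synthetic data, passing environment variables through to the container
synth.generate(
    cfg=GenerationConfig(...),
    env={"HTTPS_PROXY": "http://proxy.example.internal:3128"},
)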

train(cfg: TrainingConfig, env: Optional[Dict[str, str]] = None) GeneratorModel

Train a new generator model using the given configuration.

Parameters:
  • cfg (TrainingConfig) – The configuration for the training process.

  • env (Dict[str, str], optional) – The map of environment variables that should be passed to the training instance.

Returns:

A new generator model that can be used to produce synthetic data.

Return type:

GeneratorModel
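
Example

A minimal sketch, assuming synth is the SynthDocker instance constructed in the class example above and TrainingConfig(...) is filled in with a valid configuration:

from hazy_configurator import TrainingConfig

# train and keep a handle on the resulting generator model
model = synth.train(cfg=TrainingConfig(...))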

SynthAPI

class hazy_client2.drivers.synth_api.SynthAPI(host: str, api_key: Optional[str] = None)

API client that can be used to interact with the Hazy REST API.

SynthAPI can be used to:

  • begin and monitor training/generation jobs,

  • retrieve information about trained models,

  • retrieve information about projects,

  • query available data sources.

Example

Using the SynthAPI to train a model and use it for generation.

from hazy_client2 import SynthAPI
from hazy_configurator import TrainingConfig, GenerationConfig

# create a client instance and test the connection to the REST API
client = SynthAPI(...)

# start a training job with a provided configuration
train_job = client.jobs.train(config=TrainingConfig(...), project_id=1)

# poll the training job until complete (every 5 seconds)
for state in client.jobs.poll_training_status(train_job.model_id, interval=5):
    print(f"Job status {state}")
assert state.is_finished

# start a generation job with the trained model and a provided configuration
generate_job = client.jobs.generate(config=GenerationConfig(...), model_id=train_job.model_id)

# poll the generation job until complete (every 5 seconds)
for state in client.jobs.poll_generation_status(generate_job.run_id, interval=5):
    print(f"Job status {state}")
assert state.is_finished
Parameters:
  • host – URL of the Hazy UI host.

  • api_key – Personal API key. It can be found on the dashboard page after logging in to the UI.

jobs: Jobs

Python wrapper for the /api/jobs REST API resource.

models: Models

Python wrapper for the /api/models REST API resource.

projects: Projects

Python wrapper for the /api/projects REST API resource.

data_sources: DataSources

Python wrapper for the /api/data-sources REST API resource.
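
Example

The four resource wrappers above are exposed as attributes on a SynthAPI instance. A minimal sketch of reaching each one, assuming model_id is the UUID of an existing model and a project with ID 1 exists:

from hazy_client2 import SynthAPI

client = SynthAPI(...)

# each REST resource is accessed through its attribute on the client
job = client.jobs.get_training_job(model_id)      # /api/jobs
model = client.models.get(model_id)               # /api/models
project = client.projects.get(project_id=1)       # /api/projects
sources = client.data_sources.all()               # /api/data-sources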

train(cfg: Optional[TrainingConfig] = None, project_id: Optional[int] = None, config_id: Optional[int] = None) dict

Train a new generator model using the given configuration.

Warning

This method has been deprecated in favour of Jobs.train(), which returns a Pydantic object rather than a raw JSON response and can be accessed through the jobs object.

Parameters:
  • cfg – The configuration for the training process.

  • project_id – The ID of the project the model should be uploaded to after training is complete.

  • config_id – The ID of the configuration set for training.

Notes

  • This function accepts either project_id and cfg or only config_id.

  • If project_id and cfg are provided, the trained model will be uploaded to the specified project.

  • If config_id is provided, the specified configuration set will be used for training.

  • If project_id and config_id are provided, project_id will be ignored and the specified configuration set will be used for training.

Returns:

The response to the HTTP POST request as a dictionary.

Return type:

dict
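
Example

A minimal sketch of the deprecated call; the configuration set ID shown is a placeholder, and new code should prefer the equivalent client.jobs.train() described under Jobs below.

>>> from hazy_client2 import SynthAPI
>>> from hazy_configurator.api_types import TrainingConfig
>>> client = SynthAPI(...)
>>> # train from an in-line configuration, uploading the model to project 1
>>> response = client.train(cfg=TrainingConfig(...), project_id=1)
>>> # or train from an existing configuration set instead
>>> response = client.train(config_id=3)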

generate(cfg: GenerationConfig, model_id: str) dict

Generate a new batch of synthetic data using a given configuration.

Warning

This method has been deprecated in favour of Jobs.generate(), which returns a Pydantic object rather than a raw JSON response and can be accessed through the jobs object.

Parameters:
  • cfg – The configuration for the generation process.

  • model_id – The ID of the model that should be used to generate data.

Returns:

The response to the HTTP POST request as a dictionary.

Return type:

dict
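
Example

A minimal sketch of the deprecated call; model_id stands in for the ID of an existing trained model, and new code should prefer the equivalent client.jobs.generate() described under Jobs below.

>>> from hazy_client2 import SynthAPI
>>> from hazy_configurator.api_types import GenerationConfig
>>> client = SynthAPI(...)
>>> response = client.generate(cfg=GenerationConfig(...), model_id=model_id)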

Endpoints

Jobs

train([config, project_id, config_id])

Train a new generator model using the given configuration.

get_training_job(model_id)

Retrieve a training job by model ID.

poll_training_status(model_id, *[, ...])

Polls for the status of a training job by model ID until it is killed, fails, or succeeds.

generate(config, model_id)

Generate a new batch of synthetic data using a given configuration and trained model.

get_generation_job(run_id)

Retrieve a generation job by run ID.

poll_generation_status(run_id, *[, ...])

Polls for the status of a generation job by run ID until it is killed, fails, or succeeds.

download_data(run_id, *, file[, chunk_size])

Download synthetic data in compressed format and write it to a file-like object.

class hazy_client2.drivers.api_resources.jobs.Jobs(*, api: SynthAPI)

Python wrapper for the /api/jobs REST API resource.

Warning

This class should not be instantiated directly.

The SynthAPI.jobs object should be used instead.

Example

Using the SynthAPI to:

  1. Create a training job.

  2. Poll the training job until complete.

  3. Create a generation job using the trained model.

  4. Poll the generation job until complete.

  5. Download the synthetic data from the generation job.

from tempfile import TemporaryFile, TemporaryDirectory
from zipfile import ZipFile

from hazy_client2 import SynthAPI
from hazy_configurator.api_types import TrainingConfig, GenerationConfig

# create a client instance and test the connection to the REST API
client = SynthAPI(...)

# start a training job with a provided configuration
train_job = client.jobs.train(config=TrainingConfig(...), project_id=1)

# poll the training job until complete (every 5 seconds)
for state in client.jobs.poll_training_status(train_job.model_id, interval=5):
    print(f"Job status {state}")
assert state.is_finished

# start a generation job with the trained model and a provided configuration
generate_job = client.jobs.generate(config=GenerationConfig(...), model_id=train_job.model_id)

# poll the generation job until complete (every 5 seconds)
for state in client.jobs.poll_generation_status(generate_job.run_id, interval=5):
    print(f"Job status {state}")
assert state.is_finished

# download and extract synth data to a temporary folder
with TemporaryFile() as file, TemporaryDirectory() as data_dir:
    client.jobs.download_data(generate_job.run_id, file=file)
    with ZipFile(file) as zip:
        zip.extractall(data_dir)
train(config: Optional[TrainingConfig] = None, project_id: Optional[int] = None, config_id: Optional[int] = None) TrainJobDetails

Train a new generator model using the given configuration.

Wraps POST /api/jobs/train.

Parameters:
  • config – The configuration for the training process.

  • project_id – The ID of the project the model should be uploaded to after training is complete.

  • config_id – The ID of the configuration set for training.

Return type:

Training job details.

Notes

  • This function accepts either project_id and config, or only config_id.

  • If project_id and config are provided, the trained model will be uploaded to the specified project.

  • If config_id is provided, the specified configuration set will be used for training.

  • If project_id and config_id are provided, project_id will be ignored and the specified configuration set will be used for training.

Example

>>> from hazy_client2 import SynthAPI
>>> from hazy_configurator.api_types import TrainingConfig
>>> client = SynthAPI(...)
>>> job_details = client.jobs.train(config=TrainingConfig(...), project_id=1)
get_training_job(model_id: UUID) TrainJob

Retrieve a training job by model ID.

Wraps GET /api/jobs/train/{model_id}.

Parameters:

model_id – Model ID.

Return type:

Training job with the matching model ID.

Example

>>> from hazy_client2 import SynthAPI
>>> from hazy_configurator.api_types import TrainingConfig
>>> client = SynthAPI(...)
>>> job_details = client.jobs.train(config=TrainingConfig(...), project_id=1)
>>> job = client.jobs.get_training_job(job_details.model_id)
poll_training_status(model_id: UUID, *, interval: PositiveFloat = 1.0, max_attempts: Optional[PositiveInt] = None) Iterator[DispatchTaskState]

Polls for the status of a training job by model ID until it is killed, fails, or succeeds.

Wraps GET /api/jobs/train/{model_id}.

Parameters:
  • model_id – Model ID.

  • interval – Polling interval (in seconds).

  • max_attempts – Maximum number of times to poll for.

Yields:

State of the training dispatch task.

Raises:

TimeoutError – If max_attempts is set and the number of polling attempts has reached it.

Example

>>> from hazy_client2 import SynthAPI
>>> from hazy_configurator.api_types import DispatchTaskState, TrainingConfig
>>> client = SynthAPI(...)
>>> job = client.jobs.train(config=TrainingConfig(...), project_id=1)
>>> for state in client.jobs.poll_training_status(job.model_id, interval=5):
>>>     print(f"Current state: {state.value}")
>>> assert state.is_finished
>>> if state == DispatchTaskState.SUCCEEDED:
>>>     print("Training complete!")
generate(config: GenerationConfig, model_id: UUID) GenerateJobDetails

Generate a new batch of synthetic data using a given configuration and trained model.

Wraps POST /api/jobs/generate.

Parameters:
  • config – Generation configuration.

  • model_id – ID of trained model to use for generation.

Return type:

Generation job details.

Example

>>> import uuid
>>> from hazy_client2 import SynthAPI
>>> from hazy_configurator.api_types import GenerationConfig
>>> model_id = uuid.uuid4()
>>> client = SynthAPI(...)
>>> job_details = client.jobs.generate(config=GenerationConfig(...), model_id=model_id)
get_generation_job(run_id: UUID) GenerateJob

Retrieve a generation job by run ID.

Wraps GET /api/jobs/generate/{generation_run_id}.

Parameters:

run_id – Generation run ID.

Return type:

Generation job with the matching run ID.

Example

>>> import uuid
>>> from hazy_client2 import SynthAPI
>>> from hazy_configurator.api_types import GenerationConfig
>>> model_id = uuid.uuid4()
>>> client = SynthAPI(...)
>>> job_details = client.jobs.generate(config=GenerationConfig(...), model_id=model_id)
>>> job = client.jobs.get_generation_job(job_details.run_id)
poll_generation_status(run_id: UUID, *, interval: PositiveFloat = 1.0, max_attempts: Optional[PositiveInt] = None) Iterator[DispatchTaskState]

Polls for the status of a generation job by run ID until it is killed, fails, or succeeds.

Wraps GET /api/jobs/generate/{generation_run_id}.

Parameters:
  • run_id – Generation run ID.

  • interval – Polling interval (in seconds).

  • max_attempts – Maximum number of times to poll for.

Yields:

State of the generation dispatch task.

Raises:

TimeoutError – If max_attempts is set and the number of polling attempts has reached it.

Example

>>> import uuid
>>> from hazy_client2 import SynthAPI
>>> from hazy_configurator.api_types import DispatchTaskState, GenerationConfig
>>> model_id = uuid.uuid4()
>>> client = SynthAPI(...)
>>> job = client.jobs.generate(config=GenerationConfig(...), model_id=model_id)
>>> for state in client.jobs.poll_generation_status(job.run_id, interval=5):
>>>     print(f"Current state: {state.value}")
>>> assert state.is_finished
>>> if state == DispatchTaskState.SUCCEEDED:
>>>     print("Generation complete!")
download_data(run_id: UUID, *, file: Any, chunk_size: int = 1048576) None

Download synthetic data in compressed format and write it to a file-like object.

Wraps GET /api/jobs/generate/{generation_run_id}/zip.

Parameters:
  • run_id – Generation run ID.

  • file – File-like object.

  • chunk_size – Size of write chunks.

Example

>>> import uuid
>>> from zipfile import ZipFile
>>> from tempfile import TemporaryFile, TemporaryDirectory
>>> from hazy_client2 import SynthAPI
>>> from hazy_configurator.api_types import DispatchTaskState, GenerationConfig
>>> model_id = uuid.uuid4()
>>> client = SynthAPI(...)
>>> job = client.jobs.generate(config=GenerationConfig(...), model_id=model_id)
>>> for state in client.jobs.poll_generation_status(job.run_id, interval=5):
>>>     print(f"Job status {state}")
>>> assert state.is_finished
>>> assert state == DispatchTaskState.SUCCEEDED
>>> with TemporaryFile() as file, TemporaryDirectory() as data_dir:
>>>     client.jobs.download_data(job.run_id, file=file)
>>>     with ZipFile(file) as zip:
>>>         zip.extractall(data_dir)

Models

get(model_id)

Retrieve a model by ID.

get_training_metadata(model_id)

Retrieve training metadata by model ID.

class hazy_client2.drivers.api_resources.models.Models(*, api: SynthAPI)

Python wrapper for the /api/models REST API resource.

Warning

This class should not be instantiated directly.

The SynthAPI.models object should be used instead.

Example

Using the SynthAPI to retrieve a trained model by ID.

from uuid import UUID
from hazy_client2 import SynthAPI

# create a client instance and test the connection to the REST API
client = SynthAPI(...)

# retrieve model by ID
model_id = UUID(...)
model = client.models.get(model_id=model_id)
get(model_id: UUID) Model

Retrieve a model by ID.

Wraps GET /api/models/{model_id}.

Parameters:

model_id – Model ID.

Return type:

Model with the matching ID.

Example

>>> import uuid
>>> from hazy_client2 import SynthAPI
>>> model_id = uuid.uuid4()
>>> client = SynthAPI(...)
>>> model = client.models.get(model_id)
get_training_metadata(model_id: UUID) TrainMetadata

Retrieve training metadata by model ID.

Wraps GET /api/models/{model_id}/data.

Parameters:

model_id – Model ID.

Return type:

Training metadata for the specified model.

Example

>>> import uuid
>>> from hazy_client2 import SynthAPI
>>> model_id = uuid.uuid4()
>>> client = SynthAPI(...)
>>> metadata = client.models.get_training_metadata(model_id)

Projects

get(project_id)

Retrieve a project by ID.

class hazy_client2.drivers.api_resources.projects.Projects(*, api: SynthAPI)

Python wrapper for the /api/projects REST API resource.

Warning

This class should not be instantiated directly.

The SynthAPI.projects object should be used instead.

Example

Using the SynthAPI to retrieve a project by ID.

from hazy_client2 import SynthAPI

# create a client instance and test the connection to the REST API
client = SynthAPI(...)

# retrieve project with ID 1
project = client.projects.get(project_id=1)
get(project_id: int) Project

Retrieve a project by ID.

Wraps GET /api/projects/{project_id}.

Parameters:

project_id – Project ID.

Return type:

Project with the matching ID.

Example

>>> from hazy_client2 import SynthAPI
>>> client = SynthAPI(...)
>>> project = client.projects.get(project_id=1)

Data Sources

get(data_source_id)

Retrieve a data source by ID.

filter_by(*[, source_type, io])

Retrieve data sources that match the provided query parameters.

all()

Retrieve all data sources.

get_download_source()

Retrieve the download data source if available.

class hazy_client2.drivers.api_resources.data_sources.DataSources(*, api: SynthAPI)

Python wrapper for the /api/data-sources REST API resource.

Warning

This class should not be instantiated directly.

The SynthAPI.data_sources object should be used instead.

Example

Using the SynthAPI to query S3 or Azure blob storage download data sources.

from hazy_client2 import SynthAPI
from hazy_configurator.api_types import DataSourceIO, SensitiveDataSourceType

# create a client instance and test the connection to the REST API
client = SynthAPI(...)

# fetch S3/Azure download data sources
sources = client.data_sources.filter_by(
    source_type=[SensitiveDataSourceType.S3, SensitiveDataSourceType.AZURE],
    io=[DataSourceIO.DOWNLOAD],
)
get(data_source_id: UUID) SecretDataSource

Retrieve a data source by ID.

Wraps GET /api/data-sources/{data_source_id}.

Parameters:

data_source_id – Data source ID.

Return type:

Data source with the matching ID.

Example

>>> import uuid
>>> from hazy_client2 import SynthAPI
>>> data_source_id = uuid.uuid4()
>>> client = SynthAPI(...)
>>> source = client.data_sources.get(data_source_id)
filter_by(*, source_type: Optional[List[SensitiveDataSourceType]] = None, io: Optional[List[DataSourceIO]] = None) List[SecretDataSource]

Retrieve data sources that match the provided query parameters.

Wraps GET /api/data-sources?source_type=[...]&io=[...].

Parameters:
  • source_type – Data source types.

  • io – Input/output types.

Return type:

Matching data sources.

Example

>>> import typing
>>> from hazy_client2 import SynthAPI
>>> from hazy_configurator.api_types import DataSourceIO, SensitiveDataSourceType
>>> client = SynthAPI(...)
>>> sources = client.data_sources.filter_by(
>>>     source_type=[SensitiveDataSourceType.S3],
>>>     io=[DataSourceIO.INPUT, DataSourceIO.INPUT_OUTPUT],
>>> )
all() List[SecretDataSource]

Retrieve all data sources.

Wraps GET /api/data-sources.

Return type:

All data sources.

Example

>>> from hazy_client2 import SynthAPI
>>> client = SynthAPI(...)
>>> sources = client.data_sources.all()
get_download_source() SecretDataSource

Retrieve the download data source if available.

Wraps GET /api/data-sources?io=download.

Return type:

The download data source.

Example

>>> from hazy_client2 import SynthAPI
>>> client = SynthAPI(...)
>>> download_source = client.data_sources.get_download_source()