Synthesiser¶
SynthDocker | Synth driver for running training and generation by executing local docker containers. |
SynthAPI | API client that can be used to interact with the Hazy REST API. |
SynthDocker¶
- class hazy_client2.drivers.synth_docker.SynthDocker(image: str, work_dir: Optional[Union[str, Path]] = None, docker_url: Optional[str] = None, container_user_default: Optional[str] = None, container_user_from_context_override: Optional[bool] = None, copy_io: bool = False, cleanup: bool = True, features_file: Optional[Union[str, Path]] = None, features_sig: Optional[Union[str, Path]] = None)¶
Bases:
Synthesiser
Synth driver for running training and generation by executing local docker containers.
Example usage:
    from hazy_client2 import SynthDocker
    from hazy_configurator import TrainingConfig, GenerationConfig

    synth = SynthDocker(
        image="hazy/multi-table:4.0.0",
        features_file="features.json",
        features_sig="features.sig.json",
    )
    synth.train(cfg=TrainingConfig(...))
    synth.generate(cfg=GenerationConfig(...))
- Parameters:
work_dir (Union[Path, str, None]) – Location for storing working data and logs. If not specified, will default to a location based on XDG conventions.
docker_url (Optional[str]) – Url for local docker daemon.
container_user_default (Optional[str]) – Default container user.
container_user_from_context_override (Optional[bool]) – Ignore container_user_default and set container user from the calling context (current user)
copy_io (bool, optional) – Indicates whether or not to copy input and output files via a temporary directory. This may be required if mounting existing paths in to docker causes permissions problems.
cleanup (bool, optional) – Indicates whether or not to clear up container and temporary state files following execution. Used for debugging. Defaults to True.
features_file (Union[str, Path, None]) – License file provided by Hazy indicating which features are available for use in the synthesiser.
features_sig (Union[str, Path, None]) – Signature file verifying that the features file is authentic.
- remove_containers() None ¶
Removes all Hazy synthesiser containers. Conducts a force remove so any running containers will be stopped and removed.
- generate(cfg: GenerationConfig, env: Optional[Dict[str, str]] = None, gen_schema_version: Optional[SchemaVersion] = None) None ¶
Generates synthetic data from a Hazy model and saves it as a CSV file in the working directory.
- Parameters:
cfg (GenerationConfig) – The configuration for the generation process
env (Dict[str, str], optional) – The map of environment variables that should be passed to the generation instance
gen_schema_version (Optional[SchemaVersion], optional) – In most cases the generation schema version is inferred automatically from the model file and does not need to be provided. In rare cases it may need to be overridden here when working with old models.
- train(cfg: TrainingConfig, env: Optional[Dict[str, str]] = None) GeneratorModel ¶
Train a new generator model using given configuration.
- Parameters:
cfg (TrainingConfig) – The configuration for the training process.
env (Dict[str, str], optional) – The map of environment variables that should be passed to the training instance
- Returns:
A new generator model that can be used to produce synthetic data.
- Return type:
GeneratorModel
SynthAPI¶
- class hazy_client2.drivers.synth_api.SynthAPI(host: str, api_key: Optional[str] = None)¶
API client that can be used to interact with the Hazy REST API.
SynthAPI can be used to:
- begin and monitor training/generation jobs,
- retrieve information about trained models,
- retrieve information about projects,
- query available data sources.
Example
Using the SynthAPI to train a model and use it for generation.

    from hazy_client2 import SynthAPI
    from hazy_configurator import TrainingConfig, GenerationConfig

    # create a client instance and test the connection to the REST API
    client = SynthAPI(...)

    # start a training job with a provided configuration
    train_job = client.jobs.train(config=TrainingConfig(...), project_id=1)

    # poll the training job until complete (every 5 seconds)
    for state in client.jobs.poll_training_status(train_job.model_id, interval=5):
        print(f"Job status {state}")
    assert state.is_finished

    # start a generation job with the trained model and a provided configuration
    generate_job = client.jobs.generate(config=GenerationConfig(...), model_id=train_job.model_id)

    # poll the generation job until complete (every 5 seconds)
    for state in client.jobs.poll_generation_status(generate_job.run_id, interval=5):
        print(f"Job status {state}")
    assert state.is_finished
- Parameters:
host – URL of Hazy UI host
api_key – Personal API key - can be found on the dashboard page after logging into the UI.
- data_sources: DataSources¶
Python wrapper for the /api/data-sources REST API resource.
- train(cfg: Optional[TrainingConfig] = None, project_id: Optional[int] = None, config_id: Optional[int] = None) dict ¶
Train a new generator model using given configuration.
Warning
This method has been deprecated in favour of Jobs.train(), which returns a Pydantic object rather than a raw JSON response and can be accessed through the jobs object.
- Parameters:
cfg – The configuration for the training process.
project_id – The ID of the project the model should be uploaded to after training is complete.
config_id – The ID of the configuration set for training.
Notes
This function accepts either project_id and cfg, or only config_id.
- If project_id and cfg are provided, the trained model will be uploaded to the specified project.
- If config_id is provided, the specified configuration set will be used for training.
- If project_id and config_id are provided, project_id will be ignored and the specified configuration set will be used for training.
- Returns:
The response to the HTTP post request in a dictionary format
- Return type:
dict
- generate(cfg: GenerationConfig, model_id: str) dict ¶
Generate a new batch of synthetic data using a given configuration.
Warning
This method has been deprecated in favour of Jobs.generate(), which returns a Pydantic object rather than a raw JSON response and can be accessed through the jobs object.
- Parameters:
cfg – The configuration for the generation process.
model_id – The ID of the model that should be used to generate data.
- Returns:
The response to the HTTP post request in a dictionary format
- Return type:
dict
Endpoints¶
Jobs¶
Jobs.train | Train a new generator model using the given configuration. |
Jobs.get_training_job | Retrieve a training job by model ID. |
Jobs.poll_training_status | Polls for the status of a training job by model ID, until killed, failed or succeeded. |
Jobs.generate | Generate a new batch of synthetic data using a given configuration and trained model. |
Jobs.get_generation_job | Retrieve a generation job by run ID. |
Jobs.poll_generation_status | Polls for the status of a generation job by run ID, until killed, failed or succeeded. |
Jobs.download_data | Download synthetic data in compressed format and write it to a file-like object. |
- class hazy_client2.drivers.api_resources.jobs.Jobs(*, api: SynthAPI)¶
Python wrapper for the /api/jobs REST API resource.
Warning
This class should not be instantiated directly. The SynthAPI.jobs object should be used instead.
Example
Using the SynthAPI to:
- Create a training job.
- Poll the training job until complete.
- Create a generation job using the trained model.
- Poll the generation job until complete.
- Download the synthetic data from the generation job.

    from tempfile import TemporaryFile, TemporaryDirectory
    from zipfile import ZipFile

    from hazy_client2 import SynthAPI
    from hazy_configurator.api_types import TrainingConfig, GenerationConfig

    # create a client instance and test the connection to the REST API
    client = SynthAPI(...)

    # start a training job with a provided configuration
    train_job = client.jobs.train(config=TrainingConfig(...), project_id=1)

    # poll the training job until complete (every 5 seconds)
    for state in client.jobs.poll_training_status(train_job.model_id, interval=5):
        print(f"Job status {state}")
    assert state.is_finished

    # start a generation job with the trained model and a provided configuration
    generate_job = client.jobs.generate(config=GenerationConfig(...), model_id=train_job.model_id)

    # poll the generation job until complete (every 5 seconds)
    for state in client.jobs.poll_generation_status(generate_job.run_id, interval=5):
        print(f"Job status {state}")
    assert state.is_finished

    # download and extract synth data to a temporary folder
    with TemporaryFile() as file, TemporaryDirectory() as data_dir:
        client.jobs.download_data(generate_job.run_id, file=file)
        with ZipFile(file) as zip:
            zip.extractall(data_dir)
- train(config: Optional[TrainingConfig] = None, project_id: Optional[int] = None, config_id: Optional[int] = None) TrainJobDetails ¶
Train a new generator model using the given configuration.
Wraps POST /api/jobs/train.
- Parameters:
config – The configuration for the training process.
project_id – The ID of the project the model should be uploaded to after training is complete.
config_id – The ID of the configuration set for training.
- Return type:
Training job details.
Notes
This function accepts either project_id and config, or only config_id.
- If project_id and config are provided, the trained model will be uploaded to the specified project.
- If config_id is provided, the specified configuration set will be used for training.
- If project_id and config_id are provided, project_id will be ignored and the specified configuration set will be used for training.
Example
>>> from hazy_client2 import SynthAPI
>>> from hazy_configurator.api_types import TrainingConfig
>>> client = SynthAPI(...)
>>> job_details = client.jobs.train(config=TrainingConfig(...), project_id=1)
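The argument rules described in the Notes for Jobs.train() can be sketched as a small standalone function. This is only an illustration, not the client's implementation: the helper name resolve_training_request, the payload keys, and the ValueError are all hypothetical. The Notes do not say what happens when config and config_id are passed together; the sketch lets config_id win, which is an assumption.

```python
from typing import Any, Dict, Optional


def resolve_training_request(
    config: Optional[Any] = None,
    project_id: Optional[int] = None,
    config_id: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the documented argument rules."""
    if config_id is not None:
        # A configuration set was named: project_id (if any) is ignored.
        return {"config_id": config_id}
    if config is not None and project_id is not None:
        # Train from an inline config and upload the model to the project.
        return {"config": config, "project_id": project_id}
    # Assumption: an incomplete combination is rejected up front.
    raise ValueError("Provide either project_id and config, or config_id")


print(resolve_training_request(config={"demo": True}, project_id=1))
print(resolve_training_request(project_id=1, config_id=7))
```

Checking the combination before sending a request keeps an incomplete call from reaching the server at all.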
- get_training_job(model_id: UUID) TrainJob ¶
Retrieve a training job by model ID.
Wraps GET /api/jobs/train/{model_id}.
- Parameters:
model_id – Model ID.
- Return type:
Training job with the matching model ID.
Example
>>> from hazy_client2 import SynthAPI
>>> from hazy_configurator.api_types import TrainingConfig
>>> client = SynthAPI(...)
>>> job_details = client.jobs.train(config=TrainingConfig(...), project_id=1)
>>> job = client.jobs.get_training_job(job_details.model_id)
- poll_training_status(model_id: UUID, *, interval: PositiveFloat = 1.0, max_attempts: Optional[PositiveInt] = None) Iterator[DispatchTaskState] ¶
Polls for the status of a training job by model ID, until killed, failed or succeeded.
Wraps GET /api/jobs/train/{model_id}.
- Parameters:
model_id – Model ID.
interval – Polling interval (in seconds).
max_attempts – Maximum number of times to poll for.
- Yields:
State of the training dispatch task.
- Raises:
TimeoutError – If max_attempts is set and the number of polls reaches it.
Example
>>> from hazy_client2 import SynthAPI
>>> from hazy_configurator.api_types import DispatchTaskState, TrainingConfig
>>> client = SynthAPI(...)
>>> job = client.jobs.train(config=TrainingConfig(...), project_id=1)
>>> for state in client.jobs.poll_training_status(job.model_id, interval=5):
...     print(f"Current state: {state.value}")
>>> assert state.is_finished
>>> if state == DispatchTaskState.SUCCEEDED:
...     print("Training complete!")
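The polling behaviour described for poll_training_status (yield each observed state, stop on a terminal state, raise TimeoutError once max_attempts is reached) can be sketched without the client as a plain generator. This is a minimal sketch, not the library's code: TaskState is a stand-in for DispatchTaskState, the RUNNING member is an assumption, and fetch_state is a hypothetical callable standing in for the HTTP request.

```python
import time
from enum import Enum
from typing import Callable, Iterator, Optional


class TaskState(Enum):
    """Stand-in for DispatchTaskState; member names are assumptions."""

    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    KILLED = "killed"

    @property
    def is_finished(self) -> bool:
        return self in (TaskState.SUCCEEDED, TaskState.FAILED, TaskState.KILLED)


def poll_status(
    fetch_state: Callable[[], TaskState],
    interval: float = 1.0,
    max_attempts: Optional[int] = None,
) -> Iterator[TaskState]:
    """Yield each observed state until a terminal one is seen."""
    attempts = 0
    while True:
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError(f"gave up after {max_attempts} attempts")
        state = fetch_state()
        attempts += 1
        yield state
        if state.is_finished:
            return
        time.sleep(interval)


# Fake backend: two in-progress responses, then success.
responses = iter([TaskState.RUNNING, TaskState.RUNNING, TaskState.SUCCEEDED])
for state in poll_status(lambda: next(responses), interval=0.01):
    print(f"Current state: {state.value}")
assert state.is_finished
```

Because the generator returns right after yielding a terminal state, the loop variable still holds that final state once the loop ends, which is why the documented examples can assert on it afterwards.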
- generate(config: GenerationConfig, model_id: UUID) GenerateJobDetails ¶
Generate a new batch of synthetic data using a given configuration and trained model.
Wraps POST /api/jobs/generate.
- Parameters:
config – Generation configuration.
model_id – ID of trained model to use for generation.
- Return type:
Generation job details.
Example
>>> import uuid
>>> from hazy_client2 import SynthAPI
>>> from hazy_configurator.api_types import GenerationConfig
>>> model_id = uuid.uuid4()
>>> client = SynthAPI(...)
>>> job_details = client.jobs.generate(config=GenerationConfig(...), model_id=model_id)
- get_generation_job(run_id: UUID) GenerateJob ¶
Retrieve a generation job by run ID.
Wraps GET /api/jobs/generate/{generation_run_id}.
- Parameters:
run_id – Generation run ID.
- Return type:
Generation job with the matching run ID.
Example
>>> import uuid
>>> from hazy_client2 import SynthAPI
>>> from hazy_configurator.api_types import GenerationConfig
>>> model_id = uuid.uuid4()
>>> client = SynthAPI(...)
>>> job_details = client.jobs.generate(config=GenerationConfig(...), model_id=model_id)
>>> job = client.jobs.get_generation_job(job_details.run_id)
- poll_generation_status(run_id: UUID, *, interval: PositiveFloat = 1.0, max_attempts: Optional[PositiveInt] = None) Iterator[DispatchTaskState] ¶
Polls for the status of a generation job by run ID, until killed, failed or succeeded.
Wraps GET /api/jobs/generate/{generation_run_id}.
- Parameters:
run_id – Generation run ID.
interval – Polling interval (in seconds).
max_attempts – Maximum number of times to poll for.
- Yields:
State of the generation dispatch task.
- Raises:
TimeoutError – If max_attempts is set and the number of polls reaches it.
Example
>>> import uuid
>>> from hazy_client2 import SynthAPI
>>> from hazy_configurator.api_types import DispatchTaskState, GenerationConfig
>>> model_id = uuid.uuid4()
>>> client = SynthAPI(...)
>>> job = client.jobs.generate(config=GenerationConfig(...), model_id=model_id)
>>> for state in client.jobs.poll_generation_status(job.run_id, interval=5):
...     print(f"Current state: {state.value}")
>>> assert state.is_finished
>>> if state == DispatchTaskState.SUCCEEDED:
...     print("Generation complete!")
- download_data(run_id: UUID, *, file: Any, chunk_size: int = 1048576) None ¶
Download synthetic data in compressed format and write it to a file-like object.
Wraps GET /api/jobs/generate/{generation_run_id}/zip.
- Parameters:
run_id – Generation run ID.
file – File-like object.
chunk_size – Size of write chunks.
Example
>>> import uuid
>>> from zipfile import ZipFile
>>> from tempfile import TemporaryFile, TemporaryDirectory
>>> from hazy_client2 import SynthAPI
>>> from hazy_configurator.api_types import DispatchTaskState, GenerationConfig
>>> model_id = uuid.uuid4()
>>> client = SynthAPI(...)
>>> job = client.jobs.generate(config=GenerationConfig(...), model_id=model_id)
>>> for state in client.jobs.poll_generation_status(job.run_id, interval=5):
...     print(f"Job status {state}")
>>> assert state.is_finished
>>> assert state == DispatchTaskState.SUCCEEDED
>>> with TemporaryFile() as file, TemporaryDirectory() as data_dir:
...     client.jobs.download_data(job.run_id, file=file)
...     with ZipFile(file) as zip:
...         zip.extractall(data_dir)
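The write-in-chunks-then-extract pattern used with download_data can be tried offline. The sketch below is an assumption-laden stand-in, not the client's code: iter_chunks and write_chunks are hypothetical helpers that play the role of the streamed response and the chunked writer, and the archive contents are made up for the demonstration.

```python
import io
import os
from tempfile import TemporaryDirectory, TemporaryFile
from typing import BinaryIO, Iterator
from zipfile import ZipFile


def iter_chunks(payload: bytes, chunk_size: int) -> Iterator[bytes]:
    """Stand-in for a streamed HTTP response body."""
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]


def write_chunks(chunks: Iterator[bytes], file: BinaryIO) -> int:
    """Write each chunk to the file-like object; return bytes written."""
    written = 0
    for chunk in chunks:
        written += file.write(chunk)
    return written


# Build a small zip archive in memory to play the role of the server response.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("table.csv", "id,name\n1,alice\n")
payload = buffer.getvalue()

with TemporaryFile() as file, TemporaryDirectory() as data_dir:
    write_chunks(iter_chunks(payload, chunk_size=16), file)
    # ZipFile seeks within the file itself, so no explicit rewind is needed.
    with ZipFile(file) as archive:
        archive.extractall(data_dir)
    print(sorted(os.listdir(data_dir)))
```

Any binary, seekable file-like object works for the extraction step, which is why the documented example can pass a TemporaryFile straight to ZipFile after the download.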
Models¶
Models.get | Retrieve a model by ID. |
Models.get_training_metadata | Retrieve training metadata by model ID. |
- class hazy_client2.drivers.api_resources.models.Models(*, api: SynthAPI)¶
Python wrapper for the /api/models REST API resource.
Warning
This class should not be instantiated directly. The SynthAPI.models object should be used instead.
Example
Using the SynthAPI to retrieve a trained model by ID.

    from uuid import UUID

    from hazy_client2 import SynthAPI

    # create a client instance and test the connection to the REST API
    client = SynthAPI(...)

    # retrieve model by ID
    model_id = UUID(...)
    model = client.models.get(model_id=model_id)
- get(model_id: UUID) Model ¶
Retrieve a model by ID.
Wraps GET /api/models/{model_id}.
- Parameters:
model_id – Model ID.
- Return type:
Model with the matching ID.
Example
>>> import uuid
>>> from hazy_client2 import SynthAPI
>>> model_id = uuid.uuid4()
>>> client = SynthAPI(...)
>>> model = client.models.get(model_id)
- get_training_metadata(model_id: UUID) TrainMetadata ¶
Retrieve training metadata by model ID.
Wraps GET /api/models/{model_id}/data.
- Parameters:
model_id – Model ID.
- Return type:
Training metadata for the specified model.
Example
>>> import uuid
>>> from hazy_client2 import SynthAPI
>>> model_id = uuid.uuid4()
>>> client = SynthAPI(...)
>>> metadata = client.models.get_training_metadata(model_id)
Projects¶
Projects.get | Retrieve a project by ID. |
- class hazy_client2.drivers.api_resources.projects.Projects(*, api: SynthAPI)¶
Python wrapper for the /api/projects REST API resource.
Warning
This class should not be instantiated directly. The SynthAPI.projects object should be used instead.
Example
Using the SynthAPI to retrieve a project by ID.

    from hazy_client2 import SynthAPI

    # create a client instance and test the connection to the REST API
    client = SynthAPI(...)

    # retrieve project with ID 1
    project = client.projects.get(project_id=1)
Data Sources¶
DataSources.get | Retrieve a data source by ID. |
DataSources.filter_by | Retrieve data sources that match the provided query parameters. |
DataSources.all | Retrieve all data sources. |
DataSources.get_download_source | Retrieve the download data source if available. |
- class hazy_client2.drivers.api_resources.data_sources.DataSources(*, api: SynthAPI)¶
Python wrapper for the /api/data-sources REST API resource.
Warning
This class should not be instantiated directly. The SynthAPI.data_sources object should be used instead.
Example
Using the SynthAPI to query S3 or Azure blob storage download data sources.

    from hazy_client2 import SynthAPI
    from hazy_configurator.api_types import DataSourceIO, SensitiveDataSourceType

    # create a client instance and test the connection to the REST API
    client = SynthAPI(...)

    # fetch S3/azure download data sources
    sources = client.data_sources.filter_by(
        source_type=[SensitiveDataSourceType.S3, SensitiveDataSourceType.AZURE],
        io=[DataSourceIO.DOWNLOAD],
    )
- get(data_source_id: UUID) SecretDataSource ¶
Retrieve a data source by ID.
Wraps GET /api/data-sources/{data_source_id}.
- Parameters:
data_source_id – Data source ID.
- Return type:
Data source with the matching ID.
Example
>>> import uuid
>>> from hazy_client2 import SynthAPI
>>> data_source_id = uuid.uuid4()
>>> client = SynthAPI(...)
>>> source = client.data_sources.get(data_source_id)
- filter_by(*, source_type: Optional[List[SensitiveDataSourceType]] = None, io: Optional[List[DataSourceIO]] = None) List[SecretDataSource] ¶
Retrieve data sources that match the provided query parameters.
Wraps GET /api/data-sources?source_type=[...]&io=[...].
- Parameters:
source_type – Data source types.
io – Input/output types.
- Return type:
Matching data sources.
Example
>>> from hazy_client2 import SynthAPI
>>> from hazy_configurator.api_types import DataSourceIO, SensitiveDataSourceType
>>> client = SynthAPI(...)
>>> sources = client.data_sources.filter_by(
...     source_type=[SensitiveDataSourceType.S3],
...     io=[DataSourceIO.INPUT, DataSourceIO.INPUT_OUTPUT],
... )
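How the two optional list filters might map onto the query string in the Wraps line can be sketched with the standard library. The actual encoding used by the client is not documented here; repeated keys, as produced by urlencode with doseq=True, are one common convention and are an assumption. SourceType and IO are stand-ins for SensitiveDataSourceType and DataSourceIO with assumed values, and build_query is a hypothetical helper.

```python
from enum import Enum
from typing import Dict, List, Optional
from urllib.parse import urlencode


class SourceType(str, Enum):
    """Stand-in for SensitiveDataSourceType; values are assumptions."""

    S3 = "s3"
    AZURE = "azure"


class IO(str, Enum):
    """Stand-in for DataSourceIO; values are assumptions."""

    DOWNLOAD = "download"
    INPUT = "input"


def build_query(
    source_type: Optional[List[SourceType]] = None,
    io: Optional[List[IO]] = None,
) -> str:
    """Encode only the filters that were provided, as repeated query keys."""
    params: Dict[str, List[str]] = {}
    if source_type:
        params["source_type"] = [item.value for item in source_type]
    if io:
        params["io"] = [item.value for item in io]
    query = urlencode(params, doseq=True)
    return f"/api/data-sources?{query}" if query else "/api/data-sources"


print(build_query(source_type=[SourceType.S3, SourceType.AZURE], io=[IO.DOWNLOAD]))
```

With no filters the helper falls back to the bare resource path, which matches the endpoint that all() is documented to wrap.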
- all() List[SecretDataSource] ¶
Retrieve all data sources.
Wraps GET /api/data-sources.
- Return type:
All data sources.
Example
>>> from hazy_client2 import SynthAPI
>>> client = SynthAPI(...)
>>> sources = client.data_sources.all()
- get_download_source() SecretDataSource ¶
Retrieve the download data source if available.
Wraps GET /api/data-sources?io=download.
- Return type:
The download data source.
Example
>>> from hazy_client2 import SynthAPI
>>> client = SynthAPI(...)
>>> download_source = client.data_sources.get_download_source()