Distributed architecture install
Overview¶
The problem Hazy solves is to give an organisation access to safe, synthetic data which is sufficiently close to the original to be equivalent for most use cases.
Hazy's Distributed architecture (DA) installations are deployed atop managed Kubernetes platforms (such as Amazon's EKS) to provide an elastically scalable and robust Hazy environment which can be tailored to the complex requirements of the modern enterprise data platform.
Rather than moving data out of the secure environment for training and generation, Hazy's DA dispatches remote tasks into the secure environment via a message queue and retrieves Hazy's models from a shared object store (such as AWS S3), ensuring no private data leaves the secure environment.
Prerequisites¶
Hazy provides a Helm-based installation workflow to provision a full synthetic data platform on Kubernetes. Hazy's Helm chart is composed of the following sub-charts:
| Chart | Description | Required? |
|---|---|---|
| `hub` | Hazy's centralized UI for managing synthetic data. (Note: this is the same image as `multi-table`.) | Yes |
| `dispatcher` | Hazy's task-dispatching service. | Yes |
| `rabbitmq` | A message queue for `hub` <-> `dispatcher` communication. | Yes |
| `keycloak` | Authentication service to integrate with enterprise IdPs. | Yes (if using auth) |
| `postgresql` | A PostgreSQL database used by `hub`. | No |
| `minio` | An AWS S3-compatible object store for logs, input & output blob storage. | No |
| `nginx` | Nginx reverse proxy for managing `hub`/`keycloak` routing. | No |
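The optional sub-charts are typically toggled in `values.yaml`. The snippet below is a sketch only: it assumes the chart follows the common Helm `<subchart>.enabled` convention; run `helm show values oci://release.hazy.com/hazy/helm --version $hazy_version` to confirm the actual keys and defaults.

```yaml
# values.yaml -- illustrative only; key names assume the common
# <subchart>.enabled convention and may differ in the actual chart.
postgresql:
  enabled: false   # using an external PostgreSQL database instead
minio:
  enabled: false   # using AWS S3 (or another S3-compatible store) instead
nginx:
  enabled: true    # install the bundled reverse proxy
```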
In order to install Hazy's chart in your Kubernetes cluster, the following are required:

- Hazy `hub`, `dispatcher`, `keycloak` and `rabbitmq` images pulled to a local registry,
- A PostgreSQL database used by the UI,
- Object storage buckets (such as AWS S3) for input data, model files, and generated data. Access to these buckets for `Dispatcher` and its Kubernetes `Jobs` can be restricted; please refer to your specific object storage provider for details (e.g. for S3, configuring Kubernetes ServiceAccounts to assume IAM roles),
- A Kubernetes cluster with the following Kubernetes resources:
  - A `Secret` used by the PostgreSQL database,
  - A `Role` used by the Hazy Dispatcher to manage Kubernetes Jobs,
  - A `Secret` named `hazy-encryption-config` with 2 keys as described in the Single Container guide (a creation sketch follows this list):
    - `HAZY_DS_ROOT_KEY`
    - `HAZY_ANALYSIS_ENCRYPTION_KEY`
- An installation of the Helm CLI for the Kubernetes cluster.
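The encryption Secret can be created with `kubectl`. A minimal sketch, assuming the two keys have already been generated as described in the Single Container guide and exported as environment variables, and that Hazy is installed into the current namespace (add `--namespace` if not):

```bash
# Create the Secret holding Hazy's two encryption keys.
# Key generation itself is covered in the Single Container guide.
kubectl create secret generic hazy-encryption-config \
  --from-literal=HAZY_DS_ROOT_KEY="$HAZY_DS_ROOT_KEY" \
  --from-literal=HAZY_ANALYSIS_ENCRYPTION_KEY="$HAZY_ANALYSIS_ENCRYPTION_KEY"
```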
Helm install¶
The Helm charts include the top-level Hazy chart, which includes the following subcharts:

- Hub chart, installing the Hazy Hub UI service,
- Dispatcher chart, installing `Dispatcher`, responsible for scheduling Kubernetes `Jobs`,
- RabbitMQ chart, installing a RabbitMQ message queue (a dependency of `Dispatcher`, installed automatically).
The license files provided as part of the install should be supplied during the Helm install as shown:
```bash
helm install hazy \
  oci://release.hazy.com/hazy/helm \
  --version $hazy_version \
  --set-file dispatcher.featuresJson=/path/to/__features__.json \
  --set-file dispatcher.featuresSigJson=/path/to/__sig__.json \
  -f values.yaml
```
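Once the install completes, the release can be verified before proceeding. A minimal sketch, assuming the chart applies the conventional `app.kubernetes.io/instance` label to its pods:

```bash
# Inspect the release status and watch its pods become ready.
helm status hazy
kubectl get pods -l app.kubernetes.io/instance=hazy --watch
```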
The explicit Kubernetes resources created by `helm install` can be examined by running an equivalent `helm template`:
```bash
helm template hazy \
  oci://release.hazy.com/hazy/helm \
  --version $hazy_version \
  --set-file dispatcher.featuresJson=/path/to/__features__.json \
  --set-file dispatcher.featuresSigJson=/path/to/__sig__.json \
  --skip-tests \
  -f values.yaml
```
Note that `--skip-tests` excludes the automated smoke tests from the output of `helm template`. This is preferred when provisioning resources, as the tests should only be run once the resources have become available on the Kubernetes cluster. A subsequent `helm test`, run after the cluster and application states have settled, will execute these tests.
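For example, once all pods report ready (`hazy` being the release name used in the commands above):

```bash
# Run the chart's bundled smoke tests against the live release;
# --logs prints the logs of the test pods on completion.
helm test hazy --logs
```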
For further configuration, please refer to our DA: Further configuration page.