Distributed architecture install

Overview

Hazy gives an organisation access to safe, synthetic data that is close enough to the original to be equivalent for most use cases.

Hazy's Distributed architecture (DA) installations are deployed atop managed Kubernetes platforms (such as Amazon's EKS) to provide an elastically scalable and robust Hazy environment which can be tailored to the complex requirements of the modern enterprise data platform.

Rather than moving data out of the secure environment for training or generation, Hazy's DA dispatches remote tasks into the secure environment via a message queue and retrieves Hazy's models from a shared object store (such as AWS S3), ensuring no private data leaves the secure environment.

Pre-requisites

Hazy provides a Helm-based installation workflow to provision a full synthetic data platform on Kubernetes.

Hazy's Helm chart is composed of the following sub-charts:

Chart | Description | Required?
hub | Hazy's centralized UI for managing synthetic data. (Note this is the same image as multi-table | Yes
dispatcher | Hazy's tasks dispatching service | Yes
rabbitmq | A message queue for hub<->dispatcher communication | Yes
keycloak | Authentication service to integrate with enterprise IdPs. | Yes (if using auth)
postgresql | A PostgreSQL database used by hub | No
minio | An AWS S3 compliant object store for logs, input & output blob storage | No
nginx | Nginx reverse proxy for managing hub/keycloak routing. | No

By default, Hazy's chart will install the following:

In order to install Hazy's chart in your Kubernetes cluster, the following are required:

  • Hazy hub, dispatcher, keycloak and rabbitmq images pulled to a local registry,
  • A PostgreSQL database used for the UI,
  • Object storage buckets (such as AWS S3 buckets) for input data, model files, and generated data. Access to these buckets for the Dispatcher and its Kubernetes Jobs can be restricted; please refer to your object storage provider's documentation for details (e.g. for S3, configuring Kubernetes ServiceAccounts to assume IAM roles),
  • A Kubernetes cluster with the following Kubernetes Resources:
    • Secret used by the PostgreSQL database
    • Role used by the Hazy Dispatcher to manage Kubernetes Jobs
    • Secret named hazy-encryption-config with 2 keys, as described in the Single Container guide:
      • HAZY_DS_ROOT_KEY
      • HAZY_ANALYSIS_ENCRYPTION_KEY
  • An installation of the Helm CLI for the Kubernetes cluster.
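
As an illustration of the hazy-encryption-config prerequisite, the Secret could be created with kubectl along the following lines. This is a minimal sketch, not Hazy's documented procedure: the function name and the default namespace "hazy" are hypothetical, and the two key values are assumed to have already been generated as described in the Single Container guide.

```shell
# Sketch: create the hazy-encryption-config Secret from two environment variables.
# Assumptions: the "hazy" default namespace is hypothetical, and both key values
# have already been generated as described in the Single Container guide.
create_hazy_encryption_secret() {
  secret_namespace="${1:-hazy}"
  # Abort early if either key is missing from the environment.
  : "${HAZY_DS_ROOT_KEY:?HAZY_DS_ROOT_KEY must be set}"
  : "${HAZY_ANALYSIS_ENCRYPTION_KEY:?HAZY_ANALYSIS_ENCRYPTION_KEY must be set}"
  kubectl create secret generic hazy-encryption-config \
    --namespace "$secret_namespace" \
    --from-literal=HAZY_DS_ROOT_KEY="$HAZY_DS_ROOT_KEY" \
    --from-literal=HAZY_ANALYSIS_ENCRYPTION_KEY="$HAZY_ANALYSIS_ENCRYPTION_KEY"
}

# Usage (needs kubectl access to the target cluster):
#   create_hazy_encryption_secret hazy
```

Values passed with --from-literal are visible in the local process list while the command runs, so teams with stricter requirements may prefer to apply a Secret manifest from their own secret management tooling instead.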

Helm install

The Helm chart package includes the top-level Hazy chart, which in turn includes the following subcharts:

  • Hub chart - installing Hazy Hub UI service,
  • Dispatcher chart - installing Dispatcher, responsible for scheduling Kubernetes Jobs,
  • RabbitMQ chart - installing a RabbitMQ message queue (dependency of Dispatcher, installed automatically).

The license files provided as part of the install should be passed to helm install as shown:

helm install hazy                                                \
  oci://release.hazy.com/hazy/helm                               \
  --version $hazy_version                                        \
  --set-file dispatcher.featuresJson=/path/to/__features__.json  \
  --set-file dispatcher.featuresSigJson=/path/to/__sig__.json    \
  -f values.yaml
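
Since the install command reads its overrides from values.yaml, it can help to look at the chart's default values first. The sketch below wraps the standard helm show values command. The function name is hypothetical, and it assumes a Helm release with OCI registry support and that you are already logged in to the registry.

```shell
# Sketch: print the chart's default values so they can be used as a starting
# point for values.yaml. Requires a Helm version with OCI registry support.
hazy_default_values() {
  chart_version="${1:?chart version required}"
  helm show values oci://release.hazy.com/hazy/helm --version "$chart_version"
}

# Usage (needs registry access):
#   hazy_default_values "$hazy_version" > default-values.yaml
```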

The explicit Kubernetes resources created by helm install can be examined by running an equivalent helm template:

helm template hazy                                               \
  oci://release.hazy.com/hazy/helm                               \
  --version $hazy_version                                        \
  --set-file dispatcher.featuresJson=/path/to/__features__.json  \
  --set-file dispatcher.featuresSigJson=/path/to/__sig__.json    \
  --skip-tests                                                   \
  -f values.yaml
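
The rendered output can be long, so a quick count of resource kinds is one way to get an overview before reading it in full. The helper below is a hypothetical convenience, not part of Hazy's tooling. It reads manifests on standard input and counts the top-level kind: fields.

```shell
# Sketch: count top-level resource kinds in rendered manifests read from stdin.
summarise_kinds() {
  # Only unindented "kind:" lines are counted, so nested fields such as
  # roleRef.kind are ignored. Output is sorted by kind name.
  awk '/^kind:/ { count[$2]++ } END { for (k in count) print count[k], k }' | sort -k2
}

# Usage: pipe the helm template command above into the helper, e.g.
#   helm template hazy ... --skip-tests -f values.yaml | summarise_kinds
```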

Note that --skip-tests excludes the automated smoke tests from the output of helm template. This is preferred when provisioning resources, as the tests should only be run once the resources have become available on the Kubernetes cluster. Running helm test after the cluster and application states have settled will execute these tests.
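
The smoke tests mentioned above can then be run against the installed release. This is a minimal sketch, assuming the release name hazy used in the install command and that the current kubectl context points at the namespace where the release was installed. The function name is hypothetical.

```shell
# Sketch: check that the release exists, then run the chart's smoke tests.
run_hazy_smoke_tests() {
  release_name="${1:-hazy}"
  # Show the release status first; stop if the release cannot be found.
  helm status "$release_name" || return 1
  # Run the chart's test hooks and print their logs.
  helm test "$release_name" --logs
}

# Usage (needs access to the cluster where the release is installed):
#   run_hazy_smoke_tests hazy
```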

For further configuration, please refer to our DA: Further configuration page.