Architecture

Overview

Hazy is enterprise software composed of multiple components that allow for seamless integration with your network and security infrastructure.

These components are installed into our customers' on-premises, private cloud or hybrid networks.

Instead of transmitting data, Hazy's synthetic backbone deals entirely in Generator Models. This preserves data and network security: production data never leaves the trusted environment, while our client-hub infrastructure can generate any amount of safe, synthetic data on-premises or in the cloud.

Components

A Hazy installation is composed of multiple components, connected by Bridges, which allow for highly flexible deployments supporting multiple use cases.

The key elements of this system are:

Hub

The Hub is the centre of your Hazy installation. At its core, it is a web application providing a user interface (UI) for configuration and an HTTP API that provides access to the trained Generator Models.

The main role of the Hub is to host and serve Generator Models. To do this it acts as a single point of configuration, allowing a network of other Hazy components to communicate and coordinate between different network zones over Bridges.
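
For example, a client could request synthetic data from a hosted Generator Model with a single HTTP call. The sketch below is illustrative only: the endpoint path, payload fields and token header are assumptions, not Hazy's documented API.

```python
# Illustrative sketch: the endpoint path, payload fields and token
# header are assumptions, not Hazy's documented API.
import requests

HUB_URL = "https://hub.example.internal"   # assumed Hub address
API_TOKEN = "REPLACE_ME"                   # assumed: an API token issued via the Hub UI

# Ask the Hub to generate synthetic rows from a hosted Generator Model.
response = requests.post(
    f"{HUB_URL}/api/generators/customers/generate",  # hypothetical endpoint
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"num_rows": 10000},                        # hypothetical payload field
    timeout=60,
)
response.raise_for_status()
synthetic_rows = response.json()
```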

Some key features include:

  • Detailed Metrics. View detailed graphical metrics showing the similarity, utility and privacy of the synthetic data produced by individual Generator Models.

  • Organisations. Structure your Generators and users into multiple internal Organisations.

  • Role-Based Access Control. Fine-grained control over access to the Hub configuration.

  • Team-Based Access Control. The mechanism by which Hub admins grant access to Generator Models hosted on the Hub via the API.

Trainer

The Trainer is the component that sits next to the production data and runs the Hazy Synthesiser code to train a new Generator Model.

Once trained, the Generator Model is delivered back to the Hub over a Bridge and made available for synthetic data generation via the Hub's API.
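
Conceptually, the Trainer's role can be pictured as running the Synthesiser container image against mounted source data. The sketch below uses the standard Docker SDK for Python; the image name, mount points and environment variable are hypothetical placeholders, not Hazy's real interface.

```python
# Conceptual sketch of a Trainer run. Only the Docker SDK calls are
# standard; the image name, paths and env var are hypothetical.
import docker

client = docker.from_env()

client.containers.run(
    image="hazy/synthesiser:latest",  # hypothetical image name
    volumes={
        "/data/production": {"bind": "/input", "mode": "ro"},  # source data, read-only
        "/data/models": {"bind": "/output", "mode": "rw"},     # trained Generator Models land here
    },
    environment={"HAZY_CONFIG": "/input/train-config.json"},   # hypothetical config hook
    remove=True,  # clean up the container once training completes
)
```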

Data Transfer

In order to provide access to trained Generator Models, the Hub must run on the same network as the users who will be generating synthetic data.

The Trainer, by contrast, requires access to production data, so it generally runs within a DMZ environment separated from the company WAN.

The mechanism by which the trained Generator Models are moved out of the DMZ containing the production data and into the more open network hosting the Hub is one of the core deployment decisions Hazy customers must make.

We allow for two basic approaches: manual and automatic.

Manual Transfer

With manual transfer, responsibility for moving trained Generator Models out of the DMZ lies entirely with the customer.

  1. Generator Models are written to some storage component within the DMZ. This may be an NFS share, a Kubernetes Volume, OpenShift Object Storage, a binary repository such as JFrog Artifactory, or similar.

  2. The customer then runs an internal process -- automated or manual -- to audit the Models and transfer them out of the DMZ's network partition.

  3. The customer uploads the new Generator Models to the Hub, either using the Hub user interface or the provided API, as sketched below.
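
As a minimal sketch of step 3, assuming a hypothetical upload endpoint on the Hub (the same operation is available interactively through the Hub UI):

```python
# Sketch of uploading an audited Generator Model to the Hub.
# The endpoint, form field and file extension are hypothetical.
import requests

HUB_URL = "https://hub.example.internal"               # assumed Hub address
MODEL_PATH = "/transfer/approved/customers-model.hmf"  # hypothetical audited artefact

with open(MODEL_PATH, "rb") as model_file:
    response = requests.post(
        f"{HUB_URL}/api/generators",                   # hypothetical upload endpoint
        headers={"Authorization": "Bearer REPLACE_ME"},
        files={"model": model_file},
        timeout=120,
    )
response.raise_for_status()
```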

The benefit to the customer is that all data extraction from the DMZ is performed manually or via customer-created workflows, so no external (in this case, Hazy-provided) systems are involved in moving models from the sensitive environment to general availability.

The downside is that the customer must create and maintain these processes or workflows rather than relying on those provided by Hazy.

Automatic Transfer

With automatic transfer, both the Trainer and the Hub are configured with a shared Bridge application. Once trained, Generator Models are placed directly into the Bridge by the Trainer and retrieved automatically by the Hub. No manual intervention is required once the Hub and Trainer are configured.
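
"Shared" here simply means that both components point at the same storage location. The sketch below illustrates the idea with made-up keys; it is not Hazy's actual configuration schema.

```python
# Conceptual sketch of a shared Bridge configuration. All keys and
# values are illustrative, not Hazy's real schema.
SHARED_BRIDGE = {
    "type": "s3",                  # any supported Bridge backend
    "bucket": "hazy-bridge",       # hypothetical shared bucket
    "prefix": "generator-models/",
}

trainer_config = {"output_bridge": SHARED_BRIDGE}  # Trainer writes trained models here
hub_config = {"input_bridge": SHARED_BRIDGE}       # Hub polls and retrieves from here
```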

The advantage of this approach is that Generator Models flow seamlessly from the DMZ into the Hub and then on to users.

Bridges

With automatic transfer, Hubs and Trainers communicate via "Bridges".

A Bridge is a communication channel that allows data and messages to be transferred between components. This allows different environments, or "zones", to communicate even in the absence of a direct network connection.

A Bridge can be implemented on top of various data-storage applications and protocols, including:

  • NFS
  • WebDAV
  • JFrog Artifactory
  • Sonatype Nexus
  • Amazon Web Services S3
  • Microsoft Azure Storage
  • Google Cloud Storage

By moving data over a secure data-storage layer, we remove the requirement for direct network connectivity between components and can therefore support many different security policies.
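
To make this concrete, the sketch below shows a Bridge implemented over S3 using the standard boto3 client: the Trainer writes to a shared bucket and the Hub polls it, with no direct network route between the two. The bucket name, object keys and file extension are hypothetical.

```python
# Hedged sketch of a Bridge over S3. The boto3 calls are standard;
# the bucket, keys and file extension are hypothetical.
import boto3

s3 = boto3.client("s3")
BUCKET = "hazy-bridge"  # hypothetical shared bucket both zones can reach

# Trainer side: publish a trained Generator Model into the Bridge.
s3.upload_file("/output/customers-model.hmf", BUCKET, "models/customers-model.hmf")

# Hub side: poll the Bridge and retrieve any newly published models.
listing = s3.list_objects_v2(Bucket=BUCKET, Prefix="models/")
for obj in listing.get("Contents", []):
    filename = obj["Key"].rsplit("/", 1)[-1]
    s3.download_file(BUCKET, obj["Key"], f"/hub/models/{filename}")
```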

Nomenclature

  • Synthesiser

    A model training pipeline that ingests the source data and uses it to train a Generator Model. Packaged as a Docker/OCI container image.

  • Generator Model

    The serialised set of statistical properties of the source data, sufficient to re-create a synthetic version of the original.

  • Trainer

    The server (virtual or bare-metal) that runs the Hazy Synthesiser in order to train Generator Models from some source data.