Hub installation

The Hazy Hub is provided as Docker (or OCI) container images to simplify installation and use.

To function, the Hub must be on the same network as the users requiring access to the Generator Models.

The following assumes a server that meets the minimum specification described in the requirements.

Running Standalone

The Hub requires two data volumes to be mounted into the container. These can either be Docker volumes or bind mounts.

It also supports running with a read-only root filesystem for improved security.

Using Docker

You may need to be root to run root-full Docker containers (that is, when the Docker daemon is configured to run as root and the default owner of any container process is therefore also root; this is the default for new Docker installations). It is possible (and often preferable) to configure Docker to enable non-root access to the daemon.
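
For example, on most Linux distributions you can grant your user non-root access to the Docker daemon by adding it to the docker group (this is the standard Docker convention; adjust it to your environment and security policy):

# add the current user to the docker group
sudo usermod -aG docker "$USER"
# start a new login session (or run `newgrp docker`) for the change to take effect
newgrp docker
# verify that the daemon is reachable without sudo
docker info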

An example docker run command suitable for running the hub:

docker run \
  --name hub \
  --restart always \
  --detach \
  # bind to local host (see note [1])
  -p 127.0.0.1:4000:4000 \
  # database volume (see note [2])
  -v hazy_hub_db:/var/db \
  # static data volume (see note [3])
  -v hazy_hub_data:/mnt/data \
  # optional hub configuration file (see note [4])
  -v /etc/hazy/hub.yaml:/etc/hazy/hub.yaml \
  # security features (see note [5])
  --read-only \
  --tmpfs /tmp:rw,size=64g \
  --tmpfs /var/run:rw,exec \
  --security-opt=no-new-privileges \
  --cap-drop ALL \
  --cap-add CHOWN \
  --cap-add DAC_OVERRIDE \
  --cap-add FOWNER \
  --cap-add FSETID \
  --cap-add KILL \
  --cap-add SETGID \
  --cap-add SETUID \
  # define the deployment environment (see note [6])
  -e HOST=hub.northwindtraders.com \
  hazy-hub:latest

Notes

  • [1] Hazy does not provide a TLS connection by default, so you should bind to localhost and use an HTTP proxy that is integrated into your Public Key Infrastructure in front of the Hub (see TLS below).

  • [2] The volume mounted at /var/db (in this case a Docker volume named hazy_hub_db) is used by the Hub's internal database server and thus is not suitable for filesystem- (or block-) level backups.

  • [3] The volume mounted at /mnt/data (in this case a Docker volume named hazy_hub_data) holds any file data uploaded to the Hub (e.g. Generator Models) and also a constantly updated snapshot of the database state. This volume is suitable for filesystem backup and should be backed up regularly to ensure data integrity in the case of a system failure. An example of creating these volumes up front is shown after these notes.

  • [4] Mounts an optional Hub configuration file into the container.

  • [5] Enables various Docker features that increase the security of your installation; see Security.

  • [6] The environment variable HOST helps the Hub to understand its deployment environment. HOST should be set to the external hostname that users should use to access the Hub UI.
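
Docker creates named volumes automatically the first time they are referenced, but you can also create them explicitly before starting the Hub. A minimal sketch, using the volume names from the command above:

docker volume create hazy_hub_db
docker volume create hazy_hub_data
# confirm both volumes exist
docker volume ls --filter name=hazy_hub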

Using Podman

Podman works well as a container runtime for the Hazy Hub.

podman run \
  --name hub \
  --restart always \
  --detach \
  -p 127.0.0.1:4000:4000 \
  -v hazy_hub_db:/var/db \
  -v hazy_hub_data:/mnt/data \
  -v /etc/hazy/hub.yaml:/etc/hazy/hub.yaml \
  --read-only \
  --tmpfs /tmp:rw \
  --tmpfs /var/run:rw,exec \
  --security-opt=no-new-privileges \
  --cap-drop ALL \
  --cap-add CHOWN \
  --cap-add DAC_OVERRIDE \
  --cap-add FOWNER \
  --cap-add FSETID \
  --cap-add KILL \
  --cap-add SETGID \
  --cap-add SETUID \
  # integrate with podman's healthcheck system (see note [1])
  --health-cmd "/usr/local/bin/hub-healthcheck" \
  --health-interval "30s" \
  --health-start-period "60s" \
  -e HOST=hub.local \
  hazy-hub:20210326110413

Podman Run Parameters

See notes for docker run for an explanation of the parameters shared between Docker and Podman.

  • [1] Docker has built-in healthchecks, and the Hazy Docker image is configured to probe the status of the Hub container every 30s. For Podman you must configure the healthcheck externally as part of the podman run arguments.
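
With the healthcheck configured as above, you can query the container's health state from the host. A brief sketch (exact output and supported fields can vary between Podman versions):

# run the healthcheck on demand; exit status 0 means healthy
podman healthcheck run hub
# the current health state is also shown in the container status
podman ps --filter name=hub --format "{{.Names}} {{.Status}}"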

Running under Kubernetes

Below is an example configuration for deploying the Hub to a Kubernetes cluster:

---
apiVersion: v1
kind: Namespace
metadata:
  name: hazy

---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: hazy-hub
  name: hazy-hub
  namespace: hazy
spec:
  type: LoadBalancer
  selector:
    app: hazy-hub
  ports:
    - protocol: "TCP"
      port: 4000
      targetPort: 4000

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: hazy-hub
  name: hazy-hub-data
  namespace: hazy
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 256Gi


---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: hazy-hub
  name: hazy-hub-db
  namespace: hazy
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 256Gi

---
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  labels:
    app: hazy-hub
  name: hazy-hub
  namespace: hazy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hazy-hub
  template:
    metadata:
      labels:
        app: hazy-hub
    spec:
      containers:
        - name: hazy-hub
          image: registry.northwindtraders.com/hazy/hub:latest
          imagePullPolicy: Always

          ports:
            - containerPort: 4000

          volumeMounts:
            - name: hub-data
              mountPath: /mnt/data
            - name: hub-db
              mountPath: /var/db

          startupProbe:
            exec:
              command:
                - '/usr/local/bin/hub-healthcheck'
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 20

          livenessProbe:
            exec:
              command:
                - '/usr/local/bin/hub-healthcheck'
            initialDelaySeconds: 60
            periodSeconds: 30
            timeoutSeconds: 3
            successThreshold: 1
            failureThreshold: 3

          lifecycle:
            preStop:
              exec:
                command: ["/bin/s6-svc", "-d", "/var/run/s6/services/postgresql"]

          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
              add: ["CHOWN", "DAC_OVERRIDE", "FOWNER", "FSETID", "KILL", "SETGID", "SETUID"]

      volumes:
        - name: hub-data
          persistentVolumeClaim:
            claimName: hazy-hub-data

        - name: hub-db
          persistentVolumeClaim:
            claimName: hazy-hub-db

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  labels:
    app: hazy-hub
  name: hazy-hub-ingress
  namespace: hazy
spec:
  rules:
    - host: hub.northwindtraders.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hazy-hub
                port:
                  number: 4000

Because the Hub runs a PostgreSQL database instance internally, it should be configured as a ReplicaSet with replicas: 1 to prevent multiple Pod instances attempting to write to the same database volume.

Configuring an Ingress Controller and associated TLS to the Hub service is beyond the scope of this document.
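
Assuming the manifests above are saved to a single file (the filename below is an example), a typical apply-and-verify sequence looks like:

# create the namespace, service, volume claims, ReplicaSet and ingress
kubectl apply -f hazy-hub.yaml
# wait for the Pod to pass its startup and liveness probes
kubectl -n hazy get pods --watch
# check the external address assigned to the LoadBalancer service
kubectl -n hazy get service hazy-hub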

Data Volumes & Backup

The Hub requires two mounted volumes to function: one mounted at /var/db and the other at /mnt/data.

Internally the Hub container runs its own PostgreSQL 12 database instance and uses the continuous archiving feature provided by PostgreSQL to provide point-in-time recovery in the case of data-loss.

The volume mounted at /var/db holds the live database information, that is the WAL files and lock files used by the Hub's active PostgreSQL instance.

The volume mounted at /mnt/data holds static data. This static data comes in 4 flavours:

  1. Layers within any Synthesiser container images uploaded to the Hub
  2. Generator Models uploaded to the Hub (via the API or user interface)
  3. PostgreSQL base backups. Periodically the Hub runs PostgreSQL's pg_basebackup tool to create a snapshot of the database's state
  4. PostgreSQL WAL archives. The Hub's internal PostgreSQL instance is configured with WAL archiving enabled. These archived WAL files are stored within the /mnt/data volume.

Backup

Of the two volumes described, only the one mounted at /mnt/data is suitable for either filesystem- or block-based backup.

To prevent data-loss the /mnt/data volume must be regularly backed up.

You should not back up the volume mounted at /var/db -- it holds the current, active database files and cannot be backed up in a way that preserves database integrity.
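
How you back up the /mnt/data volume depends on your environment. For a standalone Docker host, one common pattern is to archive the named volume from a short-lived container (the image name, backup path and schedule below are examples only; integrate with your own backup tooling):

# archive the hazy_hub_data volume to a dated tarball on the host
docker run --rm \
  -v hazy_hub_data:/mnt/data:ro \
  -v /srv/backups:/backup \
  alpine \
  tar czf /backup/hazy_hub_data-$(date +%Y%m%d).tar.gz -C /mnt/data .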

Data Recovery

In the event of loss of the host's volumes, the /mnt/data volume should be restored from backup and the Hub re-run. On initialisation the Hub will detect the presence of a valid database snapshot on the data volume and restore its state from it by booting its internal PostgreSQL instance in "recovery" mode.

This will restore the Hub's state to that of the last backup.
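
A minimal recovery sketch for a standalone Docker host, assuming a backup produced as in the example above (the archive name and paths are examples):

# remove any failed Hub container and recreate the data volume
docker rm -f hub
docker volume create hazy_hub_data
# restore the most recent backup into the empty volume
docker run --rm \
  -v hazy_hub_data:/mnt/data \
  -v /srv/backups:/backup \
  alpine \
  tar xzf /backup/hazy_hub_data-20240101.tar.gz -C /mnt/data
# re-run the Hub using the docker run command from "Running Standalone";
# it will detect the snapshot and recover its database state on start-up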

Monitoring

The Hub exposes a simple health-check URL for integration with uptime monitoring. If your Hub is hosted at https://hub.organisation.com then the health status URL is https://hub.organisation.com/_health:

$ curl https://hub.organisation.com/_health

{"status":"OK"}

If the Hub is running and able to connect to its internal database, it will return a 200 status and the above JSON response.
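
If you do not have a dedicated uptime-monitoring system, even a simple cron-driven probe can provide basic alerting. This is a sketch only; the URL is an example and the alerting step is a placeholder for your own tooling:

#!/bin/sh
# probe the Hub health endpoint and report a failure
HUB_HEALTH_URL="https://hub.organisation.com/_health"
if ! curl -fsS --max-time 10 "$HUB_HEALTH_URL" | grep -q '"status":"OK"'; then
  echo "Hazy Hub health check failed at $(date)" >&2
  # replace with your own notification mechanism (email, pager, chat webhook, ...)
fi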

Prometheus

If your organisation uses Prometheus for internal monitoring, then you can include statistics from the Hub by accessing the /metrics endpoint:

$ curl https://hub.organisation.com/metrics

# TYPE telemetry_scrape_size_bytes summary
# HELP telemetry_scrape_size_bytes Scrape size, uncompressed
telemetry_scrape_size_bytes_count{registry="default",content_type="text/plain; version=0.0.4"} 21673
telemetry_scrape_size_bytes_sum{registry="default",content_type="text/plain; version=0.0.4"} 1541648071
# TYPE telemetry_scrape_duration_seconds summary
# HELP telemetry_scrape_duration_seconds Scrape duration
telemetry_scrape_duration_seconds_count{registry="default",content_type="text/plain; version=0.0.4"} 21673
telemetry_scrape_duration_seconds_sum{registry="default",content_type="text/plain; version=0.0.4"} 260.943337435

# etc...

This exposes some internal statistics for the Erlang VM.
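
A minimal Prometheus scrape job for this endpoint might look like the following (the job name, scheme and target are examples; adjust them to your deployment and TLS setup):

scrape_configs:
  - job_name: hazy-hub
    scheme: https
    metrics_path: /metrics
    static_configs:
      - targets: ["hub.organisation.com"]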

TLS

The Hazy Hub provides unsecured HTTP access by default. In order to provide secure communication over TLS you must do one of two things: terminate SSL with an external reverse proxy, or configure the Hub itself with valid certificates.

Terminate SSL with an external reverse proxy

In this configuration SSL connections are terminated externally using an application that is properly integrated into your internal Public Key Infrastructure. This application then acts as a reverse proxy and forwards all requests to the Hub.

Any HTTP or TCP proxy that supports SSL termination is suitable, whether running on-premise or provided as a managed service by your cloud platform.

This is the recommended approach as it provides the most flexibility and tighter integration with the on-premise or cloud environment.

Configure the Hub with valid certificates

Alternatively you can provide the Hub container with valid certificates at runtime and allow the Hub to handle SSL termination.

To do this you must generate a valid keyfile and certificate using your internal PKI, and mount them into the Hub container as PEM encoded files.

By default the Hub expects the required files to be mounted into the container at the following locations:

  • /etc/hazy/hub.key.pem the SSL private key
  • /etc/hazy/hub.cert.pem the SSL certificate
  • /etc/hazy/hub.cacert.pem any associated intermediate certificates (optional; intermediate certificates can also be included in the hub.cert.pem file)

The /etc/hazy directory is also the location of the Hub configuration file, so you can place these certificate files in the same directory, next to the hub.yaml file.
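
In production the key and certificate should be issued by your internal PKI. For a quick test of the SSL configuration you can generate a self-signed pair with openssl (the hostname below is an example):

# generate a self-signed key and certificate, for testing only
openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
  -subj "/CN=hub.northwindtraders.com" \
  -keyout /etc/hazy/hub.key.pem \
  -out /etc/hazy/hub.cert.pem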

The key file and the certificate file are required for SSL support. If either the key file or the certificate file is missing, then SSL won't be enabled.

If the provided files are invalid, in an incorrect format (not PEM encoded), or corrupted in some way, the Hub may fail to start.

If you would like to mount the key and/or certificate files into an alternate location, you can set the following environment variables when running the container:

  • HUB_CERT_PEMFILE the path to the certificate (or certificate bundle) (default /etc/hazy/hub.cert.pem)
  • HUB_CACERT_PEMFILE the path to any intermediate certificates (default /etc/hazy/hub.cacert.pem)
  • HUB_KEY_PEMFILE the path to the private key file (default /etc/hazy/hub.key.pem)
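
For example, to mount the files under /run/secrets instead of /etc/hazy (the paths below are illustrative), point the environment variables at the mounted locations:

# mount the key and certificate from a non-default location (read-only)
# and point the Hub at the mounted paths; add the other options described
# above (volumes, capabilities, ports, etc.) as required
docker run \
  -v /srv/secrets/hub.cert.pem:/run/secrets/hub.cert.pem:ro \
  -v /srv/secrets/hub.key.pem:/run/secrets/hub.key.pem:ro \
  -e HUB_CERT_PEMFILE=/run/secrets/hub.cert.pem \
  -e HUB_KEY_PEMFILE=/run/secrets/hub.key.pem \
  hazy-hub:latest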

If the required certificate and key files are present at runtime, the Hub will start an HTTPS listener on port 4443 (as well as the standard HTTP service on port 4000).

If the key and certificate files are provided, the Hub container should expose an additional port on the host that is mapped to the Hub's SSL server. The SSL server listens on port 4443 within the container, so a complete docker run command for an SSL-enabled Hub container will look like:

docker run \
  # bind to all interfaces on port 80 for HTTP
  -p 0.0.0.0:80:4000 \
  # bind to all interfaces on port 443 for HTTPS
  -p 0.0.0.0:443:4443 \
  # Hub configuration directory including certificate- and key-files
  -v /etc/hazy:/etc/hazy \
  # ... other options as described above...
  --name hub \
  --restart always \
  --detach \
  -v hazy_hub_db:/var/db \
  -v hazy_hub_data:/mnt/data \
  --read-only \
  --tmpfs /tmp:rw,size=64g \
  --tmpfs /var/run:rw,exec \
  --security-opt=no-new-privileges \
  --cap-drop ALL \
  --cap-add CHOWN \
  --cap-add DAC_OVERRIDE \
  --cap-add FOWNER \
  --cap-add FSETID \
  --cap-add KILL \
  --cap-add SETGID \
  --cap-add SETUID \
  hazy-hub:latest

For a configuration without a reverse proxy, the container ports must be exposed on the public IP address of the host (or all interfaces as above).

See below for an explanation of the port configuration and the alternatives.

Configuring the HTTPS hostname

If SSL is enabled, the Hub will ensure that any plain HTTP requests are redirected to the equivalent HTTPS endpoint. It also sets the required HTTP Strict Transport Security (HSTS) response headers to ensure that the browser will only attempt to connect to the Hub server over HTTPS in the future.

By default, the redirect will assume that the Hub's HTTPS service is running on port 443.

This means that if the Hub container has been configured to expose its HTTP server on port 4000 (the default) then a user visiting e.g. http://hub.local:4000 will be redirected to https://hub.local:443. This may result in an error since the Hub won't be listening on port 443.

For this reason, if you are enabling SSL termination on the Hub, we recommend running the Hub container with its ports mapped appropriately, so that the Hub's HTTP server running on port 4000 within the container is mapped to port 80 on the host, and the Hub's internal HTTPS service on port 4443 is mapped to port 443 on the host, e.g.

docker run \
  # expose the HTTP server on the protocol-default port of 80
  -p 80:4000 \
  # expose the HTTPS server on the protocol-default port of 443
  -p 443:4443 \
  # mount the configuration directory (containing SSL key and certificate) as read-only
  -v /root/hazy/etc:/etc/hazy:ro \
  # other run options
  hazy/hub:latest

This will ensure that any HTTP→HTTPS redirects from the Hub work automatically.

Alternatively, if exposing the Hub on ports 80 and 443 is not possible, you can fix the HTTP→HTTPS redirection by manually setting a hostname for the HTTPS service using the HUB_SSL_HOST environment variable:

docker run \
  # expose the HTTP server on port 4000
  -p 4000:4000 \
  # expose the HTTPS server on port 4443
  -p 4443:4443 \
  # mount the configuration directory (containing SSL key and certificate) as read-only
  -v /root/hazy/etc:/etc/hazy:ro \
  # set the HTTPS hostname so that redirects to the HTTPS endpoint work
  -e HUB_SSL_HOST="hub.local:4443" \
  # other run options
  hazy/hub:latest

Now a user visiting http://hub.local:4000 will be redirected to the correct HTTPS host and port at https://hub.local:4443.

Updating certificates

When the provided certificates expire (or preferably before), they must be replaced with updated versions and the Hub container restarted.

Expired certificates will cause problems with all access to the Hub UI and API until they are updated.
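
To avoid unexpected outages you can check the expiry date of the installed certificate ahead of time, then replace the files and restart the container (the container name is the one used in the examples above):

# show the certificate's expiry date
openssl x509 -enddate -noout -in /etc/hazy/hub.cert.pem
# after replacing the key and certificate files, restart the Hub to pick them up
docker restart hub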