Setting up a tutorial Db2 database

The following sets up an IBM Db2 Community Edition database instance for the Using a database for storage tutorial.

  1. Fetch the ibmcom/db2 image and start a Docker container:
docker run -itd --name hazy-tutorial-db2 \
    -e LICENSE=accept -e DB2INST1_PASSWORD=Hazy123! -e DBNAME=Master \
    -e ARCHIVE_LOGS=false -e AUTOCONFIG=false \
    --privileged=true \
    -p 50000:50000 \
    ibmcom/db2

Disclaimer: By using pulling the ibmcom/db2 image and providing LICENSE=accept in the above command, the user agrees to the license terms for IBM Db2 Community Edition (Container Format).

  1. The container takes approximately 15 minutes to set up. You can run the following command to check if the setup is complete. When finished, you should see a (*) Setup has completed message in your terminal.
docker logs -f hazy-tutorial-db2 | grep '(*) Setup has completed.'

  1. Install Python packages required for seeding the database. You can store the following in a requirements.txt file and run pip install -r requirements.txt:
numpy==1.23.*
pandas==1.5.*
SQLAlchemy==2.0.*
SQLAlchemy-Utils==0.38.*
pyodbc==4.0.*
ibm-db-sa==0.4.*
Faker==15.0.*

  1. Seed the children dataset into the Db2 instance via the following snippet:
import random

import pandas as pd
from faker.factory import Factory
from sqlalchemy.engine import URL, create_engine
from sqlalchemy.types import Float, SmallInteger, String

random.seed(0)


def build_children() -> pd.DataFrame:
    fake = Factory.create()
    fake.seed(0)

    res = []
    for _ in range(1000):
        age = random.randint(2, 10)
        res.append(
            {
                "first_name": fake.first_name(),
                "last_name": fake.last_name(),
                "age": age,
                "height": age * 6.5 + 76 + random.randint(-10, 10),
            }
        )

    return pd.DataFrame.from_records(res)


def upsert_dataset(df: pd.DataFrame, conn: URL):
    df.to_sql(
        name="children",
        con=create_engine(conn, echo=True),
        if_exists="append",
        index=True,
        dtype={
            "first_name": String(length=255),
            "last_name": String(length=255),
            "age": SmallInteger(),
            "height": Float(),
        },
    )


def build_connection_string() -> str:
    return URL.create(
        drivername="db2",
        database="Master",
        host="localhost",
        port=50000,
        username="db2inst1",
        password="Hazy123!",
    )


def main():
    conn = build_connection_string()
    df = build_children()
    upsert_dataset(df, conn)


if "__main__" == __name__:
    main()