Setting up a tutorial SQL Server database

The following sets up a SQL Server (Azure SQL Edge) database instance for the Using a database for storage tutorial.

  1. Fetch the mcr.microsoft.com/azure-sql-edge image and start a Docker container:
docker run -itd --cap-add SYS_PTRACE -e 'ACCEPT_EULA=1' -e 'MSSQL_SA_PASSWORD=Hazy123!' -p 1433:1433 --name hazy-tutorial-mssql -d mcr.microsoft.com/azure-sql-edge  

Disclaimer: By using pulling the mcr.microsoft.com/azure-sql-edge image and providing ACCEPT_EULA=1 in the above command, the user agrees to the license terms for Azure SQL Edge which can be found on Docker Hub.

  1. The container takes approximately 2 minutes to set up. You can run the following command to check if the setup is complete:
docker logs -f hazy-tutorial-mssql | grep 'Recovery is complete. This is an informational message only. No user action is required.'

  1. Install Python packages required for seeding the database. You can store the following in a requirements.txt file and run pip install -r requirements.txt:
numpy==1.23.*
pandas==1.5.*
SQLAlchemy==2.0.*
SQLAlchemy-Utils==0.38.*
pyodbc==4.0.*
ibm-db-sa==0.4.*
Faker==15.0.*

  1. Seed the children dataset into the SQL Server instance via the following snippet:

Note: The following code requires unixodbc as well as the ODBC Driver 17 for SQL Server installed on your machine and correctly configured in obdcinst.ini.

import random

import pandas as pd
from faker.factory import Factory
from sqlalchemy.engine import URL, create_engine
from sqlalchemy.types import Float, SmallInteger, String

random.seed(0)


def build_children() -> pd.DataFrame:
    fake = Factory.create()
    fake.seed(0)

    res = []
    for _ in range(1000):
        age = random.randint(2, 10)
        res.append(
            {
                "first_name": fake.first_name(),
                "last_name": fake.last_name(),
                "age": age,
                "height": age * 6.5 + 76 + random.randint(-10, 10),
            }
        )

    return pd.DataFrame.from_records(res)


def upsert_dataset(df: pd.DataFrame, conn: URL):
    df.to_sql(
        name="children",
        con=create_engine(conn, echo=True),
        if_exists="append",
        index=True,
    )


def build_connection_string() -> str:
    server = "localhost"
    username = "SA"
    password = "Hazy123!"
    database = "Master"
    driver = "ODBC Driver 17 for SQL Server"
    return f"mssql+pyodbc://{username}:{password}@{server}/{database}?driver={driver}"


def main():
    conn = build_connection_string()
    df = build_children()
    upsert_dataset(df, conn)


if "__main__" == __name__:
    main()