Python SDK
The Python client library allows users to programmatically train models and generate synthetic data. The recommended minimum Python version is 3.8.
Structure
There are two main components of the SDK:
- hazy_client2: Programmatic entrypoint to a running Hazy Hub instance (via SynthAPI) and to local Docker-based synthesis (via SynthDocker).
- hazy_configurator: Library resources and objects for configuring the Hazy pipeline.
Getting started
In this short example, we'll:
- Set up a Hazy TrainingConfig with a data schema and data locations,
- Train the model via train(), which outputs a Hazy model file (.hmf extension) holding all the information needed to generate synthetic data,
- Set up a Hazy GenerationConfig to configure synthetic data generation,
- Generate synthetic data via generate().
Please refer to our SDK documentation to learn about advanced configuration, such as evaluation configs.
There are two primary methods to interact with Hazy:
- SynthAPI - Initiate training and generation jobs through the Hazy API, specifying the Hazy UI host URL and a personal API key for authentication.
- SynthDocker - Initiate training and generation tasks with the SynthDocker class, which runs synthesis operations inside local Docker containers. It is configured with parameters such as the working directory, the local Docker daemon URL, container user settings and file handling preferences.
Both methods are outlined below.
Training and generation via SynthAPI
Setup
Copy the data below into a file called children.csv and save it locally or remotely. For more details on the file storage options we support, refer to the data sources section of our user guide.
Create a new project or use an existing one, and add a data source that points to the location of children.csv.
first_name,last_name,age,height
Megan,Chang,8,131.0
Robert,Green,2,87.0
William,Sullivan,10,146.0
Kristen,Turner,8,127.0
Thomas,Silva,9,135.5
Rebecca,Wagner,5,114.5
Juan,Campos,4,101.0
Christine,King,4,95.0
Renee,Mcgrath,6,122.0
Lisa,Barrera,4,101.0
Kyle,Blair,3,87.5
Rachel,Sutton,7,126.5
Thomas,Garcia,10,134.0
Ryan,Carr,7,124.5
Robin,Levy,7,130.5
Thomas,Grimes,5,115.5
Jorge,Trujillo,9,138.5
Ana,Smith,10,139.0
Jennifer,Ross,2,96.0
Mallory,Barnett,2,81.0
Aaron,Snyder,8,138.0
Mikayla,Sanchez,2,98.0
Mark,Harrell,9,134.5
James,Bradley,5,108.5
John,Ponce,3,91.5
Linda,West,5,105.5
Christopher,Flores,4,109.0
William,Cantu,9,126.5
Daniel,Arnold,3,95.5
Jasmine,Kelley,10,146.0
Lisa,Fernandez,3,94.5
Tamara,Morrison,10,140.0
Briana,Wallace,3,102.5
Caitlyn,Cruz,7,128.5
Barbara,Roberts,5,117.5
Jaime,Lopez,10,149.0
Chloe,Douglas,6,119.0
Thomas,Davis,3,104.5
Katherine,Mcdowell,8,128.0
Sandra,Kirby,5,107.5
Rachael,Leblanc,4,98.0
Amber,Myers,4,93.0
Janet,Hill,6,120.0
Lisa,Atkinson,3,87.5
Patty,Lawrence,4,96.0
Stephanie,Riley,2,81.0
Shannon,Keller,10,143.0
Wendy,Stark,10,139.0
Laura,Miller,10,138.0
Chloe,Tucker,5,116.5
Crystal,Bruce,8,136.0
John,Dennis,6,119.0
Dave,Robinson,9,144.5
Laura,Cook,7,113.5
Lisa,Garcia,7,130.5
Dustin,Wolfe,3,100.5
Brandon,Berry,7,117.5
Renee,Ferguson,5,98.5
Erin,Johnson,6,108.0
Cynthia,Obrien,5,109.5
Barbara,Myers,4,102.0
Mitchell,Hooper,8,119.0
Benjamin,Smith,3,89.5
Susan,Lopez,5,99.5
David,Clark,10,150.0
Lauren,Giles,3,85.5
Andrew,Coleman,3,105.5
Craig,Green,5,117.5
Jeffrey,Lucas,3,97.5
Michael,White,3,96.5
William,Williams,3,86.5
Craig,Mcneil,2,85.0
James,Howard,4,95.0
Jessica,Massey,9,130.5
Samantha,Jackson,2,79.0
Emily,Levy,10,144.0
Brian,Lowe,3,93.5
Megan,Peterson,3,92.5
John,Carlson,3,105.5
Scott,Thompson,6,116.0
Thomas,Ortiz,8,123.0
Ashley,Romero,2,95.0
Larry,Howard,9,125.5
Mary,King,3,97.5
Ann,Smith,5,106.5
Judith,Rogers,7,126.5
Brandon,Campbell,4,98.0
John,Benton,2,84.0
Michael,Roberts,4,102.0
Michael,Arroyo,10,139.0
Cynthia,Oliver,3,104.5
Jennifer,Hughes,9,129.5
Robert,Curtis,2,94.0
Aaron,Lee,8,136.0
Matthew,Allen,10,140.0
Dana,Gray,7,123.5
Nancy,Carroll,6,109.0
Robert,Morales,10,131.0
Jacqueline,Barnes,9,126.5
Eileen,Williams,7,112.5
Sean,Green,10,139.0
Eric,Rose,4,99.0
Tony,Hoffman,9,135.5
Karla,Henson,6,116.0
Troy,Collins,4,101.0
Steven,Lamb,8,131.0
Nancy,Burnett,3,85.5
Jacob,Key,5,108.5
Cynthia,Miller,4,99.0
Jessica,Hatfield,5,118.5
Richard,Gregory,9,136.5
Leslie,Lewis,8,119.0
Jennifer,Smith,8,136.0
Mackenzie,Rice,8,119.0
Connor,Wilson,4,106.0
Debra,Russo,3,93.5
Joshua,Good,4,106.0
Craig,Nash,10,146.0
Randy,Miller,10,150.0
Joshua,Chavez,2,80.0
Laura,Callahan,9,134.5
Dennis,Meyer,6,119.0
Debra,Reed,2,92.0
Monica,Ramirez,5,115.5
Andrew,Williams,3,89.5
Erin,Grant,2,91.0
Stacey,Mays,8,128.0
Renee,Williams,2,85.0
Kara,Miles,2,79.0
Diana,Joseph,10,150.0
Raven,Bowman,3,91.5
Nathan,Medina,3,104.5
Jared,Matthews,5,107.5
Alan,Hernandez,6,110.0
Mathew,Clarke,3,100.5
Jennifer,Morgan,8,138.0
Christine,Williams,3,85.5
Frank,Holden,6,119.0
Keith,Foster,3,93.5
Amy,Carter,4,112.0
Timothy,Allen,10,151.0
Brandon,White,7,114.5
Alexandra,Jones,4,100.0
Richard,Murphy,2,80.0
Robert,Garcia,2,85.0
Regina,Wells,6,122.0
Mary,Cherry,7,122.5
Matthew,Mendoza,2,98.0
Holly,Simmons,9,144.5
Kevin,Navarro,9,144.5
Patricia,Gillespie,8,129.0
Courtney,Bennett,10,136.0
Terri,Fowler,5,110.5
Cameron,Miller,6,105.0
Kara,Brown,4,96.0
Alan,Long,6,115.0
April,West,7,122.5
Tracy,Richards,3,95.5
Erin,Henderson,2,80.0
Micheal,Hinton,6,110.0
Jose,Waters,4,110.0
Ryan,Howard,6,116.0
Caleb,Boyer,8,135.0
Jacqueline,Leach,4,101.0
Shannon,Rhodes,3,100.5
David,Sanders,5,99.5
Jared,Williams,6,110.0
Stacy,Lewis,10,133.0
Dustin,Gonzalez,6,117.0
Nicholas,Payne,7,120.5
Edward,Hinton,8,121.0
Tonya,Hernandez,3,102.5
Richard,Frazier,9,139.5
Natalie,Simpson,7,121.5
Sally,Morris,3,100.5
Vernon,Jimenez,3,100.5
Elizabeth,Harris,8,119.0
Chelsea,Robinson,6,115.0
Matthew,Estes,4,97.0
Rachel,Meyers,8,138.0
Austin,Hernandez,3,87.5
Jonathan,Mueller,3,91.5
Megan,Aguilar,5,99.5
Jennifer,Roman,8,118.0
Carl,Miller,3,97.5
Misty,Williams,10,147.0
Jeffrey,Williams,6,119.0
Alexis,Anthony,9,142.5
Mark,Martin,5,111.5
Eduardo,Douglas,3,96.5
Tanya,Wagner,5,106.5
Rachel,Shaw,4,105.0
Audrey,Gregory,5,109.5
Linda,Chang,3,87.5
Vicki,Burgess,2,95.0
Rebecca,Harris,9,130.5
Amanda,George,3,100.5
Margaret,Olson,8,126.0
Kylie,Price,5,118.5
Brenda,York,2,85.0
Lauren,Sandoval,4,95.0
Aaron,White,5,112.5
William,Scott,8,129.0
Cameron,Heath,10,135.0
Sherri,Turner,3,104.5
Ricky,Patrick,9,128.5
Bryan,Davidson,8,138.0
David,Mitchell,8,134.0
Maria,Brown,9,134.5
Barry,Butler,9,139.5
Travis,Boyer,5,115.5
Jennifer,Nunez,5,98.5
Edward,Hatfield,7,121.5
Robert,Carr,7,112.5
Paul,Williams,10,135.0
Thomas,Hernandez,6,124.0
Antonio,Williamson,4,104.0
Crystal,Garcia,6,120.0
Andrea,Reed,3,87.5
Patrick,Frank,10,132.0
Tracy,Ibarra,3,92.5
Chelsea,Mcdonald,4,93.0
Cynthia,Morgan,6,105.0
David,Fleming,9,134.5
Christy,Kramer,4,96.0
David,Buck,9,135.5
Lauren,Stark,10,143.0
Monique,Becker,10,147.0
Lisa,Stone,2,97.0
Kristen,Lopez,3,101.5
Kimberly,Wallace,3,98.5
Katherine,Gibson,5,107.5
Kristine,Jones,10,150.0
Bradley,Villa,8,133.0
Todd,Santana,8,137.0
Shirley,Estrada,5,98.5
Ashley,Robinson,2,84.0
Clayton,Weiss,6,121.0
Pamela,Chan,6,115.0
Holly,Fisher,3,100.5
Kevin,Wilson,6,114.0
Ronald,Knight,8,130.0
Sandra,Walls,8,119.0
Robert,Garcia,4,112.0
Kim,Navarro,4,99.0
Anthony,Griffin,6,115.0
Gina,Johnson,2,80.0
Samantha,Rivers,9,137.5
Jennifer,Miller,4,107.0
Chad,Howard,3,89.5
Anthony,Bailey,7,124.5
Alejandro,Mccann,2,98.0
Lori,Jones,9,136.5
Patricia,Clark,9,125.5
Jamie,Nunez,3,100.5
Shawna,Martinez,4,92.0
Adrian,Wood,2,98.0
Angel,Jacobs,4,112.0
Michele,Lopez,7,114.5
Daniel,Cooper,10,151.0
Susan,Anderson,7,117.5
Tammy,Cox,8,133.0
Thomas,Carter,3,86.5
Sharon,Rubio,9,143.5
Cynthia,White,7,131.5
Victoria,Garcia,3,104.5
Beverly,Moore,6,109.0
Rachael,Bautista,8,127.0
Linda,Stewart,3,101.5
John,Fischer,5,99.5
Kelly,Barnes,8,132.0
Brandon,Anderson,7,117.5
Andrew,Miller,9,135.5
Charles,Fisher,3,86.5
Andrea,Yang,2,94.0
Douglas,Henderson,6,105.0
Dana,Miller,10,149.0
Sean,Wood,5,105.5
Stacy,Brown,3,105.5
Ricky,Butler,10,147.0
Jessica,Flores,8,134.0
James,Carter,6,108.0
Thomas,Clements,4,105.0
Laura,Hill,8,120.0
Angela,Watts,3,98.5
Laura,Griffin,3,88.5
Raymond,Saunders,8,122.0
Ryan,Wright,2,93.0
Shawn,Giles,8,131.0
Douglas,Ford,2,94.0
Dana,Webb,7,119.5
James,Smith,3,96.5
Holly,Montgomery,3,88.5
Alan,Evans,7,111.5
Kayla,Fuller,7,122.5
Amy,Moore,4,92.0
Jasmine,Ruiz,5,109.5
Erika,Wolf,3,104.5
Jesse,Gill,4,98.0
Joshua,Riggs,2,85.0
David,Stephenson,3,85.5
Billy,Scott,6,116.0
Alicia,Perkins,2,98.0
Randy,Garcia,5,102.5
Johnny,Campbell,4,106.0
Karina,Stout,3,100.5
Caitlin,Johnson,7,119.5
Laura,Torres,4,92.0
Matthew,Moreno,5,109.5
John,Mora,7,126.5
Frank,Perry,6,114.0
Keith,Meyer,10,151.0
Audrey,Burton,7,116.5
Amanda,Jenkins,3,88.5
Cynthia,Powell,10,149.0
Kimberly,French,6,110.0
Kelly,Watson,8,122.0
Courtney,Moore,4,99.0
Heidi,James,7,127.5
Brittany,Taylor,5,105.5
Elizabeth,Gomez,4,101.0
Thomas,Perry,7,124.5
Thomas,Neal,2,83.0
Lucas,Pearson,2,91.0
Brian,Evans,3,87.5
Julie,Williams,4,105.0
Christine,Johnson,6,122.0
Thomas,Oneal,8,122.0
Alejandro,Rose,8,127.0
Carl,Camacho,7,113.5
Gina,Harmon,5,112.5
Elizabeth,Smith,7,131.5
Blake,Oliver,10,132.0
Yvonne,Marks,8,131.0
Holly,Acosta,2,92.0
Jeremy,Walton,7,125.5
Keith,Garcia,5,109.5
Steven,Rivera,6,120.0
Gary,Fisher,3,90.5
Phyllis,Graham,3,93.5
Seth,Fletcher,3,102.5
Alexandria,Anderson,4,106.0
Renee,Wallace,8,123.0
Kristina,Price,8,131.0
Lindsay,Price,4,99.0
Jeffrey,Gonzalez,9,134.5
Shelby,Willis,10,135.0
Brandon,Price,7,125.5
Jim,Miller,3,100.5
Jacob,Brown,5,107.5
Danielle,Thompson,2,93.0
Cheryl,Salazar,9,124.5
Janet,Hunt,5,107.5
Justin,Rich,3,105.5
Michael,Ellis,6,122.0
Crystal,Black,4,105.0
Stephanie,Blevins,9,126.5
Sarah,Villa,9,131.5
Bianca,Henry,10,143.0
Janet,Lewis,6,125.0
Joseph,Williams,2,82.0
Francisco,Smith,6,106.0
Diamond,Taylor,2,87.0
Kristin,Becker,8,134.0
Tara,Sanders,8,132.0
Sandra,Chavez,3,93.5
Matthew,Garcia,7,120.5
Nicole,Norton,5,117.5
Marcus,Bryant,3,86.5
Mark,Johnson,3,93.5
Bradley,Wood,6,122.0
Jason,Warren,7,114.5
Jacob,Harris,10,138.0
Emily,Fitzgerald,4,94.0
Larry,Heath,8,127.0
Jonathan,Cooper,6,121.0
Jennifer,Williams,4,110.0
Sarah,Jones,10,151.0
Steven,Hardy,5,115.5
Brandon,Lamb,3,98.5
Tiffany,Stevens,10,143.0
David,Miller,6,114.0
Corey,Cannon,9,135.5
Robert,Calhoun,4,97.0
Scott,Jones,3,88.5
Ronald,Fischer,8,130.0
Maria,Williams,9,128.5
Henry,Burns,10,140.0
David,Pitts,7,131.5
Sarah,Flores,9,137.5
Ryan,Hawkins,5,113.5
Justin,Weaver,9,140.5
James,Phillips,7,126.5
Scott,Jacobs,2,93.0
Amanda,Green,6,109.0
Jesse,Wilson,9,125.5
Kristen,Garcia,5,98.5
Jessica,Wright,7,126.5
Justin,Fitzgerald,8,118.0
Donna,Harmon,10,133.0
Zachary,Trevino,3,97.5
David,Brewer,2,90.0
David,Mclaughlin,2,82.0
Michael,Leonard,2,87.0
Jade,Guerrero,6,112.0
Jeffrey,Decker,4,110.0
John,Holmes,6,111.0
Emily,Hall,3,98.5
Courtney,Mitchell,9,134.5
George,Pacheco,8,123.0
Angela,Murphy,7,124.5
Brenda,Johnson,8,122.0
David,Foster,9,128.5
Theresa,Dixon,10,141.0
Charles,Hubbard,4,98.0
Kimberly,Hampton,4,106.0
Ashley,Dominguez,7,123.5
Jason,Oconnor,8,133.0
Lisa,Wolf,8,125.0
Michael,Garza,5,112.5
Cristina,Lester,5,116.5
Daniel,Flowers,2,91.0
Alicia,Howard,2,86.0
Stanley,Smith,3,90.5
Jeffrey,Delgado,7,112.5
Donna,Simpson,4,99.0
Stephanie,Castillo,6,124.0
Lee,Abbott,3,101.5
Jonathon,Munoz,6,116.0
Jade,Underwood,8,132.0
Kristen,Cruz,2,99.0
Brian,Johnson,10,151.0
Jessica,Nixon,10,144.0
Matthew,Alexander,9,139.5
Jessica,Barrett,6,120.0
Jordan,Powers,5,108.5
Kevin,Kelly,6,106.0
Amber,Oconnor,2,80.0
Edward,Johnson,4,103.0
Courtney,Johnson,2,88.0
Brady,Perez,2,83.0
Desiree,Jones,3,98.5
Bryan,Stanton,5,117.5
Roberto,Stafford,8,135.0
Jay,Graham,5,112.5
Michael,Thompson,5,108.5
Mark,Hatfield,3,104.5
Julie,Reyes,3,95.5
Robert,Baker,7,128.5
Amanda,Fitzgerald,9,134.5
Gloria,Ford,6,105.0
Bobby,Dorsey,10,132.0
Robert,Myers,5,109.5
Aaron,Taylor,3,91.5
Misty,Palmer,10,142.0
Jessica,Hernandez,5,104.5
Mark,Larsen,6,114.0
Lori,Wright,6,121.0
Andrew,Drake,8,126.0
Gary,French,9,135.5
Gabriela,Jackson,5,99.5
Christian,Pennington,6,122.0
Carla,Oliver,3,85.5
Stephen,Hernandez,9,139.5
Carolyn,King,9,125.5
Linda,Oconnor,8,133.0
Anthony,Randolph,9,138.5
Barry,Davis,3,87.5
Kelly,Hernandez,3,92.5
Joel,Kelly,3,89.5
Jill,Sullivan,8,124.0
Joseph,Vasquez,9,143.5
Philip,Collins,3,98.5
Rachel,Pierce,10,143.0
Tiffany,Mejia,2,84.0
Ashley,Baker,5,113.5
Gloria,Bryant,5,102.5
Jennifer,Powell,6,116.0
Tyler,Smith,7,124.5
Jody,Johnson,3,102.5
Erin,Perkins,6,124.0
Kimberly,Roberts,10,137.0
Gina,Clark,6,119.0
Allison,Peterson,10,150.0
Tiffany,Bonilla,9,141.5
Jason,Knight,6,113.0
Mark,Schmitt,5,98.5
Sandra,Willis,3,104.5
Jennifer,Maldonado,3,90.5
Michael,Hendricks,8,125.0
Rachel,Rivera,5,107.5
Julie,Smith,2,96.0
Lisa,Cunningham,10,144.0
Stephanie,Cox,2,82.0
Connie,Morris,8,138.0
Kirsten,Burke,6,108.0
Vanessa,Smith,7,118.5
Donald,Williams,10,140.0
David,Noble,5,105.5
Stacy,Castillo,3,101.5
Stacey,Cardenas,6,115.0
Kimberly,Burgess,5,109.5
Jacob,Dunn,9,133.5
Rodney,Dodson,4,96.0
Glenn,Jackson,2,96.0
Donna,Moore,10,141.0
Daniel,Gordon,7,129.5
Laura,Jacobs,2,83.0
Karen,Baker,8,122.0
Justin,Patterson,4,108.0
Vicki,Robbins,3,89.5
Sophia,Medina,5,113.5
Angela,Branch,5,105.5
James,Patton,4,99.0
Latasha,Kirk,8,129.0
Karen,Moore,4,112.0
Donna,Bradshaw,9,127.5
Anna,Ward,2,95.0
Stefanie,Hoffman,7,126.5
Robert,Mendez,9,133.5
Linda,Perez,2,86.0
Alfred,Rice,10,151.0
Shelly,Frazier,4,107.0
Crystal,Burton,9,141.5
Tyler,Simon,7,113.5
John,Snow,6,109.0
Joshua,Duffy,8,124.0
Sara,Miller,7,120.5
Shane,Manning,8,119.0
Shannon,Hicks,5,99.5
Lindsay,Bush,7,118.5
Susan,Martin,7,125.5
Sandra,Reilly,5,106.5
Cynthia,Shepard,7,116.5
Johnny,Macias,6,105.0
Kiara,Lynch,7,129.5
Christopher,Johnson,10,132.0
Alex,King,4,103.0
Kathryn,Hughes,2,94.0
Kimberly,Garrett,2,79.0
Paul,Beard,5,99.5
Benjamin,Marshall,2,86.0
Maria,Martinez,7,113.5
Jennifer,Murphy,2,90.0
Andrew,Wells,8,122.0
Aimee,Williams,5,112.5
Ashlee,Reed,8,122.0
Adam,Lee,7,120.5
Daniel,Fernandez,4,112.0
Bradley,Hebert,7,124.5
Megan,Landry,8,118.0
Leroy,Whitehead,8,126.0
Tracey,Hubbard,10,148.0
Joshua,Lambert,9,125.5
Caitlin,Powell,3,98.5
Kevin,Brown,8,123.0
David,Anderson,2,95.0
Timothy,May,4,111.0
Kathryn,Williams,10,135.0
Madison,Williams,3,95.5
Angela,Manning,5,103.5
Crystal,Herring,5,98.5
Melvin,Willis,4,109.0
Christopher,Castro,4,94.0
Lisa,Harris,8,137.0
Jose,Graves,3,104.5
Jon,Green,9,128.5
Selena,Lutz,2,87.0
Eric,Mccullough,7,123.5
Angela,Johnson,2,99.0
Brenda,May,2,94.0
Arthur,Todd,3,96.5
Lisa,Jones,6,109.0
Deborah,Bryant,9,131.5
John,Ellis,10,142.0
Erik,Cook,4,104.0
Diana,Alvarez,7,119.5
Angela,Stephens,9,136.5
Lori,Cooper,2,88.0
Lisa,Miller,10,140.0
Ronald,Gomez,10,146.0
Shannon,Bass,2,96.0
William,Archer,10,139.0
Michelle,Wilson,2,93.0
Logan,Johnson,8,121.0
Jessica,Smith,8,129.0
Norma,Lee,9,125.5
Robert,Brown,2,87.0
Larry,Price,2,87.0
Brett,Saunders,6,111.0
Jennifer,Howard,10,147.0
Mary,Jones,7,123.5
Brian,Kelly,6,111.0
Rachel,Avila,3,103.5
Emily,Hart,7,118.5
Linda,Gutierrez,10,142.0
Anthony,Lang,4,96.0
James,Hughes,7,111.5
Michael,Martinez,2,97.0
Michael,Thompson,4,103.0
Christina,Valdez,7,120.5
Alexander,Bryant,6,115.0
Angela,Savage,9,136.5
Tyler,Miller,8,123.0
Brett,Atkins,2,83.0
Krystal,Garrison,2,93.0
Jose,Wong,4,102.0
Cody,Serrano,2,94.0
Matthew,Friedman,6,124.0
Thomas,Johnson,5,100.5
Regina,Garrett,10,144.0
Justin,Johnson,6,110.0
Nicholas,Moore,10,136.0
Amy,Thomas,3,105.5
Greg,Mccall,4,110.0
Danielle,Sanchez,3,101.5
Natasha,Weber,10,150.0
Sonya,Webb,8,131.0
Pamela,Gregory,6,114.0
Bradley,Allen,6,105.0
Juan,Jackson,8,126.0
John,Perez,6,122.0
Natalie,Ford,10,148.0
Nancy,Taylor,7,121.5
Annette,Smith,5,111.5
Sarah,Smith,4,92.0
Peter,Solis,10,135.0
Zoe,Smith,8,129.0
Madison,Hicks,9,125.5
Benjamin,Waters,10,144.0
Daniel,Carr,5,98.5
Kimberly,Nunez,7,127.5
Noah,Johnson,4,98.0
Samuel,Mcintyre,7,131.5
Mason,Wright,9,124.5
Devin,Dixon,5,116.5
Amanda,Jones,5,106.5
Ricky,Hopkins,4,105.0
Tammy,Reynolds,3,103.5
Ryan,Ruiz,9,131.5
Jeffrey,Foster,9,140.5
Deanna,Sanders,3,91.5
Bonnie,Houston,4,106.0
Whitney,Dyer,3,98.5
Nathan,Johnson,8,126.0
Cheryl,Wells,6,118.0
Joel,Williams,7,130.5
Marc,Yates,7,113.5
Tamara,Rodriguez,6,105.0
Natalie,Williams,9,124.5
Jennifer,Johnson,6,111.0
Robert,Berg,8,130.0
Phillip,Middleton,8,138.0
Jennifer,Munoz,8,119.0
Evan,Peterson,9,135.5
Robert,Lawrence,4,110.0
Wendy,Campbell,6,115.0
Wesley,Mahoney,2,91.0
Michael,Fuller,9,140.5
Katie,Mccoy,4,93.0
Karen,Gonzalez,3,103.5
Susan,Thompson,7,122.5
Brandy,Phillips,2,81.0
Cynthia,Carter,5,101.5
Ryan,Fleming,10,146.0
Joseph,Luna,2,89.0
Michael,Anderson,2,89.0
Christie,Martin,8,122.0
Russell,Ross,6,118.0
Charles,Page,4,111.0
Erin,Strickland,4,104.0
Amy,Spencer,6,121.0
Gary,Clark,2,84.0
Jeremy,Fox,4,96.0
Lori,Kelly,9,144.5
Nicole,Fitzgerald,2,95.0
Jennifer,Rogers,2,96.0
David,Estes,8,123.0
Melissa,Ortiz,7,129.5
Michelle,Nolan,3,87.5
Matthew,Mason,10,136.0
Martin,Neal,6,111.0
Rhonda,Rollins,6,115.0
Julia,Torres,6,113.0
Nicole,Riddle,10,145.0
Michael,Fry,4,106.0
William,Oconnell,10,135.0
Wendy,Hess,2,99.0
Frances,Moore,4,112.0
Adam,Larson,10,132.0
Janet,Walls,7,113.5
Zachary,Terry,5,118.5
Deborah,Harris,9,143.5
Dawn,Holden,5,112.5
Daniel,Barker,10,136.0
Christina,Bennett,7,131.5
Laura,Smith,4,107.0
Patricia,Roth,10,132.0
Timothy,Rodriguez,10,133.0
Shawn,Silva,10,141.0
Jon,Tucker,2,81.0
Kimberly,Livingston,3,98.5
Anna,Wilcox,7,129.5
Christian,Gates,9,134.5
Samantha,Jackson,8,134.0
Maria,Atkinson,7,131.5
Natalie,Holmes,3,89.5
Charlene,Clark,7,111.5
Jean,Sullivan,4,96.0
Andrew,Taylor,2,89.0
Paul,Mcclure,5,99.5
Annette,Hendricks,8,138.0
Sarah,Miller,2,88.0
Brianna,Cook,8,119.0
William,Gibson,4,103.0
Timothy,Garcia,3,98.5
Marissa,Henry,2,93.0
Deanna,Kennedy,7,130.5
Herbert,Weaver,6,114.0
Erik,Phelps,9,137.5
Marie,Thomas,4,92.0
Casey,Jones,9,132.5
Phillip,Benton,5,110.5
Angela,Baker,3,96.5
Jerry,Rodriguez,3,88.5
Donald,Cain,2,90.0
Dillon,Shields,2,84.0
Mackenzie,Taylor,8,137.0
Angelica,Smith,2,89.0
Michelle,Grant,9,141.5
Karina,Henry,9,139.5
Hannah,Velazquez,3,86.5
Anita,Baxter,10,143.0
Matthew,Davis,6,105.0
Adam,Perez,10,134.0
Mary,Collins,3,95.5
Jeffrey,Simpson,7,114.5
Stacey,Hicks,9,125.5
Matthew,Jones,4,108.0
Ashley,Perez,6,106.0
Michael,James,2,91.0
Katherine,Hall,7,116.5
Sharon,Newton,10,135.0
Timothy,Gilmore,4,97.0
Michael,Cruz,4,112.0
David,Osborne,5,117.5
Richard,Mason,7,111.5
Thomas,Douglas,9,144.5
Nathaniel,Moon,8,119.0
Ryan,Howard,5,105.5
Richard,Miranda,6,115.0
Victoria,Delacruz,4,99.0
Timothy,Carter,7,118.5
Sean,Cooper,4,105.0
John,Lopez,9,135.5
Stephanie,Porter,4,104.0
Daniel,Reyes,2,84.0
Paul,Palmer,2,91.0
Tina,Bray,4,96.0
Ariel,Montgomery,2,79.0
Emma,Shaw,7,127.5
Tommy,Edwards,2,80.0
Gabriella,Davis,2,82.0
Logan,Macdonald,4,96.0
Jeremy,Ponce,8,118.0
Tracey,Johnston,8,131.0
Billy,Davis,7,118.5
James,Murphy,4,103.0
Daniel,Robles,10,137.0
Michael,Hall,10,143.0
Bradley,Reyes,3,89.5
Adam,Wilson,8,136.0
Cassandra,Donovan,7,114.5
James,Figueroa,8,131.0
James,Smith,5,113.5
Peggy,Mathis,8,125.0
Carl,Thompson,8,125.0
Jimmy,Hebert,9,136.5
Patrick,Aguirre,3,93.5
Ryan,James,6,121.0
Kathy,Gilbert,7,128.5
Joseph,Aguilar,2,98.0
Jane,Wilcox,9,131.5
Erica,Davidson,6,106.0
Caitlin,Davis,7,123.5
Sierra,Espinoza,3,102.5
Joshua,Schwartz,2,83.0
Christine,Hopkins,8,118.0
Nicholas,Berry,8,130.0
Donna,Harris,8,121.0
William,Martinez,9,143.5
Daniel,Bartlett,9,129.5
David,Barry,4,102.0
Terry,Gibson,9,137.5
Matthew,Butler,4,110.0
Nicholas,Cunningham,6,121.0
Matthew,Short,3,96.5
Jesus,Wright,7,115.5
Timothy,Mcconnell,7,126.5
Andrew,Cummings,10,132.0
Adam,Sullivan,5,106.5
Kathryn,Griffin,4,110.0
Ruben,King,7,120.5
Thomas,Perez,8,138.0
Justin,Valdez,2,88.0
Carol,Noble,10,144.0
Tanya,Boyd,2,92.0
Christopher,Evans,6,117.0
April,Nunez,5,109.5
Sophia,Moore,4,96.0
Benjamin,Berg,3,104.5
Linda,Cannon,7,116.5
Taylor,Byrd,2,92.0
Robert,Lee,8,132.0
Peter,Cobb,3,105.5
Laurie,Woodward,3,98.5
Pam,Fleming,10,148.0
Abigail,Ramsey,4,97.0
Isaac,Leblanc,4,98.0
Jennifer,Wheeler,4,99.0
Ryan,Schultz,2,95.0
Christine,Nolan,4,107.0
Ashley,Lewis,7,130.5
Scott,Sanchez,6,115.0
Rebecca,Bridges,3,98.5
Angela,Gonzalez,6,110.0
Bradley,Anderson,7,131.5
Scott,Craig,10,141.0
Jeffrey,Cohen,10,135.0
Michelle,Todd,8,135.0
Ronald,Brewer,6,112.0
Vanessa,Mcclure,8,129.0
Matthew,Jennings,8,133.0
Frank,Grant,10,140.0
Michael,Kim,8,131.0
Scott,Alvarado,3,89.5
Meghan,Mcdonald,4,92.0
Dwayne,Webster,10,134.0
Karen,Finley,5,117.5
Phillip,Marshall,10,134.0
Amber,Mcclain,6,124.0
Janet,Smith,4,104.0
Jared,Nash,3,86.5
Brenda,Russo,2,82.0
Robert,Clark,7,126.5
Adam,Ferrell,7,114.5
Derek,Day,9,135.5
Tracy,James,6,120.0
Amanda,Miller,5,103.5
Tara,Vasquez,10,148.0
Alexis,Barnes,3,101.5
Katherine,Collins,4,92.0
Julie,Saunders,4,108.0
John,Faulkner,8,137.0
Anthony,Boyer,5,112.5
William,Hernandez,8,126.0
Jonathan,Watkins,2,97.0
Joshua,Roberson,4,104.0
Jeffrey,Cruz,4,106.0
William,Morgan,2,91.0
Madison,Baker,3,105.5
Christopher,Moore,8,128.0
Courtney,Elliott,5,114.5
Bryan,Kaiser,9,125.5
Timothy,Flores,9,143.5
Jessica,Johnson,3,93.5
Ryan,Beck,10,146.0
Suzanne,Gill,10,151.0
Nicole,Wilson,8,133.0
Russell,Johnson,6,110.0
Joseph,Cruz,5,115.5
Maurice,Brooks,7,116.5
Leah,Lopez,6,124.0
Makayla,Weaver,4,106.0
Matthew,Tran,3,87.5
Evan,Simpson,9,136.5
Andrew,Roy,8,135.0
Martin,Cooper,3,93.5
Wanda,Austin,9,131.5
Robert,Gibbs,3,94.5
Carlos,Duncan,4,103.0
Abigail,Callahan,3,89.5
Brenda,Washington,2,83.0
Abigail,Casey,10,137.0
Michael,Grant,2,80.0
Jasmine,Cowan,8,135.0
Heather,Hayden,9,127.5
Raymond,Lynch,9,141.5
Julie,Bailey,7,121.5
Joseph,Kaufman,3,85.5
Michael,Cooke,5,105.5
Melissa,Robinson,9,133.5
Maria,Terrell,6,112.0
Joshua,Beck,2,94.0
Kayla,Miller,7,127.5
Lori,Parker,7,113.5
Jasmine,Clements,3,94.5
Austin,Carr,8,125.0
Kelly,Wilson,7,123.5
Jennifer,Werner,4,99.0
William,Reed,6,111.0
Shane,Perry,9,135.5
Amanda,Ortiz,6,117.0
Christopher,Krause,4,95.0
Wendy,Thompson,8,129.0
John,Kim,10,146.0
Holly,Johnson,5,118.5
Mary,Little,7,131.5
Joseph,Pitts,7,124.5
Donna,Stewart,6,116.0
Amy,Krause,8,127.0
Brandon,Erickson,3,100.5
Melissa,Schwartz,6,108.0
Donald,Harper,9,128.5
Kristi,Barnes,7,118.5
Brandi,King,4,102.0
Lawrence,Stokes,9,131.5
Tracy,Cole,3,97.5
David,Brooks,8,132.0
Abigail,Hall,10,145.0
Lindsey,Reyes,5,110.5
John,Parker,10,140.0
Madison,Strong,9,131.5
Susan,Smith,7,127.5
Karen,Fox,2,81.0
David,Wright,9,134.5
Juan,Powell,8,125.0
Thomas,Zuniga,8,119.0
Brian,Fletcher,2,92.0
Melissa,Howard,3,93.5
John,Walker,5,108.5
Kevin,Thomas,4,95.0
Andrea,Wright,4,103.0
Anthony,Hall,2,86.0
Lori,Butler,2,79.0
Nicole,Acevedo,8,135.0
John,Castro,2,83.0
Nicole,Perkins,3,104.5
Gary,Wright,5,100.5
Vanessa,Evans,9,130.5
Mindy,Norton,2,95.0
Judy,Bowen,8,120.0
Kristy,Boone,10,136.0
Bryan,Jackson,5,105.5
William,Lewis,8,130.0
Amanda,Snyder,9,124.5
David,Miller,8,124.0
Alexander,Bryan,8,119.0
Christopher,Nixon,6,105.0
Tonya,Reese,7,122.5
Shannon,Hill,7,125.5
Robert,Reed,4,111.0
Randy,Barber,10,133.0
Patricia,Moore,6,108.0
John,Clark,3,93.5
Brandon,Dickerson,2,83.0
William,Jones,4,104.0
Sean,Hayes,5,116.5
Kimberly,Juarez,7,117.5
Dennis,Sims,8,134.0
William,Smith,10,134.0
Dylan,Estrada,10,134.0
Michael,Stuart,9,127.5
Warren,Barker,10,145.0
Dennis,Mendoza,9,129.5
Jessica,Bass,9,141.5
James,Sanders,7,115.5
Thomas,Alexander,8,126.0
Thomas,Phillips,8,120.0
Lindsey,Gentry,10,141.0
Michael,Little,5,112.5
Oscar,Riley,5,109.5
Heather,Mason,9,137.5
Emily,Sherman,2,93.0
Megan,Lopez,2,96.0
Javier,Robertson,8,132.0
William,Morton,5,111.5
Zachary,Mccullough,5,106.5
Kimberly,Hunter,9,139.5
Margaret,Alvarez,4,99.0
Matthew,Hamilton,9,133.5
Stephen,Santos,7,131.5
Brett,Blair,9,143.5
Tammy,Ellis,4,108.0
Casey,Harris,3,91.5
Dakota,Scott,6,121.0
Laura,Smith,3,86.5
Morgan,Ayers,4,102.0
Jackson,Thompson,2,89.0
Tammy,Ward,4,106.0
Lisa,Smith,8,124.0
Vicki,Smith,8,133.0
Cathy,Hebert,5,103.5
Marie,Collins,8,119.0
Lynn,Long,7,127.5
Vincent,Cox,5,108.5
Michael,Perkins,4,108.0
Kimberly,Lyons,10,150.0
Kayla,Smith,8,122.0
Amanda,Gomez,9,142.5
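Before writing the DataSchema, it can help to confirm that children.csv parses cleanly and to see which column types it suggests. The sketch below uses only the Python standard library. The type-guessing rule (whole numbers map to IntType, other numbers to FloatType, everything else to CategoryType) is our own heuristic for drafting a schema, not Hazy's inference logic, and `inspect_csv` is a hypothetical helper name.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def _parses_as(value: str, kind: type) -> bool:
    """Return True if `value` can be converted by `kind` (int or float)."""
    try:
        kind(value)
    except (TypeError, ValueError):
        return False
    return True


def inspect_csv(stream: TextIO) -> Tuple[List[str], int, Dict[str, str]]:
    """Return the header, the number of data rows and a draft dtype per column.

    Heuristic (ours, not Hazy's): whole numbers -> IntType, other numbers ->
    FloatType, anything else -> CategoryType.
    """
    reader = csv.DictReader(stream)
    rows = list(reader)
    columns = list(reader.fieldnames or [])
    draft = {}
    for col in columns:
        values = [row[col] for row in rows]
        if all(_parses_as(v, int) for v in values):
            draft[col] = "IntType"
        elif all(_parses_as(v, float) for v in values):
            draft[col] = "FloatType"
        else:
            draft[col] = "CategoryType"
    return columns, len(rows), draft


# A two-row excerpt of children.csv; for the full file use
# `with open("children.csv", newline="") as f: inspect_csv(f)`.
sample = io.StringIO(
    "first_name,last_name,age,height\n"
    "Megan,Chang,8,131.0\n"
    "Robert,Green,2,87.0\n"
)
print(inspect_csv(sample))
```

On the full file this should report 1,000 data rows and a draft that matches the dtypes used in the TrainingConfig: CategoryType for the two name columns, IntType for age and FloatType for height.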
Set up training via TrainingConfig
A Hazy TrainingConfig contains the following required parameters when used to train via SynthAPI:
- A DataSchema - defines the data and structure of the table(s).
- A list of SecretDataSource - data sources for input. This includes the data source that contains children.csv.
- A list of DataLocationInput - data locations for input. PathReadTableConfig is used to locate children.csv; it requires the id of the data source as well as the relative path to the file inside the data source.
import hazy_configurator as hz

# ID of the data source (in your Hazy Hub project) that holds children.csv
INPUT_DATA_SOURCE_ID = "YOUR_INPUT_DATA_SOURCE_ID"

training_config = hz.TrainingConfig(
    data_schema=hz.DataSchema(
        tables=[
            hz.TabularTable(
                name="children",
                dtypes=[
                    hz.CategoryType(col="first_name"),
                    hz.CategoryType(col="last_name"),
                    hz.FloatType(col="height"),
                    hz.IntType(col="age"),
                ],
            ),
        ],
    ),
    data_input=[
        hz.DataLocationInput(
            name="children",  # links this location to the "children" table schema
            location=hz.PathReadTableConfig(
                connection=INPUT_DATA_SOURCE_ID, rel_path="children.csv"
            ),
        ),
    ],
    data_sources=[hz.SecretDataSource(id=INPUT_DATA_SOURCE_ID)],
)
Key Notes
- Hazy's DataSchema contains a list of DataTable. Each DataTable contains dtypes, defining the Hazy DataTypes for each column, and a name, linking the table's schema to its location.
- Hazy's DataLocation currently supports .csv, .csv.gz, .parquet and .avro paths, as well as SQL Server and IBM Db2 locations.
- The model_output parameter is not needed to train via SynthAPI. Models are saved to a pre-configured storage folder.
Note: Additional configuration (e.g. EvaluationConfig) used to tweak the evaluation of the trained model has been omitted for simplicity.
Please refer to our SDK for further TrainingConfig configuration options.
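When building a PathReadTableConfig, a quick check on the file extension can catch an unsupported path before a job is submitted. This is a minimal sketch: `is_supported_path` is a hypothetical helper, not part of hazy_configurator, and it only covers the path-based formats listed in the Key Notes. SQL Server and IBM Db2 locations are not file paths, so they are out of scope here. The check is case-sensitive because we do not know whether Hazy accepts upper-case extensions.

```python
from typing import Tuple

# Path-based formats listed in the Key Notes above.
SUPPORTED_SUFFIXES: Tuple[str, ...] = (".csv", ".csv.gz", ".parquet", ".avro")


def is_supported_path(rel_path: str) -> bool:
    """Return True if rel_path ends with one of the supported file extensions."""
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return rel_path.endswith(SUPPORTED_SUFFIXES)


for path in ["children.csv", "archive/children.csv.gz", "children.xlsx"]:
    print(path, is_supported_path(path))
```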
Training the model
The SynthAPI class serves as an interface to Hazy services through its API. An object of this class is instantiated with a Hazy UI host URL and an API authentication key. For more information about authentication with a Keycloak setup, see https://hazy.com/docs/python_sdk/tutorials/auth-synth-api-keycloak/.
Below, training takes a project id and a TrainingConfig object as input. It assumes a project with the provided id is already set up and has the children.csv data source attached. Please see SynthAPI for more information about how to train on a predefined configuration instead.
from hazy_client2 import SynthAPI

# Please see https://hazy.com/docs/python_sdk/tutorials/auth-synth-api-keycloak/
API_KEY = "YOUR_API_KEY"
HAZY_HUB_HOST = "https://your/hazy/hub"
PROJECT_ID = "YOUR_PROJECT_ID"
INPUT_DATA_SOURCE_ID = "YOUR_INPUT_DATA_SOURCE_ID"

client = SynthAPI(host=HAZY_HUB_HOST, api_key=API_KEY)

# training_config is the TrainingConfig built in the previous snippet
train_job = client.jobs.train(config=training_config, project_id=PROJECT_ID)

# poll the training job until complete (every 5 seconds)
for state in client.jobs.poll_training_status(train_job.model_id, interval=5):
    print(f"Job status {state}")
assert state.is_finished
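The polling loop above runs until the job finishes, however long that takes. If you would rather fail fast in a script or CI run, one option is to wrap the poll iterator in a deadline. The sketch below is generic Python: `with_deadline` is a hypothetical helper that works with any iterator, and it assumes only what the snippet above already relies on, namely that the poll methods yield states lazily and that each state has an `is_finished` attribute.

```python
import time
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def with_deadline(states: Iterable[T], timeout_s: float) -> Iterator[T]:
    """Yield poll states until one reports is_finished, or raise TimeoutError.

    The deadline is checked after each unfinished state, so a job that finishes
    late is still reported rather than raising.
    """
    deadline = time.monotonic() + timeout_s
    for state in states:
        yield state
        if getattr(state, "is_finished", False):
            return
        if time.monotonic() > deadline:
            raise TimeoutError(f"job did not finish within {timeout_s} seconds")


# Usage with the client above (commented out so this sketch runs on its own):
# for state in with_deadline(
#     client.jobs.poll_training_status(train_job.model_id, interval=5), timeout_s=3600
# ):
#     print(f"Job status {state}")


class FakeState:
    """Stand-in for a poll state; only the is_finished attribute is modelled."""

    def __init__(self, is_finished: bool) -> None:
        self.is_finished = is_finished


def fake_poll(n: int) -> Iterator[FakeState]:
    """Yield n states, the last of which is finished."""
    for i in range(n):
        yield FakeState(i == n - 1)


states = list(with_deadline(fake_poll(3), timeout_s=60))
print(len(states), states[-1].is_finished)
```

The same wrapper applies unchanged to poll_generation_status. Using time.monotonic rather than time.time keeps the deadline unaffected by system clock adjustments.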
Setting up generation via GenerationConfig
The GenerationConfig uses the .hmf file to generate synthetic data at the desired output location.
The following are required when generating via SynthAPI:
- A list of SecretDataSource - data sources for output.
- A list of DataLocationOutput - data locations for output. PathWriteTableConfig connects to the data source that the generated data is written to, using the data source id and a relative path inside that data source.
OUTPUT_DATA_SOURCE_ID = "YOUR_OUTPUT_DATA_SOURCE_ID"
RELATIVE_OUTPUT_LOCATION = "output/children.csv"

generation_config = hz.GenerationConfig(
    model="children.hmf",
    data_output=[
        hz.DataLocationOutput(
            name="children",
            location=hz.PathWriteTableConfig(
                connection=OUTPUT_DATA_SOURCE_ID, rel_path=RELATIVE_OUTPUT_LOCATION
            ),
        ),
    ],
    data_sources=[hz.SecretDataSource(id=OUTPUT_DATA_SOURCE_ID)],
)
Note: Additional fields (e.g. GenSampleParams, used to configure the amount of synthetic data to be generated) have been omitted for simplicity. The default magnitude of generated synthetic data is 1.0, which means the same amount of data is generated as was trained on.
Please refer to our SDK for further GenerationConfig configuration options.
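As a worked example, the children.csv sample above has 1,000 data rows, so with the default magnitude of 1.0 you should expect 1,000 synthetic rows. The helper below only encodes that arithmetic, under the assumption that the output size scales linearly with the magnitude. `expected_rows` is a hypothetical name; how GenSampleParams actually rounds, or whether multi-table schemas scale differently, is not covered here.

```python
def expected_rows(n_training_rows: int, magnitude: float = 1.0) -> int:
    """Estimate the synthetic row count, assuming it scales linearly with magnitude."""
    if magnitude < 0:
        raise ValueError("magnitude must be non-negative")
    return round(n_training_rows * magnitude)


print(expected_rows(1000))       # 1000 (default magnitude)
print(expected_rows(1000, 2.5))  # 2500
```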
Generating synthetic data
Using the id of the model produced during training, we can generate synthetic data for the given GenerationConfig.
# model_id comes from the training job started in the previous snippet
generate_job = client.jobs.generate(
    config=generation_config, model_id=train_job.model_id
)

# poll the generation job until complete (every 5 seconds)
for state in client.jobs.poll_generation_status(generate_job.run_id, interval=5):
    print(f"Job status {state}")
assert state.is_finished
Combining all this together
import hazy_configurator as hz
from hazy_client2 import SynthAPI

# Please see https://hazy.com/docs/python_sdk/tutorials/auth-synth-api-keycloak/
API_KEY = "YOUR_API_KEY"
HAZY_HUB_HOST = "https://your/hazy/hub"
PROJECT_ID = "YOUR_PROJECT_ID"
INPUT_DATA_SOURCE_ID = "YOUR_INPUT_DATA_SOURCE_ID"
OUTPUT_DATA_SOURCE_ID = "YOUR_OUTPUT_DATA_SOURCE_ID"
RELATIVE_OUTPUT_LOCATION = "output/children.csv"

client = SynthAPI(host=HAZY_HUB_HOST, api_key=API_KEY)

training_config = hz.TrainingConfig(
    data_schema=hz.DataSchema(
        tables=[
            hz.TabularTable(
                name="children",
                dtypes=[
                    hz.CategoryType(col="first_name"),
                    hz.CategoryType(col="last_name"),
                    hz.FloatType(col="height"),
                    hz.IntType(col="age"),
                ],
            ),
        ],
    ),
    data_input=[
        hz.DataLocationInput(
            name="children",
            location=hz.PathReadTableConfig(
                connection=INPUT_DATA_SOURCE_ID, rel_path="children.csv"
            ),
        ),
    ],
    data_sources=[hz.SecretDataSource(id=INPUT_DATA_SOURCE_ID)],
)

generation_config = hz.GenerationConfig(
    model="children.hmf",
    data_output=[
        hz.DataLocationOutput(
            name="children",
            location=hz.PathWriteTableConfig(
                connection=OUTPUT_DATA_SOURCE_ID, rel_path=RELATIVE_OUTPUT_LOCATION
            ),
        ),
    ],
    data_sources=[hz.SecretDataSource(id=OUTPUT_DATA_SOURCE_ID)],
)

# Train, polling every 5 seconds until the job finishes
train_job = client.jobs.train(config=training_config, project_id=PROJECT_ID)
for state in client.jobs.poll_training_status(train_job.model_id, interval=5):
    print(f"Job status {state}")
assert state.is_finished

# Generate from the trained model, polling every 5 seconds until the job finishes
generate_job = client.jobs.generate(
    config=generation_config, model_id=train_job.model_id
)
for state in client.jobs.poll_generation_status(generate_job.run_id, interval=5):
    print(f"Job status {state}")
assert state.is_finished
Training and generation via SynthDocker
Set up training via TrainingConfig
A Hazy TrainingConfig contains the following required parameters:
- A DataSchema - defines the data and structure of the table(s).
- A list of DataLocationInput - data locations for input.
- A model_output path for the Hazy model file (.hmf) - where the trained model will be saved.
import hazy_configurator as hz

training_config = hz.TrainingConfig(
    model_output="children.hmf",  # where the trained model file is written
    data_schema=hz.DataSchema(
        tables=[
            hz.TabularTable(
                name="children",
                dtypes=[
                    hz.CategoryType(col="first_name"),
                    hz.CategoryType(col="last_name"),
                    hz.FloatType(col="height"),
                    hz.IntType(col="age"),
                ],
            ),
        ],
    ),
    data_input=[
        hz.DataLocationInput(name="children", location="children.csv"),
    ],
)
Key Notes
- Hazy's DataSchema contains a list of DataTable. Each DataTable contains dtypes, defining the Hazy DataTypes for each column, and a name, linking the table's schema to its location.
- Hazy's DataLocation currently supports .csv, .csv.gz, .parquet and .avro paths, as well as SQL Server and IBM Db2 locations.
- model_output is the path where the Hazy model file (.hmf) will be stored.
Note: Additional configuration (e.g. EvaluationConfig) used to tweak the evaluation of the trained model has been omitted for simplicity.
Please refer to our SDK for further TrainingConfig configuration options.
Training the model
The SynthDocker class provides an abstraction for working with a local Hazy synthesiser running in Docker. Its train() method takes a TrainingConfig as input and writes out a .hmf model file to the model_output path specified in training_config.
import os
from os.path import exists

from hazy_client2 import SynthDocker

DOCKER_IMAGE = "docker_image:tag"
# replace these with specific IDs if required
# (os.getuid and os.getgid are only available on Unix-like systems)
DOCKER_USER_ID = os.getuid()
DOCKER_GROUP_ID = os.getgid()

synth = SynthDocker(
    image=DOCKER_IMAGE,
    container_user_default=f"{DOCKER_USER_ID}:{DOCKER_GROUP_ID}",
    features_file="/path/to/your/features.json",
    features_sig="/path/to/your/features.sig.json",
)

synth.train(cfg=training_config)
assert exists("children.hmf"), "Synthesiser should generate .hmf model file!"
Tip: Change the log level via export HAZY_LOGLEVEL=INFO to increase the visibility of the logs (default: WARNING).
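If you launch training from a Python script rather than a shell, the same variable can be set from Python. This is a minimal sketch under stated assumptions: it only affects the current Python process and any subprocesses it starts, whether SynthDocker forwards the variable into the synthesiser container is not confirmed here, and it should run before hazy_client2 is imported in case the level is read at import time.

```python
import os

# Python equivalent of `export HAZY_LOGLEVEL=INFO`. setdefault keeps any value
# already exported in the shell, so a shell-level override still wins.
# Assumption: hazy_client2 reads this variable from the environment; set it
# before importing hazy_client2 in case the level is read at import time.
os.environ.setdefault("HAZY_LOGLEVEL", "INFO")
print("HAZY_LOGLEVEL =", os.environ["HAZY_LOGLEVEL"])
```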
Setting up generation via GenerationConfig
The GenerationConfig uses the .hmf file to generate synthetic data at the desired output location.
generation_config = hz.GenerationConfig(
    model="children.hmf",  # model file written by synth.train(...)
    data_output=[
        hz.DataLocationOutput(
            name="children",
            location="output/children.csv",
        ),
    ],
)
Note: Additional fields (e.g. GenSampleParams, used to configure the amount of synthetic data to be generated) have been omitted for simplicity. The default magnitude of generated synthetic data is 1.0, which means the same amount of data is generated as was trained on.
Please refer to our SDK for further GenerationConfig configuration options.
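The output location output/children.csv is a relative path. If you are unsure whether the synthesiser creates missing parent directories, a defensive option is to create them before calling generate(). This is a minimal stdlib sketch, assuming the SynthDocker working directory is the current directory; `ensure_parent_dirs` is a hypothetical helper, not part of hazy_client2.

```python
import os
from typing import Iterable, List


def ensure_parent_dirs(paths: Iterable[str]) -> List[str]:
    """Create the parent directory of each output path if it is missing.

    Returns the directories that were actually created.
    """
    created = []
    for path in paths:
        parent = os.path.dirname(path)
        if parent and not os.path.isdir(parent):
            os.makedirs(parent, exist_ok=True)
            created.append(parent)
    return created


# Call before synth.generate(...), using the DataLocationOutput paths:
# ensure_parent_dirs(["output/children.csv"])
```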
Generating synthetic data
Using a Docker image of the Hazy synthesiser, we can generate synthetic data for the given GenerationConfig.
from os.path import exists
synth.generate(cfg=generation_config) # same synth as previous snippet
assert exists("output/children.csv"), "Synthesiser should generate synthetic data!"
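The exists() check only confirms that a file was written. To go a step further, you can compare the generated file's header with the columns declared in the DataSchema and count its rows. This is a stdlib sketch; `summarize_csv` and `EXPECTED_COLUMNS` are hypothetical names, and it assumes the output is a plain CSV with a header row. The header is compared as a sorted list because we do not assume the output preserves column order.

```python
import csv
from typing import List, Tuple

# Columns declared in the DataSchema for the "children" table.
EXPECTED_COLUMNS = ["first_name", "last_name", "age", "height"]


def summarize_csv(path: str) -> Tuple[List[str], int]:
    """Return the header row and the number of data rows in a CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        n_rows = sum(1 for _ in reader)
    return header, n_rows


# After synth.generate(...):
# header, n_rows = summarize_csv("output/children.csv")
# assert sorted(header) == sorted(EXPECTED_COLUMNS), header
# print(f"{n_rows} synthetic rows")
```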
Combining all this together
import os
from os.path import exists

import hazy_configurator as hz
from hazy_client2 import SynthDocker

DOCKER_IMAGE = "docker_image:tag"
# replace these with specific IDs if required (Unix-like systems only)
DOCKER_USER_ID = os.getuid()
DOCKER_GROUP_ID = os.getgid()

synth = SynthDocker(
    image=DOCKER_IMAGE,
    container_user_default=f"{DOCKER_USER_ID}:{DOCKER_GROUP_ID}",
    features_file="/path/to/your/features.json",
    features_sig="/path/to/your/features.sig.json",
)

training_config = hz.TrainingConfig(
    model_output="children.hmf",
    data_schema=hz.DataSchema(
        tables=[
            hz.TabularTable(
                name="children",
                dtypes=[
                    hz.CategoryType(col="first_name"),
                    hz.CategoryType(col="last_name"),
                    hz.FloatType(col="height"),
                    hz.IntType(col="age"),
                ],
            ),
        ],
    ),
    data_input=[
        hz.DataLocationInput(name="children", location="children.csv"),
    ],
)

generation_config = hz.GenerationConfig(
    model="children.hmf",
    data_output=[
        hz.DataLocationOutput(name="children", location="output/children.csv"),
    ],
)

# Train, then generate from the model file written during training
synth.train(cfg=training_config)
assert exists("children.hmf"), "Synthesiser should generate .hmf model file!"
synth.generate(cfg=generation_config)
assert exists("output/children.csv"), "Synthesiser should generate synthetic data!"
Further Reading
The above provides a very simple guide for generating synthetic data via Hazy's SDK.
Advanced features include:
- HazyDataType for handling different types in a data schema.
- Custom Handlers for handling non-trivial behaviour.
- Evaluation configuration can be used to assess the efficacy of the model.
- hazy_client2 provides an SDK to manage containers.
For more detailed configuration requirements, see our Complex Examples.