Monday, April 24, 2017

Announcing SimsLoader

I'm happy to announce the official release of SimsLoader, a new tool for generating test data for relational databases. It's opensource, available as a Docker image, and works with MySQL, Postgres, Oracle, and SQL Server.

Unlike existing tools, SimsLoader doesn't assume it knows anything about your database. Instead, it reads your schema each time. This means:

  • You only tell it what you care about
  • It automagically handles schema changes
SimsLoader requires a minimum of information. You tell it what you want and it figures out everything else you need. Whether it's a NOT NULL column or a row in a parent table, SimsLoader will fill in everything necessary to give you what you asked for. And give it all back to you.

SimsLoader works by taking two configuration files - a model file and a specification file - in either YAML or JSON. The specification file contains what you want, specified with as much or as little detail as you want. At minimum, you can say:

users: 100

And 100 new rows in the users table are created with all the necessary columns filled in. If the users table has foreign keys, rows in those tables are created and on down the line until every row has everything it needs.

The specification file can be as complex as you want it to be. For example:

users:
  - name: John Doe
    started_on:
      type: timestamp_in_past_5_years
    organization:
      name: Acme Industries
  - name: Jane Doe
    started_on:
      type: timestamp_in_past_2_years
    organization:
      name: Acme Corporation
invoices:
  created:
    type: timestamp_in_past_2_years
  lineitems: 5

If you load this specification file, the returned value would be a YAML document with two users and one invoice. Even though you specify two new organizations and five lineitems, those are attributes of the three rows you actually asked for.

The model file adds information to SimsLoader's understanding of your database. It's primarily used to add type information to the various columns. For example:
users:
  columns:
    name: us_name
    address: us_address
    city: us_city
    state: us_state
    zipcode: us_zipcode
    started_on: timestamp_in_psat
  has_many:
    invoices:
      columns:
        - invoice_id
      foreign:
        source: invoices
        columns:
          - id

Now, whenever a row in the users table is generated, if a name isn't provided, it will be filled in with a reasonable-looking name from the US. If this hadn't been specified, the column would be filled with random characters (or potentially left blank, if it is a nullable column). In addition, you can specify missing foreign keys (has_many as in the example or the reverse belongs_to), missing unique constraints, and several other aspects.

No comments:

Post a Comment