Beyond automatic migrations

Sometimes Depot is unable to compute a migration using implicit or automatic migration descriptions.

In these case, it may be necessary to perform a three-pass operation:

First deploy a new version of the package with a hybrid between the old and the new schema
Run a "backfill" transaction to populate the "new" persona of the hybrid schema
Deploy a second new version of the package that drops the "old" persona of the hybrid schema and tightens up whatever needs tightening up.

An example is provided in the DepotTest unit test suite and described in more details here:

Objective

The test case migrates from this schema:

to this schema:

(in other words, we perform a normalization of the breed field into a separate entity)

Comparison of the source and target schemas

The source and target schemas, in Depot YAML notation look like this:

Source Target

Source	Target
`adv.Dog: type: object properties: age: type: integer! name: type: string breed: type: string!`	`adv.Dog: type: object properties: age: type: integer! name: type: string breed: type: adv.Breed! adv.Breed: type: object properties: name: type: string!`

adv.Dog:
  type: object
  properties:
    age:
      type: integer!
    name:
      type: string
    breed:
      type: string!

adv.Dog:
  type: object
  properties:
    age:
      type: integer!
    name:
      type: string
    breed:
      type: adv.Breed!

adv.Breed:
  type: object
  properties:
    name:
    type: string!

Migration strategy

The complete refactoring will use the following stategy:

First migration: source to hybrid model

In the source to hybrid model, we will deploy

the new adv.Breed object
a new field in adv.Dog that will receive the future reference to the adv.Breed object (initially optional)
a supporting Query that computes the contents of the adv.Breed table out of the old-style adv.Dog object
a supporting Query that computes the contents of the new field in adv.Dog out of the old-style adv.Dog object and the new adv.Breed object.

Source Hybrid

Source	Hybrid
`adv.Dog: type: object properties: age: type: integer! name: type: string breed: type: string!`	adv.Dog: type: object version: 2 properties: age: type: integer! name: type: string breed: type: string! newBreedId: type: string # note: optional for now adv.Breed: type: object properties: name: type: string! adv.migrate.Breed.v1: type: query description: \| a helper query to backfill the Breed table from breeds found in the Dog table properties: id: type: string! name: type: string! sql: \| WITH UNIQUE_BREED AS (SELECT DISTINCT breed AS name FROM DOG ) SELECT MD5(name) AS ID, name FROM UNIQUE_BREED WHERE name IS NOT NULL using: DOG: adv.Dog adv.migrate.Dog.v2: type: query description: \| a helper query to update the Dog table to use references to the Breed table properties: id: # this is the ID of the adv.Dog instance that needs to be updated type: string! newBreedId: type: string! sql: \| SELECT DOG.ID, BREED.ID AS NEW_BREED_ID FROM DOG LEFT JOIN BREED ON DOG.breed = BREED.name using: DOG: adv.Dog BREED: adv.Breed
	no explicit migration needed

adv.Dog:
  type: object
  properties:
    age:
      type: integer!
    name:
      type: string
    breed:
      type: string!

adv.Dog:
  type: object
  version: 2
  properties:
    age:
      type: integer!
    name:
      type: string
    breed:
      type: string!
    newBreedId:
      type: string # note: optional for now

adv.Breed:
  type: object
  properties:
    name:
    type: string!

adv.migrate.Breed.v1:
  type: query
  description: |
    a helper query to backfill the Breed
    table from breeds found in the Dog table
  properties:
    id: 
      type: string!
    name:
      type: string!  
  sql: |
      WITH UNIQUE_BREED AS (SELECT DISTINCT breed AS name FROM DOG ) 
      SELECT MD5(name) AS ID, name 
      FROM UNIQUE_BREED 
      WHERE name IS NOT NULL
  using:
    DOG: adv.Dog

adv.migrate.Dog.v2: 
  type: query
  description: |
    a helper query to update the Dog table to 
    use references to the Breed table
  properties:
    id: # this is the ID of the adv.Dog instance that needs to be updated
      type: string!    
    newBreedId:
      type: string!
  sql: |
    SELECT DOG.ID, BREED.ID AS NEW_BREED_ID 
      FROM DOG LEFT JOIN BREED 
        ON DOG.breed = BREED.name
  using:
    DOG: adv.Dog
    BREED: adv.Breed

no explicit migration needed

This is an automatic migration, that only adds news objects and fields.

After deploying the "hybrid model":

the adv.Breed table will exist and be empty
there will be an always-NULL field newBreedId in the adv.Dog table.
the two supporting queries have no existence in the database, but they are available in the Depot schema

tip

for debugging purposes, or to support dry runs, it is possible to deploy the supporting queries as views rather than queries. They will be present in the database until the next version of the schema is deployed.

Second operation: backfill

The second operation is issued as a transaction on Depot's step function API. It should look like this:

{
    "actions": [
        {
            "operation": "UPSERT",
            "source": "adv.migrate.Breed.v1",
            "target": "adv.Breed"
        },
        {
            "operation": "PATCH",
            "source": "adv.migrate.Dog.v2",
            "target": "adv.Dog"
        }
    ]
}

Once this transaction is executed:

the adv.Breed table will be populated with the distinct values of the breed field
the newBreedId field of the adv.Dog table will be populated with identifiers that point to the adv.Breed object with a #name that matches the adv.Dog#breed field.

Third operation: target model

Hybrid Target

Hybrid	Target
`adv.Dog: type: object version: 2 properties: age: type: integer! name: type: string breed: type: string! newBreedId: type: string # note: optional for now adv.Breed: type: object properties: name: type: string! adv.migrate.Breed.v1: type: query # ✂ adv.migrate.Dog.v2: type: query # ✂`	`adv.Dog: type: object version: 3 properties: age: type: integer! name: type: string breed: type: adv.Breed! adv.Breed: type: object properties: name: type: string!`
	`- target: adv.Dog fromVersion: 2 toVersion: 3 properties: breed: '{ "id": newBreedId }'`

adv.Dog:
  type: object
  version: 2
  properties:
    age:
      type: integer!
    name:
      type: string
    breed:
      type: string!
    newBreedId:
      type: string # note: optional for now

adv.Breed:
  type: object
  properties:
    name:
    type: string!

adv.migrate.Breed.v1:
  type: query
    # ✂  

adv.migrate.Dog.v2: 
  type: query
    # ✂  

adv.Dog:
  type: object
  version: 3
  properties:
    age:
      type: integer!
    name:
      type: string
    breed:
      type: adv.Breed!

adv.Breed:
  type: object
  properties:
    name:
    type: string!

- target: adv.Dog
  fromVersion: 2
  toVersion: 3
  properties:
    breed: '{ "id": newBreedId }'

Here we provide an explicit migration script, which will tell Depot to migrate the former newBreedId field into the new breed property, as a reference to the adv.Breed object.

We also cease to provide the support queries, which causes Depot to forget them.

After deployment:

the adv.Breed table will be unchanged, containing all unique breeds formerly found in the adv.Dog table
the adv.Dog table will contain the target model, with a reference to the adv.Breed associated with each adv.Dog.

caution

At this moment, Depot does not support having two sets of schemas versions and a migration transaction in the same deployment. This means that you should structure a schema change like the above in two separate deployments:

going to the "hybrid" schema, and applying (manually or automatically after the CloudFormation deployment is complete) the backfill operation
operation under the hybrid model until the next deployment is scheduled (e.g. next sprint)
second deployment, moving to the target model.

Objective​

Comparison of the source and target schemas​

Migration strategy​

First migration: source to hybrid model​

Second operation: backfill​

Third operation: target model​