Skip to main content

Beyond automatic migrations

Sometimes Depot is unable to compute a migration using implicit or automatic migration descriptions.

In these case, it may be necessary to perform a three-pass operation:

  1. First deploy a new version of the package with a hybrid between the old and the new schema
  2. Run a "backfill" transaction to populate the "new" persona of the hybrid schema
  3. Deploy a second new version of the package that drops the "old" persona of the hybrid schema and tightens up whatever needs tightening up.

An example is provided in the DepotTest unit test suite and described in more details here:

Objective

The test case migrates from this schema:

to this schema:

(in other words, we perform a normalization of the breed field into a separate entity)

Comparison of the source and target schemas

The source and target schemas, in Depot YAML notation look like this:

SourceTarget
adv.Dog:
type: object
properties:
age:
type: integer!
name:
type: string
breed:
type: string!
adv.Dog:
type: object
properties:
age:
type: integer!
name:
type: string
breed:
type: adv.Breed!

adv.Breed:
type: object
properties:
name:
type: string!

Migration strategy

The complete refactoring will use the following stategy:

First migration: source to hybrid model

In the source to hybrid model, we will deploy

  1. the new adv.Breed object
  2. a new field in adv.Dog that will receive the future reference to the adv.Breed object (initially optional)
  3. a supporting Query that computes the contents of the adv.Breed table out of the old-style adv.Dog object
  4. a supporting Query that computes the contents of the new field in adv.Dog out of the old-style adv.Dog object and the new adv.Breed object.
SourceHybrid
adv.Dog:
type: object
properties:
age:
type: integer!
name:
type: string
breed:
type: string!
adv.Dog:
type: object
version: 2
properties:
age:
type: integer!
name:
type: string
breed:
type: string!
newBreedId:
type: string # note: optional for now

adv.Breed:
type: object
properties:
name:
type: string!

adv.migrate.Breed.v1:
type: query
description: |
a helper query to backfill the Breed
table from breeds found in the Dog table
properties:
id:
type: string!
name:
type: string!
sql: |
WITH UNIQUE_BREED AS (SELECT DISTINCT breed AS name FROM DOG )
SELECT MD5(name) AS ID, name
FROM UNIQUE_BREED
WHERE name IS NOT NULL
using:
DOG: adv.Dog

adv.migrate.Dog.v2:
type: query
description: |
a helper query to update the Dog table to
use references to the Breed table
properties:
id: # this is the ID of the adv.Dog instance that needs to be updated
type: string!
newBreedId:
type: string!
sql: |
SELECT DOG.ID, BREED.ID AS NEW_BREED_ID
FROM DOG LEFT JOIN BREED
ON DOG.breed = BREED.name
using:
DOG: adv.Dog
BREED: adv.Breed
no explicit migration needed

This is an automatic migration, that only adds news objects and fields.

After deploying the "hybrid model":

  1. the adv.Breed table will exist and be empty
  2. there will be an always-NULL field newBreedId in the adv.Dog table.
  3. the two supporting queries have no existence in the database, but they are available in the Depot schema
tip

for debugging purposes, or to support dry runs, it is possible to deploy the supporting queries as views rather than queries. They will be present in the database until the next version of the schema is deployed.

Second operation: backfill

The second operation is issued as a transaction on Depot's step function API. It should look like this:

{
"actions": [
{
"operation": "UPSERT",
"source": "adv.migrate.Breed.v1",
"target": "adv.Breed"
},
{
"operation": "PATCH",
"source": "adv.migrate.Dog.v2",
"target": "adv.Dog"
}
]
}

Once this transaction is executed:

  • the adv.Breed table will be populated with the distinct values of the breed field
  • the newBreedId field of the adv.Dog table will be populated with identifiers that point to the adv.Breed object with a #name that matches the adv.Dog#breed field.

Third operation: target model

HybridTarget
adv.Dog:
type: object
version: 2
properties:
age:
type: integer!
name:
type: string
breed:
type: string!
newBreedId:
type: string # note: optional for now

adv.Breed:
type: object
properties:
name:
type: string!

adv.migrate.Breed.v1:
type: query
# ✂

adv.migrate.Dog.v2:
type: query
# ✂
adv.Dog:
type: object
version: 3
properties:
age:
type: integer!
name:
type: string
breed:
type: adv.Breed!

adv.Breed:
type: object
properties:
name:
type: string!
- target: adv.Dog
fromVersion: 2
toVersion: 3
properties:
breed: '{ "id": newBreedId }'

Here we provide an explicit migration script, which will tell Depot to migrate the former newBreedId field into the new breed property, as a reference to the adv.Breed object.

We also cease to provide the support queries, which causes Depot to forget them.

After deployment:

  • the adv.Breed table will be unchanged, containing all unique breeds formerly found in the adv.Dog table
  • the adv.Dog table will contain the target model, with a reference to the adv.Breed associated with each adv.Dog.
caution

At this moment, Depot does not support having two sets of schemas versions and a migration transaction in the same deployment. This means that you should structure a schema change like the above in two separate deployments:

  1. going to the "hybrid" schema, and applying (manually or automatically after the CloudFormation deployment is complete) the backfill operation
  2. operation under the hybrid model until the next deployment is scheduled (e.g. next sprint)
  3. second deployment, moving to the target model.