Beyond automatic migrations
Sometimes Depot is unable to compute a migration using implicit or automatic migration descriptions.
In these cases, it may be necessary to perform a three-pass operation:
- First deploy a new version of the package with a hybrid between the old and the new schema
- Run a "backfill" transaction to populate the "new" persona of the hybrid schema
- Deploy a second new version of the package that drops the "old" persona of the hybrid schema and tightens any constraints that had to be relaxed for the hybrid.
An example is provided in the DepotTest unit test suite and described in more detail below.
Objective
The test case migrates from this schema:
to this schema:
(in other words, we perform a normalization of the breed field into a separate entity)
Comparison of the source and target schemas
The source and target schemas, in Depot YAML notation, look like this:
| Source | Target |
|---|---|
|
|
Migration strategy
The complete refactoring will use the following strategy:
First migration: source to hybrid model
In the source-to-hybrid migration, we will deploy:
- the new `adv.Breed` object
- a new field in `adv.Dog` that will receive the future reference to the `adv.Breed` object (initially optional)
- a supporting `Query` that computes the contents of the `adv.Breed` table out of the old-style `adv.Dog` object
- a supporting `Query` that computes the contents of the new field in `adv.Dog` out of the old-style `adv.Dog` object and the new `adv.Breed` object.
| Source | Hybrid |
|---|---|
|
|
| no explicit migration needed |
This is an automatic migration that only adds new objects and fields.
After deploying the "hybrid model":
- the `adv.Breed` table will exist and be empty
- there will be an always-NULL field `newBreedId` in the `adv.Dog` table
- the two supporting queries will not exist in the database, but they are available in the Depot schema

For debugging purposes, or to support dry runs, it is possible to deploy the supporting queries as views rather than queries. They will then be present in the database until the next version of the schema is deployed.
Second operation: backfill
The second operation is issued as a transaction on Depot's step function API. It should look like this:
```json
{
  "actions": [
    {
      "operation": "UPSERT",
      "source": "adv.migrate.Breed.v1",
      "target": "adv.Breed"
    },
    {
      "operation": "PATCH",
      "source": "adv.migrate.Dog.v2",
      "target": "adv.Dog"
    }
  ]
}
```
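The transaction body can also be assembled programmatically. The following is a minimal sketch, not part of Depot: the helper name `build_backfill_transaction` is hypothetical, and how the payload is submitted to Depot is deliberately left out because it is not described here. Only the field names and values shown in the transaction above are used.

```python
import json


def build_backfill_transaction(steps):
    """Build a transaction body from (operation, source, target) tuples.

    Hypothetical helper: it only mirrors the JSON shape shown above.
    """
    return {
        "actions": [
            {"operation": operation, "source": source, "target": target}
            for operation, source, target in steps
        ]
    }


# The order is kept as given: the breed rows are written first, because
# the Dog query is computed from both adv.Dog and the new adv.Breed.
payload = build_backfill_transaction([
    ("UPSERT", "adv.migrate.Breed.v1", "adv.Breed"),
    ("PATCH", "adv.migrate.Dog.v2", "adv.Dog"),
])

print(json.dumps(payload, indent=2))
```

Keeping the steps in a list makes the ordering explicit, which matters here since the second action depends on the result of the first.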
Once this transaction is executed:
- the `adv.Breed` table will be populated with the distinct values of the `breed` field
- the `newBreedId` field of the `adv.Dog` table will be populated with identifiers that point to the `adv.Breed` object with a `#name` that matches the `adv.Dog#breed` field.
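To make the effect of the backfill concrete, here is a rough illustration using Python's built-in `sqlite3` module rather than Depot itself. The table layouts, the `id` and `name` columns on the dog table, and the sample rows are assumptions made for the example; only `breed`, `newBreedId` and the breed `name` come from the description above. The SQL is an analogue of the two actions, not the SQL that Depot generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed stand-ins for the hybrid model: Dog still has the old text
# column `breed` plus the new, initially NULL, `newBreedId` column.
conn.executescript("""
    CREATE TABLE Breed (
        id   INTEGER PRIMARY KEY,
        name TEXT UNIQUE NOT NULL
    );
    CREATE TABLE Dog (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        breed      TEXT,
        newBreedId INTEGER REFERENCES Breed(id)
    );
    INSERT INTO Dog (name, breed) VALUES
        ('Rex', 'Beagle'), ('Fido', 'Poodle'), ('Max', 'Beagle');
""")

# Analogue of the UPSERT action: one Breed row per distinct breed value.
conn.execute(
    "INSERT OR IGNORE INTO Breed (name) "
    "SELECT DISTINCT breed FROM Dog WHERE breed IS NOT NULL"
)

# Analogue of the PATCH action: point each dog at the Breed row whose
# name matches its old text value.
conn.execute(
    "UPDATE Dog SET newBreedId = "
    "(SELECT id FROM Breed WHERE Breed.name = Dog.breed)"
)
conn.commit()

breeds = [row[0] for row in conn.execute("SELECT name FROM Breed ORDER BY name")]
unmatched = conn.execute(
    "SELECT COUNT(*) FROM Dog WHERE breed IS NOT NULL AND newBreedId IS NULL"
).fetchone()[0]
shared = conn.execute(
    "SELECT COUNT(DISTINCT newBreedId) FROM Dog WHERE breed = 'Beagle'"
).fetchone()[0]
```

With the sample rows, the two Beagles end up sharing a single breed row, which is the normalization the migration is after.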
Third operation: target model
| Hybrid | Target |
|---|---|
|
|
|
Here we provide an explicit migration script, which tells Depot to migrate the former `newBreedId` field into
the new `breed` property, as a reference to the `adv.Breed` object.
We also stop providing the supporting queries, which causes Depot to forget them.
After deployment:
- the `adv.Breed` table will be unchanged, containing all unique breeds formerly found in the `adv.Dog` table
- the `adv.Dog` table will contain the target model, with a reference to the `adv.Breed` associated with each `adv.Dog`.
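Because this last deployment drops the old text field, it can be worth checking beforehand that the backfill left no gaps. The sketch below is one possible sanity check in plain Python, not a Depot feature: the function name `find_backfill_gaps` and the dictionary-based row format are assumptions, and how the rows are fetched from the database is left out.

```python
def find_backfill_gaps(dogs, breeds):
    """Return the ids of dogs whose newBreedId does not match their old breed.

    `dogs` and `breeds` are lists of dicts (an assumed row format) holding
    the hybrid-model fields: id/breed/newBreedId and id/name respectively.
    """
    breed_name_by_id = {breed["id"]: breed["name"] for breed in breeds}
    gaps = []
    for dog in dogs:
        expected = dog.get("breed")
        ref = dog.get("newBreedId")
        # A missing reference resolves to None, so a dog without any
        # breed and without a reference is considered consistent.
        actual = breed_name_by_id.get(ref) if ref is not None else None
        if expected != actual:
            gaps.append(dog["id"])
    return gaps


breeds = [{"id": 1, "name": "Beagle"}, {"id": 2, "name": "Poodle"}]
dogs = [
    {"id": 10, "breed": "Beagle", "newBreedId": 1},   # consistent
    {"id": 11, "breed": "Poodle", "newBreedId": None},  # not backfilled
    {"id": 12, "breed": "Poodle", "newBreedId": 1},   # wrong breed
    {"id": 13, "breed": None, "newBreedId": None},    # no breed at all
]

gaps = find_backfill_gaps(dogs, breeds)
```

An empty result suggests the old field can be dropped without losing information; any listed id points at a row that would need another backfill run first.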
At this moment, Depot does not support having two schema versions and a migration transaction in the same deployment. This means that you should structure a schema change like the one above as two separate deployments, with a period of operation in between:
- first deployment: move to the "hybrid" schema and apply the backfill operation (manually, or automatically once the CloudFormation deployment is complete)
- operate under the hybrid model until the next deployment is scheduled (e.g. next sprint)
- second deployment: move to the target model.