Dataset-seeding
Depot supports seeding tables and materialized views. To enable it, add a seeding property to the dataset, e.g.:
seeding: {
s3Location: "s3://<BucketName>/<folder>/"
}
The folder should contain JSON files named after the qualified schema name, e.g. pet.Store.json. The file content must follow the JSON Lines convention (https://jsonlines.org/), i.e. one JSON object per line:
{"id": "77842e2a24c3453cc3f8da5bf61a1eb0","name": "puffy"}
{"id": "07ab2907e2e6b580b122d4c7dfa25c87","name": "fluffy"}
Every dataset deployed with seeding properties creates a step function named sdp-<envId>-<datasetId>-dataset-seed.
The seeding function name (and therefore its ARN) is changing. To avoid a breaking change, two seeding step functions are created when the location is Snowflake: one with the new name above and a deprecated one with the old name. If you use the old naming pattern, please switch as soon as you can. The ARN of the correct seeding step function is now exposed on the dataset property dataset.seedingStepFunctionArn.
The step function takes two parameters:
{
"ignoreState": false,
"seedOnly": []
}
ignoreState: pass true to seed all files; otherwise only files that were updated after the last seeding are picked up.
seedOnly: restricts seeding to the listed files. Provide the file names as strings in the array.
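The two parameters can be assembled and passed to the step function roughly as follows. This is a sketch, not an official client: the helper names are hypothetical, `start_seeding` assumes boto3 and AWS credentials are available, and the ARN is expected to be the value exposed as dataset.seedingStepFunctionArn. Whether seedOnly entries need the .json extension is an assumption here.

```python
import json


def build_seed_input(ignore_state=False, seed_only=None):
    """Serialize the two documented parameters into an execution input string."""
    return json.dumps({
        "ignoreState": bool(ignore_state),
        "seedOnly": list(seed_only or []),
    })


def start_seeding(state_machine_arn, ignore_state=False, seed_only=None):
    """Start one execution of the seeding step function (hypothetical helper)."""
    import boto3  # imported lazily so build_seed_input works without boto3

    client = boto3.client("stepfunctions")
    return client.start_execution(
        stateMachineArn=state_machine_arn,
        input=build_seed_input(ignore_state, seed_only),
    )


if __name__ == "__main__":
    # Assumption: file names are given with their .json extension.
    print(build_seed_input(ignore_state=True, seed_only=["pet.Store.json"]))
```

Keeping the payload builder separate from the AWS call makes the input easy to inspect before anything is executed.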
Triggering and monitoring the step function has not been part of this functionality so far.
It is now possible to configure the Dataset to trigger the seeding step function automatically as part of the deployment and migration process. See Dataset Migrations for more information.
This step function covers basic scenarios. If you need a more advanced version, you can use the transaction action directly, e.g.:
{
"locationUri": "s3://ws-depot-test-snowflake-e2e-test-data-654409816600/seeding/petstore.Store.json",
"operation": "SEED",
"target": "petstore.Store",
"pathFormat": "EXPLICIT",
"skipExpressions": true,
"format": "JSON"
}
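When several seed files need such an action, the payload can be generated from the file location. The sketch below only reproduces the field values shown in the example above and derives target from the file name. `build_seed_action` is a hypothetical helper, and how the action is submitted to Depot is not covered here.

```python
def build_seed_action(location_uri):
    """Build a SEED action payload for one seed file (hypothetical helper).

    The target is derived from the file name, mirroring the example:
    .../petstore.Store.json -> petstore.Store
    """
    file_name = location_uri.rsplit("/", 1)[-1]
    suffix = ".json"
    if not file_name.endswith(suffix):
        raise ValueError(f"expected a .json seed file, got {file_name!r}")
    return {
        "locationUri": location_uri,
        "operation": "SEED",
        "target": file_name[: -len(suffix)],
        # The remaining values are copied from the documented example.
        "pathFormat": "EXPLICIT",
        "skipExpressions": True,
        "format": "JSON",
    }


if __name__ == "__main__":
    import json

    uri = "s3://ws-depot-test-snowflake-e2e-test-data-654409816600/seeding/petstore.Store.json"
    print(json.dumps(build_seed_action(uri), indent=2))
```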