Data pipelines

Some projects work with large volumes of data from many datasets, and some of that data may be unstructured, invalid, or simply not in the right form. Processing a large number of items one by one takes time, and the API or database can struggle to handle the load. There may also be a need to apply all the changes in a single transaction. That is where Depot data pipelines come in.

Transactions

A Depot Transaction invokes an outer "wrapper" Step Function that handles the lifecycle of the transaction. The Transaction is modelled as a Step Function Execution with a specific payload for the Transaction data.

The outer wrapper Step Function invokes another, internal Transaction Step Function based on the Storage type being used (for example, Snowflake). The outer wrapper Step Function also tracks the state of the Transaction by updating a dedicated DynamoDB table with the running status and the start/stop times.
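The wrapper pattern described above can be sketched in plain Python. This is only an illustration of the control flow, not Depot's implementation: the in-memory `transaction_table` stands in for the DynamoDB table, and the function names, the `BACKENDS` registry, and the status values (`RUNNING`, `SUCCEEDED`, `FAILED`) are all assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Callable, Dict

# Hypothetical stand-in for the dedicated DynamoDB table.
transaction_table: Dict[str, dict] = {}


def run_snowflake_transaction(payload: dict) -> None:
    """Hypothetical internal backend; a real one would update the target Datasets."""
    payload["applied_by"] = "snowflake"


# Hypothetical registry mapping a Storage type to its internal backend.
BACKENDS: Dict[str, Callable[[dict], None]] = {
    "snowflake": run_snowflake_transaction,
}


def _now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def run_transaction(transaction_id: str, storage_type: str, payload: dict) -> dict:
    """Outer wrapper: record the state, dispatch by Storage type, record the outcome."""
    record = {"status": "RUNNING", "start_time": _now(), "stop_time": None}
    transaction_table[transaction_id] = record
    try:
        backend = BACKENDS[storage_type]  # KeyError for an unknown Storage type
        backend(payload)
        record["status"] = "SUCCEEDED"
    except Exception as exc:
        record["status"] = "FAILED"
        record["error"] = repr(exc)
    finally:
        # The stop time is written whether the backend succeeded or failed.
        record["stop_time"] = _now()
    return record
```

Keeping the status bookkeeping in the wrapper means each internal backend only has to implement the storage-specific work.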

Refer to Transactions for more information on specific Transaction types.

Transaction running statuses can be retrieved by using the list-transactions or get-transaction commands with the Depot CLI.
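Because a Transaction runs asynchronously, a caller will often want to wait until it reaches a final state. The sketch below shows one way to poll for that. It deliberately takes a `fetch_status` callable instead of calling the Depot CLI, since the exact command output is not described here; the terminal status names and the helper itself are assumptions for illustration.

```python
import time
from typing import Callable

# Assumed names for the final states of a Transaction.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def wait_for_transaction(
    fetch_status: Callable[[], str],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
) -> str:
    """Call fetch_status until it returns a terminal status, then return that status.

    fetch_status is a hypothetical callable, for example a thin wrapper around
    whatever mechanism returns the running status of a single Transaction.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"no terminal status after {max_polls} polls")
```

Bounding the number of polls keeps a stuck Transaction from blocking the caller indefinitely.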

The internal backend responsible for the Transaction type executes the necessary steps to update the target Dataset(s). Once it completes, the outer "wrapper" Transaction Step Function completes as well.