Create a Dataset

A Dataset stores data in predefined schemas. A Dataset can, for example, hold the data for an application, data replicated from another system, or a set of materialised views that transform data from other datasets as part of a pipeline. A Dataset is defined by a package (its structure) and a store (its versioned storage location).

Create your first Dataset

Assuming you have already created an Environment referenced as exampleEnvironment, a Package (sourcePackage in the example) and a Location (ddbLocation in the example), you can create a Dataset using the Dataset construct:

stack.ts
// Update imports to include Dataset
import { Environment, DynamoDBLocationCapacityMode, Location, Package, Dataset } from "@stage-tech/depot-cdk";

const sourceDataset = new Dataset(this, "Source", {
  environment: exampleEnvironment,
  package: sourcePackage,
  name: "source",
  location: ddbLocation
});

Using Datasets in a Pipeline

A pipeline is a chain of two or more dependent datasets that transform data at each step. Pipelines are logical concepts in Depot and are not defined explicitly. You create a pipeline by listing the datasets that a dataset depends on in its dependencies property.

stack.ts
const sourceDataset = new Dataset(this, "Source", {
  environment: exampleEnvironment,
  package: sourcePackage,
  name: "source",
  location: ddbLocation
});

const transformDataset = new Dataset(this, "Transform1", {
  environment: exampleEnvironment,
  package: crdPackage,
  name: "transform1",
  location: ddbLocation,
  dependencies: [{ dataset: sourceDataset }]
});

const analyticsDataset = new Dataset(this, "Analytics", {
  environment: exampleEnvironment,
  package: analyticsPackage,
  name: "analytics",
  location: ddbLocation,
  dependencies: [{ dataset: transformDataset }]
});
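
Because a pipeline is implied by dependencies rather than declared, the order of its steps can be derived from those dependencies alone. The sketch below is not Depot code and does not describe how Depot resolves ordering internally. It models each dataset as a plain object and applies a depth-first topological sort. The DatasetNode interface and the pipelineOrder function are hypothetical names introduced only for this illustration.

```typescript
// Plain stand-in for a dataset: a name plus the names it depends on.
// Illustrative model only, not a type from @stage-tech/depot-cdk.
interface DatasetNode {
  name: string;
  dependsOn: string[];
}

// Return dataset names so that each one appears after everything it depends on.
function pipelineOrder(nodes: DatasetNode[]): string[] {
  const byName = new Map<string, DatasetNode>();
  for (const node of nodes) {
    byName.set(node.name, node);
  }

  const ordered: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (name: string): void => {
    if (state.get(name) === "done") {
      return;
    }
    if (state.get(name) === "visiting") {
      // Reaching a node that is still being visited means a circular dependency.
      throw new Error(`Dependency cycle detected at "${name}"`);
    }
    const node = byName.get(name);
    if (node === undefined) {
      throw new Error(`Unknown dataset "${name}"`);
    }
    state.set(name, "visiting");
    for (const dep of node.dependsOn) {
      visit(dep);
    }
    state.set(name, "done");
    ordered.push(name);
  };

  for (const node of nodes) {
    visit(node.name);
  }
  return ordered;
}

// Mirrors the three datasets above, deliberately listed out of order.
const order = pipelineOrder([
  { name: "analytics", dependsOn: ["transform1"] },
  { name: "source", dependsOn: [] },
  { name: "transform1", dependsOn: ["source"] },
]);
console.log(order.join(" -> ")); // prints "source -> transform1 -> analytics"
```

The same model makes it easy to see why a circular reference between datasets cannot form a valid pipeline: the sort has no dataset it could place first.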

Look Up Datasets across CDK Stacks

Datasets can be referenced across stacks using the lookup convenience functions.

const depotEnvironment = new Environment(this, "Environment", {
  account: { name: "test", id: "123456789012" },
  name: "test"
});

const datasetFromName = Dataset.fromDatasetName(this, "DatasetFromNameLookup", {
  environment: depotEnvironment,
  name: "some-dataset-name"
});

A looked-up dataset can then be listed in the dependencies of Datasets created in the current stack, so a dependency chain can span stacks.