Multi-dataset tests

By default, a DepotTest operates on a single namespace — one set of schemas deployed to one location. This covers the vast majority of cases.

When your package declares schemas that span multiple datasets (e.g. a view in dataset petstore that references object schemas in dataset source), you need to exercise those cross-dataset references end-to-end. The multi-dataset entry point lets you do this within a single test.

info

Multi-dataset is an opt-in capability. The single-namespace form (DepotTest.in(namespace)) is unchanged and remains the recommended default for the common case.

The multi-dataset entry point

Instead of passing a single MergedSchemas namespace, pass an object whose keys are arbitrary dataset identifiers and whose values are dataset specs:

import { DepotTest, TestLocationType } from "@stage-tech/depot-test";

await DepotTest.in({
  source: {
    namespace: sourceNamespace,
    locationType: TestLocationType.SNOWFLAKE,
  },
  petstore: {
    namespace: petstoreNamespace,
    locationType: TestLocationType.SNOWFLAKE,
    using: ["source"],
  },
})
  .setContent({ data: { "source.dogs": dogs, "source.cats": cats }, dataset: "source" })
  .check({
    data: [
      {
        scope: { schema: "petstore.AnimalCount", arguments: { minAgeArg: 3 }, sortFields: [{ field: "animal" }] },
        values: [
          { animal: "dog", count: 2 },
          { animal: "cat", count: 1 },
        ],
      },
    ],
    dataset: "petstore",
  })
  .run();

Dataset spec fields

| Field | Type | Description |
| --- | --- | --- |
| namespace | MergedSchemas | The schemas that belong to this dataset. |
| locationType | TestLocationType | Where to deploy this dataset. Defaults to the value of options({ locationType }) if set. |
| using | string[] | Keys of other datasets this dataset depends on. Mirrors the using field in your PackageDataset declaration. |
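
To make the locationType fallback concrete, here is a minimal sketch of the spec fields and the defaulting rule described above. The names DatasetSpecShape, LocationTypeName and effectiveLocationType are hypothetical and exist only for illustration; the real types come from the library, and how it behaves when neither value is set is not covered here.

```typescript
// Hypothetical mirror of the spec fields listed above; not the library's own types.
type LocationTypeName = string;

interface DatasetSpecShape<Namespace> {
  namespace: Namespace; // schemas that belong to this dataset
  locationType?: LocationTypeName; // where to deploy; may fall back to the options default
  using?: string[]; // keys of other datasets this dataset depends on
}

// Picks the spec's own locationType, else the default passed via options, else undefined.
function effectiveLocationType<N>(
  spec: DatasetSpecShape<N>,
  optionsDefault?: LocationTypeName,
): LocationTypeName | undefined {
  return spec.locationType !== undefined ? spec.locationType : optionsDefault;
}

// An explicit value on the spec wins over the default.
const explicitLocation = effectiveLocationType({ namespace: {}, locationType: "SNOWFLAKE" }, "AURORA");
// With no value on the spec, the options default is used.
const inheritedLocation = effectiveLocationType({ namespace: {}, using: ["source"] }, "SNOWFLAKE");
```

In this sketch explicitLocation and inheritedLocation both come out as "SNOWFLAKE", the first from the spec and the second from the default.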

using and cross-dataset references

The using array tells the backend which dependency datasets to make accessible when building this dataset's views. If petstore.animal contains SQL referencing source.dogs and source.cats, declaring using: ["source"] on the petstore spec ensures those schemas are in scope when the view DDL is resolved.
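
As a rough illustration of what using implies, the sketch below checks that every using entry names a declared dataset key and derives an order in which dependencies come before the datasets that reference them. DatasetSpecSketch and deploymentOrder are hypothetical, self-contained helpers, not part of @stage-tech/depot-test, and the real backend may resolve dependencies differently.

```typescript
// Hypothetical, simplified stand-in for a dataset spec (only `using` matters here).
interface DatasetSpecSketch {
  using?: string[];
}

type DatasetMapSketch = { [key: string]: DatasetSpecSketch };

// Returns dataset keys ordered so that every dataset appears after the datasets it uses.
// Throws if `using` names an undeclared key or if the dependencies form a cycle.
function deploymentOrder(datasets: DatasetMapSketch): string[] {
  const ordered: string[] = [];
  const state: { [key: string]: "visiting" | "done" } = Object.create(null);

  const visit = (key: string, requestedBy: string | null): void => {
    if (!Object.prototype.hasOwnProperty.call(datasets, key)) {
      throw new Error(`Dataset "${requestedBy}" uses unknown dataset "${key}"`);
    }
    if (state[key] === "done") return;
    if (state[key] === "visiting") {
      throw new Error(`Circular "using" dependency involving "${key}"`);
    }
    state[key] = "visiting";
    for (const dep of datasets[key].using || []) {
      visit(dep, key);
    }
    state[key] = "done";
    ordered.push(key);
  };

  for (const key of Object.keys(datasets)) {
    visit(key, null);
  }
  return ordered;
}

// petstore depends on source, so source is placed first.
const exampleOrder = deploymentOrder({
  petstore: { using: ["source"] },
  source: {},
});
```

Here exampleOrder is ["source", "petstore"], and a using entry such as "missing" that matches no key in the map raises an error instead of being silently ignored.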

Targeting operations at a dataset

With multiple datasets, each operation needs to know which dataset it targets. Specify dataset when the operation reads and writes within the same dataset, or sourceDataset and targetDataset for cross-dataset operations:

DepotTest.in({ source: { namespace: ... }, petstore: { namespace: ..., using: ["source"] } })
  // Seed into source
  .setContent({ data: { "source.dogs": dogs }, dataset: "source" })
  // Check in petstore
  .check({ data: [...], dataset: "petstore" })
  // Cross-dataset transaction: read from source, write to petstore
  .transaction({ source: "my.SourceSchema", target: "petstore.TargetSchema", sourceDataset: "source", targetDataset: "petstore" })
  .run();

The dataset shorthand is equivalent to setting both sourceDataset and targetDataset to the same value. A clear runtime error is thrown if an operation cannot resolve its dataset and no default has been set.
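
The shorthand rule can be pictured with a small standalone sketch. The DatasetTargeting shape, the resolveTargeting function and the defaultDataset parameter are hypothetical; in particular, letting an explicit sourceDataset or targetDataset take precedence over dataset is an assumption made for the example, not documented library behavior.

```typescript
// Hypothetical shape of the dataset-targeting props on an operation.
interface DatasetTargeting {
  dataset?: string;
  sourceDataset?: string;
  targetDataset?: string;
}

interface ResolvedTargeting {
  sourceDataset: string;
  targetDataset: string;
}

// Expands the `dataset` shorthand and falls back to an optional default.
// Assumption: explicit sourceDataset/targetDataset win over the shorthand.
function resolveTargeting(op: DatasetTargeting, defaultDataset?: string): ResolvedTargeting {
  const sourceDataset = op.sourceDataset ?? op.dataset ?? defaultDataset;
  const targetDataset = op.targetDataset ?? op.dataset ?? defaultDataset;
  if (sourceDataset === undefined || targetDataset === undefined) {
    throw new Error("Operation does not specify a dataset and no default dataset is set");
  }
  return { sourceDataset, targetDataset };
}

// Shorthand: both sides resolve to "petstore".
const shorthandTargeting = resolveTargeting({ dataset: "petstore" });
// Cross-dataset: read from "source", write to "petstore".
const crossTargeting = resolveTargeting({ sourceDataset: "source", targetDataset: "petstore" });
```

Calling resolveTargeting({}) with no default throws, which mirrors the runtime error described above.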

Backend constraints

All datasets in a multi-dataset test must use the same backend family. Mixing backends (e.g. SNOWFLAKE for one dataset and AURORA for another) is rejected with a descriptive error at job submission time, before any infrastructure is provisioned.

caution

SNOWFLAKE + AURORA in the same DepotTest.in(map) call is not supported and will be rejected. Cross-location multi-dataset tests are a separate, deferred capability.

SNOWFLAKE and SNOWFLAKE_MOCK_ICEBERG datasets may be combined freely — both are Snowflake-backed and differ only in naming strategy.
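
The same-family rule can be sketched as a simple pre-flight check. The LocationTypeSketch union, the BACKEND_FAMILY mapping and assertSingleBackendFamily are hypothetical stand-ins written for this example; the library performs its own validation at job submission time and its error wording will differ.

```typescript
// Local stand-in for the location types named in the text; not imported from the library.
type LocationTypeSketch = "SNOWFLAKE" | "SNOWFLAKE_MOCK_ICEBERG" | "AURORA";

// Assumed family mapping: both Snowflake variants share one backend family.
const BACKEND_FAMILY: { [K in LocationTypeSketch]: string } = {
  SNOWFLAKE: "snowflake",
  SNOWFLAKE_MOCK_ICEBERG: "snowflake",
  AURORA: "aurora",
};

// Throws when the datasets map mixes backend families.
function assertSingleBackendFamily(locationTypes: { [dataset: string]: LocationTypeSketch }): void {
  const families: string[] = [];
  for (const key of Object.keys(locationTypes)) {
    const family = BACKEND_FAMILY[locationTypes[key]];
    if (families.indexOf(family) === -1) families.push(family);
  }
  if (families.length > 1) {
    throw new Error(`Datasets mix backend families: ${families.join(", ")}`);
  }
}

// Allowed: both datasets are Snowflake-backed.
assertSingleBackendFamily({ source: "SNOWFLAKE", petstore: "SNOWFLAKE_MOCK_ICEBERG" });
```

Passing { source: "SNOWFLAKE", petstore: "AURORA" } to this check throws, because the two datasets fall into different families.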

Complete example

The example below splits the catsAndDogs namespace (where petstore.animal is a view that unions source.dogs and source.cats) into two logical datasets to exercise the cross-dataset reference end-to-end.

multi-dataset.test.ts
import { DepotTest, TestLocationType } from "@stage-tech/depot-test";
// Assumption: MergedSchemas and the catsAndDogs namespace are imported from wherever your project defines them.

const sourceNamespace: MergedSchemas = {
  "source.dogs": catsAndDogs["source.dogs"],
  "source.cats": catsAndDogs["source.cats"],
};

const petstoreNamespace: MergedSchemas = {
  "petstore.animal": catsAndDogs["petstore.animal"],
  "petstore.AnimalCount": catsAndDogs["petstore.AnimalCount"],
};

describe("multi-dataset cross-dataset view", () => {
  it("petstore.AnimalCount resolves across datasets", async () => {
    await DepotTest.in({
      source: {
        namespace: sourceNamespace,
        locationType: TestLocationType.SNOWFLAKE,
      },
      petstore: {
        namespace: petstoreNamespace,
        locationType: TestLocationType.SNOWFLAKE,
        using: ["source"],
      },
    })
      .setContent({
        data: {
          "source.dogs": [
            { id: "1", name: "Charlie", age: 1, breed: "Akita" },
            { id: "2", name: "Jax", age: 4, breed: "Coonhound" },
            { id: "3", name: "Ginger", age: 5, breed: "Bulldog" },
          ],
          "source.cats": [
            { id: "1", name: "Poppy", age: 1, breed: "British Shorthair" },
            { id: "2", name: "Luna", age: 2, breed: "Siamese" },
            { id: "3", name: "Daisy", age: 5, breed: "Ragdoll" },
          ],
        },
        dataset: "source",
      })
      .check({
        data: [
          {
            scope: {
              schema: "petstore.AnimalCount",
              arguments: { minAgeArg: 3 },
              sortFields: [{ field: "animal" }],
            },
            values: [
              { animal: "dog", count: 2 },
              { animal: "cat", count: 1 },
            ],
          },
        ],
        dataset: "petstore",
      })
      .run();
  });
});
tip

When using programmatic data input with multiple datasets, the data map in setContent can contain schemas from any dataset — the backend routes each schema to its owning dataset based on the namespace declarations. Only the dataset prop on the operation itself determines which dataset's infrastructure handles the step.
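
To picture the routing described in this tip, the following sketch groups a mixed data map by the dataset whose namespace declares each schema. DeclaredSchemas, DataMap and routeToOwningDataset are hypothetical names, and requiring exactly one owning dataset per schema is an assumption of the sketch rather than documented backend behavior.

```typescript
// Hypothetical inputs: which schema names each dataset declares, and a mixed data map.
type DeclaredSchemas = { [dataset: string]: string[] };
type DataMap = { [schema: string]: object[] };

// Groups the rows in `data` by the dataset whose namespace declares each schema.
function routeToOwningDataset(declared: DeclaredSchemas, data: DataMap): { [dataset: string]: DataMap } {
  const grouped: { [dataset: string]: DataMap } = {};
  for (const schema of Object.keys(data)) {
    const owners = Object.keys(declared).filter((dataset) => declared[dataset].indexOf(schema) !== -1);
    if (owners.length !== 1) {
      throw new Error(`Schema "${schema}" must be declared by exactly one dataset, found ${owners.length}`);
    }
    const owner = owners[0];
    grouped[owner] = grouped[owner] || {};
    grouped[owner][schema] = data[schema];
  }
  return grouped;
}

// Both schemas are declared by the "source" dataset, so both are routed there.
const exampleRouted = routeToOwningDataset(
  { source: ["source.dogs", "source.cats"], petstore: ["petstore.animal", "petstore.AnimalCount"] },
  { "source.dogs": [{ id: "1", name: "Charlie" }], "source.cats": [{ id: "1", name: "Poppy" }] },
);
```

In this example exampleRouted has a single key, "source", holding both schemas; a schema that no dataset declares makes the sketch throw.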