Multi-dataset tests
By default, a DepotTest operates on a single namespace — one set of schemas deployed to one
location. This covers the vast majority of cases.
When your package declares schemas that span multiple datasets (e.g. a view in dataset petstore
that references object schemas in dataset source), you need to exercise those cross-dataset
references end-to-end. The multi-dataset entry point lets you do this within a single test.
Multi-dataset is an opt-in capability. The single-namespace form (DepotTest.in(namespace))
is unchanged and remains the recommended default for the common case.
The multi-dataset entry point
Instead of passing a single MergedSchemas namespace, pass an object whose keys are arbitrary
dataset identifiers and whose values are dataset specs:
import { DepotTest, TestLocationType } from "@stage-tech/depot-test";
await DepotTest.in({
  source: {
    namespace: sourceNamespace,
    locationType: TestLocationType.SNOWFLAKE,
  },
  petstore: {
    namespace: petstoreNamespace,
    locationType: TestLocationType.SNOWFLAKE,
    using: ["source"],
  },
})
  .setContent({ data: { "source.dogs": dogs, "source.cats": cats }, dataset: "source" })
  .check({
    data: [
      {
        scope: {
          schema: "petstore.AnimalCount",
          arguments: { minAgeArg: 3 },
          sortFields: [{ field: "animal" }],
        },
        values: [
          { animal: "dog", count: 2 },
          { animal: "cat", count: 1 },
        ],
      },
    ],
    dataset: "petstore",
  })
  .run();
Dataset spec fields
| Field | Type | Description |
|---|---|---|
| namespace | MergedSchemas | The schemas that belong to this dataset. |
| locationType | TestLocationType | Where to deploy this dataset. Defaults to the value of options({ locationType }) if set. |
| using | string[] | Keys of other datasets this dataset depends on. Mirrors the using field in your PackageDataset declaration. |
using and cross-dataset references
The using array tells the backend which dependency datasets to make accessible when building
this dataset's views. If petstore.animal contains SQL referencing source.dogs and
source.cats, declaring using: ["source"] on the petstore spec ensures those schemas are
in scope when the view DDL is resolved.
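Because using entries are plain strings that must match keys of the map passed to DepotTest.in, a typo is easy to make. The sketch below is not part of the library: DatasetSpecSketch and findUnknownUsing are hypothetical names, and the check only illustrates one way to confirm that every using entry names a dataset in the same map before running a test.

```typescript
// Hypothetical stand-in for a dataset spec, reduced to the one field checked here.
interface DatasetSpecSketch {
  using?: string[];
}

// Returns one message per `using` entry that does not match a key of the map.
function findUnknownUsing(datasets: Record<string, DatasetSpecSketch>): string[] {
  const known = new Set<string>(Object.keys(datasets));
  const problems: string[] = [];
  for (const key of Object.keys(datasets)) {
    const deps = datasets[key]?.using ?? [];
    for (const dep of deps) {
      if (!known.has(dep)) {
        problems.push(`dataset "${key}" uses unknown dataset "${dep}"`);
      }
    }
  }
  return problems;
}

// "source" is a key of the map, so nothing is reported.
const validUsing = findUnknownUsing({ source: {}, petstore: { using: ["source"] } });

// "sources" is not a key of the map, so one problem is reported.
const misspelledUsing = findUnknownUsing({ source: {}, petstore: { using: ["sources"] } });
```

Whether the library performs an equivalent validation itself is not stated here, so treat this as an optional pre-flight check.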
Targeting operations at a dataset
With multiple datasets, each operation needs to know which dataset it targets. Specify dataset
when an operation reads and writes within one dataset, or sourceDataset and targetDataset for cross-dataset operations:
DepotTest.in({ source: { namespace: ... }, petstore: { namespace: ..., using: ["source"] } })
  // Seed into source
  .setContent({ data: { "source.dogs": dogs }, dataset: "source" })
  // Check in petstore
  .check({ data: [...], dataset: "petstore" })
  // Cross-dataset transaction: read from source, write to petstore
  .transaction({
    source: "my.SourceSchema",
    target: "petstore.TargetSchema",
    sourceDataset: "source",
    targetDataset: "petstore",
  })
  .run();
The dataset shorthand is equivalent to setting both sourceDataset and targetDataset to the
same value. A clear runtime error is thrown if an operation cannot resolve its dataset and no
default has been set.
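The resolution rule described above can be modelled in a few lines. This is a minimal sketch rather than the library's implementation: TargetingSketch, ResolvedTargeting and resolveTargeting are hypothetical names, and the precedence of an explicit sourceDataset or targetDataset over the dataset shorthand is an assumption.

```typescript
// Hypothetical subset of the targeting properties an operation may carry.
interface TargetingSketch {
  dataset?: string;
  sourceDataset?: string;
  targetDataset?: string;
}

interface ResolvedTargeting {
  sourceDataset: string;
  targetDataset: string;
}

// Expands the `dataset` shorthand and falls back to an optional default.
// Assumption: an explicit sourceDataset/targetDataset wins over `dataset`.
function resolveTargeting(op: TargetingSketch, defaultDataset?: string): ResolvedTargeting {
  const sourceDataset = op.sourceDataset ?? op.dataset ?? defaultDataset;
  const targetDataset = op.targetDataset ?? op.dataset ?? defaultDataset;
  if (sourceDataset === undefined || targetDataset === undefined) {
    throw new Error("Cannot resolve dataset: set dataset, or both sourceDataset and targetDataset");
  }
  return { sourceDataset, targetDataset };
}

// Shorthand: both sides resolve to "petstore".
const shorthandTargeting = resolveTargeting({ dataset: "petstore" });

// Cross-dataset: read from "source", write to "petstore".
const crossTargeting = resolveTargeting({ sourceDataset: "source", targetDataset: "petstore" });
```

Calling resolveTargeting({}) without a default throws, which mirrors the runtime error the text describes for an operation whose dataset cannot be resolved.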
Backend constraints
All datasets in a multi-dataset test must use the same backend family. Mixing backends
(e.g. SNOWFLAKE for one dataset and AURORA for another) is rejected with a descriptive
error at job submission time, before any infrastructure is provisioned.
SNOWFLAKE + AURORA in the same DepotTest.in(map) call is not supported and will be
rejected. Cross-location multi-dataset tests are a separate, deferred capability.
SNOWFLAKE and SNOWFLAKE_MOCK_ICEBERG datasets may be combined freely — both are
Snowflake-backed and differ only in naming strategy.
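The same-family rule can be pictured as a lookup from location type to backend family followed by a uniqueness check. The sketch below is illustrative only: LocationTypeSketch is a string stand-in for TestLocationType, and the family labels and the assertSingleBackendFamily helper are assumptions, not library API.

```typescript
// String stand-in for the TestLocationType members named in this section.
type LocationTypeSketch = "SNOWFLAKE" | "SNOWFLAKE_MOCK_ICEBERG" | "AURORA";

// Assumed family labels: both Snowflake variants share one family.
const familyOf: Record<LocationTypeSketch, string> = {
  SNOWFLAKE: "snowflake",
  SNOWFLAKE_MOCK_ICEBERG: "snowflake",
  AURORA: "aurora",
};

// Throws if the datasets in the map span more than one backend family.
function assertSingleBackendFamily(datasets: Record<string, LocationTypeSketch>): void {
  const families = new Set<string>();
  for (const key of Object.keys(datasets)) {
    const locationType = datasets[key];
    if (locationType !== undefined) {
      families.add(familyOf[locationType]);
    }
  }
  if (families.size > 1) {
    const listed = Array.from(families).sort().join(", ");
    throw new Error(`Mixed backend families are not supported: ${listed}`);
  }
}

// Accepted: both location types are Snowflake-backed.
assertSingleBackendFamily({ source: "SNOWFLAKE", petstore: "SNOWFLAKE_MOCK_ICEBERG" });
```

Passing { source: "SNOWFLAKE", petstore: "AURORA" } to this helper throws, matching the rejection described above.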
Complete example
The example below splits the catsAndDogs namespace (where petstore.animal is a view that
unions source.dogs and source.cats) into two logical datasets to exercise the cross-dataset
reference end-to-end.
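import { DepotTest, TestLocationType } from "@stage-tech/depot-test";
// MergedSchemas (type) and catsAndDogs (namespace) are assumed to be imported from wherever your project defines them.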
const sourceNamespace: MergedSchemas = {
  "source.dogs": catsAndDogs["source.dogs"],
  "source.cats": catsAndDogs["source.cats"],
};
const petstoreNamespace: MergedSchemas = {
  "petstore.animal": catsAndDogs["petstore.animal"],
  "petstore.AnimalCount": catsAndDogs["petstore.AnimalCount"],
};
describe("multi-dataset cross-dataset view", () => {
  it("petstore.AnimalCount resolves across datasets", async () => {
    await DepotTest.in({
      source: {
        namespace: sourceNamespace,
        locationType: TestLocationType.SNOWFLAKE,
      },
      petstore: {
        namespace: petstoreNamespace,
        locationType: TestLocationType.SNOWFLAKE,
        using: ["source"],
      },
    })
      .setContent({
        data: {
          "source.dogs": [
            { id: "1", name: "Charlie", age: 1, breed: "Akita" },
            { id: "2", name: "Jax", age: 4, breed: "Coonhound" },
            { id: "3", name: "Ginger", age: 5, breed: "Bulldog" },
          ],
          "source.cats": [
            { id: "1", name: "Poppy", age: 1, breed: "British Shorthair" },
            { id: "2", name: "Luna", age: 2, breed: "Siamese" },
            { id: "3", name: "Daisy", age: 5, breed: "Ragdoll" },
          ],
        },
        dataset: "source",
      })
      .check({
        data: [
          {
            scope: {
              schema: "petstore.AnimalCount",
              arguments: { minAgeArg: 3 },
              sortFields: [{ field: "animal" }],
            },
            values: [
              { animal: "dog", count: 2 },
              { animal: "cat", count: 1 },
            ],
          },
        ],
        dataset: "petstore",
      })
      .run();
  });
});
When using programmatic data input with multiple datasets, the data map in setContent can
contain schemas from any dataset — the backend routes each schema to its owning dataset based
on the namespace declarations. Only the dataset prop on the operation itself determines which
dataset's infrastructure handles the step.
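The routing described above amounts to a lookup from schema name to the dataset whose namespace declares it. The following sketch models that idea under stated assumptions: NamespaceKeys and groupByOwningDataset are hypothetical names, the input is reduced to lists of schema names per dataset, and the real backend may resolve ownership differently.

```typescript
// Hypothetical input: for each dataset key, the schema names its namespace declares.
type NamespaceKeys = Record<string, string[]>;
type DataMap = Record<string, unknown[]>;

// Groups the rows of a data map by the dataset that declares each schema.
function groupByOwningDataset(namespaces: NamespaceKeys, data: DataMap): Record<string, DataMap> {
  // Build a schema -> dataset lookup from the namespace declarations.
  const owner = new Map<string, string>();
  for (const dataset of Object.keys(namespaces)) {
    for (const schema of namespaces[dataset] ?? []) {
      owner.set(schema, dataset);
    }
  }

  const routed: Record<string, DataMap> = {};
  for (const schema of Object.keys(data)) {
    const dataset = owner.get(schema);
    if (dataset === undefined) {
      throw new Error(`No dataset declares schema "${schema}"`);
    }
    const bucket: DataMap = routed[dataset] ?? {};
    bucket[schema] = data[schema] ?? [];
    routed[dataset] = bucket;
  }
  return routed;
}

// Both seeded schemas are declared by "source", so both land in that bucket.
const routedContent = groupByOwningDataset(
  {
    source: ["source.dogs", "source.cats"],
    petstore: ["petstore.animal", "petstore.AnimalCount"],
  },
  {
    "source.dogs": [{ id: "1", name: "Charlie" }],
    "source.cats": [{ id: "1", name: "Poppy" }],
  },
);
```

In this model a schema that no namespace declares is an error; how the library treats that case is not covered here.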