S3 Tables Iceberg datasets

Datasets assigned to an S3 Tables location are stored as Apache Iceberg namespaces in an S3 Tables bucket. Use the s3tables config block on a dataset declaration to control how Depot manages the namespace lifecycle.

Work in progress

Full S3 Tables executor support (Snowflake, Athena) is being implemented. The namespace is provisioned by CDK, but storage operations (transactions, queries) will reject with UnsupportedOperationException until executor support lands.

Namespace management modes

| Mode | CDK config | Namespace resource | Rename guard |
|---|---|---|---|
| MANAGED (default) | s3tables: {} or omit | CDK creates AWS::S3Tables::Namespace | Yes (SSM prevents renames) |
| MANAGED with name | s3tables: { namespaceName: 'my_ns' } | CDK creates AWS::S3Tables::Namespace | Yes (SSM prevents renames) |
| EXTERNAL | s3tables: { namespaceManagement: 'EXTERNAL', namespaceName: 'existing_ns' } | No resource (reference only) | No |

MANAGED mode (default)

Depot creates and manages the namespace. If you omit namespaceName, Depot derives it from the dataset alias by lowercasing and replacing non-alphanumeric characters with underscores (e.g. my.dataset → my_dataset).

new depot.Dataset(this, 'MyDataset', {
  environment,
  name: 'my.dataset',
  location: myS3TablesLocation,
  package: myPackage,
  // s3tables omitted → MANAGED, name derived from alias
});
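
The derivation rule described above can be sketched as a small function. This is a minimal illustration, not Depot's actual implementation: the name deriveNamespaceName is hypothetical, and it assumes each non-alphanumeric character maps to its own underscore (no collapsing of runs).

```typescript
// Hypothetical helper; Depot's real derivation may differ in details.
function deriveNamespaceName(alias: string): string {
  // Lowercase first, then map every character outside [a-z0-9] to "_".
  return alias.toLowerCase().replace(/[^a-z0-9]/g, "_");
}

console.log(deriveNamespaceName("my.dataset"));       // my_dataset
console.log(deriveNamespaceName("Sales-2024.Daily")); // sales_2024_daily
```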

To pin a specific namespace name:

new depot.Dataset(this, 'MyDataset', {
  environment,
  name: 'my.dataset',
  location: myS3TablesLocation,
  package: myPackage,
  s3tables: { namespaceName: 'pinned_namespace' },
});

Rename prevention

Once a MANAGED namespace is deployed, Depot stores its name in SSM Parameter Store and blocks any attempt to rename it on subsequent deployments. If you need to point at a different pre-existing namespace, switch to EXTERNAL mode.
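
Conceptually, the guard compares the name recorded at first deployment with the name requested now. The sketch below models only that comparison. The function name and error text are hypothetical, and storedName stands in for whatever value Depot keeps in SSM Parameter Store.

```typescript
// Hypothetical model of the rename guard, not Depot's implementation.
function assertNoRename(storedName: string | undefined, requestedName: string): void {
  // First deployment: nothing is stored yet, so any name is accepted.
  if (storedName === undefined) return;
  if (storedName !== requestedName) {
    throw new Error(
      `Namespace rename blocked: deployed as "${storedName}", requested "${requestedName}"`,
    );
  }
}
```

Under this model, redeploying with the same name is a no-op, while a changed name fails the deployment before any resource is touched.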

EXTERNAL mode

Use EXTERNAL when the namespace already exists — for example, a namespace retained from a previous RETAIN-policy deployment or one created outside Depot. CDK references the namespace without creating or deleting it, and no rename guard is installed.

new depot.Dataset(this, 'MyDataset', {
  environment,
  name: 'my.dataset',
  location: myS3TablesLocation,
  package: myPackage,
  s3tables: {
    namespaceManagement: 'EXTERNAL',
    namespaceName: 'existing_namespace',
  },
});

Snowflake Catalog Integration

When a Snowflake executor is enabled for the environment and an S3 Tables location is active, Depot automatically creates a Snowflake Catalog Integration (CATALOG_SOURCE = ICEBERG_REST, CATALOG_API_TYPE = AWS_GLUE, ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS) for each dataset namespace. This tells Snowflake where to discover Iceberg metadata and how to vend credentials for S3 Tables access.
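
For orientation, the sketch below renders the kind of statement such an integration corresponds to. It is an illustration, not Depot's actual output. The function and field names are hypothetical, every value is a placeholder, and the parameters beyond the three named above (CATALOG_URI, CATALOG_NAME, the REST_AUTHENTICATION block, and so on) are assumptions based on Snowflake's documented syntax. Check them against the CREATE CATALOG INTEGRATION reference before relying on them.

```typescript
// Placeholder inputs; none of these names come from Depot itself.
interface CatalogIntegrationParams {
  name: string;        // integration name (hypothetical)
  namespace: string;   // S3 Tables namespace of the dataset
  region: string;
  accountId: string;
  tableBucket: string;
  sigv4RoleArn: string;
  refreshIntervalSeconds: number;
}

// Renders a CREATE CATALOG INTEGRATION statement as a string.
function renderCatalogIntegration(p: CatalogIntegrationParams): string {
  return [
    `CREATE CATALOG INTEGRATION ${p.name}`,
    `  CATALOG_SOURCE = ICEBERG_REST`,
    `  TABLE_FORMAT = ICEBERG`,
    `  CATALOG_NAMESPACE = '${p.namespace}'`,
    `  REST_CONFIG = (`,
    `    CATALOG_URI = 'https://glue.${p.region}.amazonaws.com/iceberg'`,
    `    CATALOG_API_TYPE = AWS_GLUE`,
    `    CATALOG_NAME = '${p.accountId}:s3tablescatalog/${p.tableBucket}'`,
    `    ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS`,
    `  )`,
    `  REST_AUTHENTICATION = (`,
    `    TYPE = SIGV4`,
    `    SIGV4_IAM_ROLE = '${p.sigv4RoleArn}'`,
    `    SIGV4_SIGNING_REGION = '${p.region}'`,
    `  )`,
    `  ENABLED = TRUE`,
    `  REFRESH_INTERVAL_SECONDS = ${p.refreshIntervalSeconds};`,
  ].join("\n");
}
```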

refreshIntervalSeconds

Controls how quickly Snowflake detects metadata changes committed by external engines (Spark, Athena). Changes written by Snowflake are always immediately visible to Snowflake readers via standard transaction semantics — the interval does not apply to them. Default: 60 seconds.

| Writer | Reader | Refresh behaviour |
|---|---|---|
| Snowflake | Snowflake (same session) | Immediate: standard transaction semantics |
| Snowflake | Snowflake (different session, writing) | Immediate: pre-write metadata fetch |
| Snowflake | Snowflake (different session, reading) | Immediate: standard transaction semantics |
| Snowflake | Spark / Athena | Immediate: Glue committed before Snowflake returns |
| Spark / Athena | Snowflake | Lag up to refreshIntervalSeconds |
| Spark / Athena | Spark / Athena | Immediate: both read Glue directly |

Source: authoritative Snowflake guidance, recorded on the DPT-3505 Q&A page.
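
The visibility rules reduce to a single condition: only commits from an external engine, read from Snowflake, wait on the polling interval. A minimal sketch of that rule follows. The type and function names are hypothetical, and the default of 60 mirrors the documented default.

```typescript
type Engine = "snowflake" | "spark" | "athena";

// Worst-case delay, in seconds, before `reader` sees a commit made by `writer`.
function maxVisibilityLagSeconds(
  writer: Engine,
  reader: Engine,
  refreshIntervalSeconds = 60,
): number {
  // Snowflake's own writes are immediately visible everywhere; external
  // engines read Glue directly, so they never wait on Snowflake's polling.
  const externalWriter = writer !== "snowflake";
  return externalWriter && reader === "snowflake" ? refreshIntervalSeconds : 0;
}
```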

Inspecting and forcing refresh

  • Inspect refresh state for a given table with SELECT SYSTEM$AUTO_REFRESH_STATUS('db.schema.table'); (returns JSON including the last successfully processed snapshot). SHOW ICEBERG TABLES; exposes an AUTO_REFRESH_STATUS column covering the same signal.
  • Force an immediate, synchronous refresh with ALTER ICEBERG TABLE db.schema.table REFRESH; — table-scoped, bypasses the interval. There is no catalog-wide equivalent; iterate via SHOW ICEBERG TABLES when a bulk refresh is needed.
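
When a bulk refresh is needed, the per-table statements can be generated from the SHOW ICEBERG TABLES result. The sketch below only builds the SQL strings; running SHOW ICEBERG TABLES and executing the statements is left to whichever Snowflake client you use. The helper names are hypothetical, and the row shape assumes the database_name, schema_name, and name columns of the SHOW output.

```typescript
// Columns used from the SHOW ICEBERG TABLES output.
interface IcebergTableRow {
  database_name: string;
  schema_name: string;
  name: string;
}

// Double-quote an identifier, escaping any embedded double quotes.
const quoteIdent = (id: string): string => `"${id.replace(/"/g, '""')}"`;

// One ALTER ICEBERG TABLE ... REFRESH statement per table row.
function refreshStatements(rows: IcebergTableRow[]): string[] {
  return rows.map((r) => {
    const fqn = [r.database_name, r.schema_name, r.name].map(quoteIdent).join(".");
    return `ALTER ICEBERG TABLE ${fqn} REFRESH;`;
  });
}
```

Each refresh is synchronous, so executing the statements one after another on a large catalog may take a while.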

To tune the interval:

new depot.Dataset(this, 'MyDataset', {
  environment,
  name: 'my.dataset',
  location: myS3TablesLocation,
  package: myPackage,
  s3tables: {
    snowflake: { refreshIntervalSeconds: 30 }, // default: 60
  },
});

Shorter intervals reduce lag for Spark→Snowflake reads at the cost of more frequent Glue polling. Values below 30 seconds are not recommended. See the Snowflake CREATE CATALOG INTEGRATION reference for the full parameter description.

Cost tagging

Depot automatically tags the namespace scope with both dataset tags (depot:dataset:id, depot:dataset:alias) and location tags (depot:location:id, depot:location:name, depot:location:type). These override the shared-bucket tags so cost attribution is accurate even when multiple locations share the same S3 Tables bucket.
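
The override behaviour can be pictured as a layered merge in which namespace-level tags win over bucket-level ones. This is a sketch of the precedence only, not of how Depot applies tags. The function name and all tag values are hypothetical; the tag keys are the ones listed above.

```typescript
type Tags = Record<string, string>;

// Later spreads win, so namespace-scoped tags override shared-bucket tags.
function effectiveTags(bucketTags: Tags, locationTags: Tags, datasetTags: Tags): Tags {
  return { ...bucketTags, ...locationTags, ...datasetTags };
}

const tags = effectiveTags(
  { "depot:location:name": "shared-bucket-default" }, // placeholder value
  {
    "depot:location:id": "loc-1",
    "depot:location:name": "analytics",
    "depot:location:type": "S3_TABLES",
  },
  { "depot:dataset:id": "ds-1", "depot:dataset:alias": "my.dataset" },
);
console.log(tags);
```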

Removal policy

The namespace inherits the dataset's removalPolicy. The default is RETAIN — the namespace is not deleted when the CDK stack is destroyed. Use DESTROY for non-production environments.

Troubleshooting

Cannot derive a valid S3 Tables Namespace name from alias "…"

Depot derives the namespace name from the dataset alias at CDK synthesis time. This error fires when the alias produces a name that violates AWS naming rules — for example, the sanitised result is empty or starts with the reserved prefix aws.

Fix: supply an explicit namespaceName in the s3tables block:

new depot.Dataset(this, 'MyDataset', {
  environment,
  name: 'aws.special-dataset', // alias would produce 'aws_special_dataset', which starts with the reserved prefix 'aws'
  location: myS3TablesLocation,
  package: myPackage,
  s3tables: { namespaceName: 'special_dataset' },
});
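
To catch the problem before synthesis, a candidate name can be checked against the two failure cases mentioned above. This sketch is not Depot's validator: the function name is hypothetical, and AWS enforces further naming rules that are not modelled here.

```typescript
// Returns a list of problems; an empty list means the name passed these two checks.
function namespaceNameProblems(name: string): string[] {
  const problems: string[] = [];
  if (name.length === 0) problems.push("name is empty");
  if (name.startsWith("aws")) problems.push('name starts with the reserved prefix "aws"');
  return problems;
}

console.log(namespaceNameProblems("aws_special_dataset")); // one problem: reserved prefix
console.log(namespaceNameProblems("special_dataset"));     // no problems
```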