S3 Tables Iceberg datasets
Datasets assigned to an S3 Tables location are stored as Apache Iceberg namespaces in an S3 Tables bucket. Use the s3tables config block on a dataset declaration to control how Depot manages the namespace lifecycle.
Full S3 Tables executor support (Snowflake, Athena) is being implemented. The namespace is provisioned by CDK, but storage operations (transactions, queries) will reject with UnsupportedOperationException until executor support lands.
Namespace management modes
| Mode | CDK config | Namespace resource | Rename guard |
|---|---|---|---|
| MANAGED (default) | s3tables: {} or omit | CDK creates AWS::S3Tables::Namespace | Yes: SSM prevents renames |
| MANAGED with name | s3tables: { namespaceName: 'my_ns' } | CDK creates AWS::S3Tables::Namespace | Yes: SSM prevents renames |
| EXTERNAL | s3tables: { namespaceManagement: 'EXTERNAL', namespaceName: 'existing_ns' } | No resource (reference only) | No |
MANAGED mode (default)
Depot creates and manages the namespace. If you omit namespaceName, Depot derives it from the dataset alias by lowercasing and replacing non-alphanumeric characters with underscores (e.g. my.dataset → my_dataset).
```typescript
new depot.Dataset(this, 'MyDataset', {
  environment,
  name: 'my.dataset',
  location: myS3TablesLocation,
  package: myPackage,
  // s3tables omitted: MANAGED mode, namespace name derived from the alias
});
```
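The derivation rule described above can be sketched as a small pure function. This is an illustration only, not Depot's actual implementation: the function name deriveNamespaceName is hypothetical, and the sketch assumes each non-alphanumeric character is replaced one-for-one rather than collapsing runs of them.

```typescript
// Hypothetical helper that mirrors the documented rule:
// lowercase the alias, then turn every character outside a-z / 0-9 into "_".
function deriveNamespaceName(alias: string): string {
  return alias.toLowerCase().replace(/[^a-z0-9]/g, "_");
}

console.log(deriveNamespaceName("my.dataset")); // my_dataset
console.log(deriveNamespaceName("Sales-Data.v2")); // sales_data_v2
```

Running the alias through a function like this before deployment is a cheap way to preview the namespace name that the rename guard will later pin.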
To pin a specific namespace name:
```typescript
new depot.Dataset(this, 'MyDataset', {
  environment,
  name: 'my.dataset',
  location: myS3TablesLocation,
  package: myPackage,
  s3tables: { namespaceName: 'pinned_namespace' },
});
```
Once a MANAGED namespace is deployed, Depot stores its name in SSM Parameter Store and blocks any attempt to rename it on subsequent deployments. If you need to point at a different pre-existing namespace, switch to EXTERNAL mode.
EXTERNAL mode
Use EXTERNAL when the namespace already exists — for example, a namespace retained from a previous RETAIN-policy deployment or one created outside Depot. CDK references the namespace without creating or deleting it, and no rename guard is installed.
```typescript
new depot.Dataset(this, 'MyDataset', {
  environment,
  name: 'my.dataset',
  location: myS3TablesLocation,
  package: myPackage,
  s3tables: {
    namespaceManagement: 'EXTERNAL',
    namespaceName: 'existing_namespace',
  },
});
```
Snowflake Catalog Integration
When a Snowflake executor is enabled for the environment and an S3 Tables location is active, Depot
automatically creates a Snowflake
Catalog Integration
(CATALOG_SOURCE = ICEBERG_REST, CATALOG_API_TYPE = AWS_GLUE, ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS)
for each dataset namespace. This tells Snowflake where to discover Iceberg metadata and how to vend
credentials for S3 Tables access.
refreshIntervalSeconds
Controls how quickly Snowflake detects metadata changes committed by external engines (Spark, Athena). Changes written by Snowflake are always immediately visible to Snowflake readers via standard transaction semantics — the interval does not apply to them. Default: 60 seconds.
| Writer | Reader | Refresh behaviour |
|---|---|---|
| Snowflake | Snowflake (same session) | Immediate — standard transaction semantics |
| Snowflake | Snowflake (different session, writing) | Immediate — pre-write metadata fetch |
| Snowflake | Snowflake (different session, reading) | Immediate — standard transaction semantics |
| Snowflake | Spark / Athena | Immediate — Glue committed before Snowflake returns |
| Spark / Athena | Snowflake | Lag up to refreshIntervalSeconds |
| Spark / Athena | Spark / Athena | Immediate — both read Glue directly |
Source: authoritative Snowflake guidance captured on the DPT-3505 Q&A page.
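The table reduces to a single rule: only writes from an external engine that are read through Snowflake wait on the polling interval. A minimal sketch of that rule follows; the Engine type and the maxVisibilityLagSeconds function are hypothetical names used for illustration and are not part of Depot.

```typescript
type Engine = "snowflake" | "spark" | "athena";

// Worst-case delay (in seconds) before a committed write is visible to the reader,
// following the table above. Every other writer/reader pair is treated as immediate.
function maxVisibilityLagSeconds(
  writer: Engine,
  reader: Engine,
  refreshIntervalSeconds: number = 60, // same default as the s3tables config
): number {
  const externalWriter = writer !== "snowflake";
  const snowflakeReader = reader === "snowflake";
  return externalWriter && snowflakeReader ? refreshIntervalSeconds : 0;
}

console.log(maxVisibilityLagSeconds("spark", "snowflake")); // 60
console.log(maxVisibilityLagSeconds("snowflake", "athena")); // 0
```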
Inspecting and forcing refresh
- Inspect refresh state for a given table with SELECT SYSTEM$AUTO_REFRESH_STATUS('db.schema.table'); (returns JSON including the last successfully processed snapshot). SHOW ICEBERG TABLES; exposes an AUTO_REFRESH_STATUS column covering the same signal.
- Force an immediate, synchronous refresh with ALTER ICEBERG TABLE db.schema.table REFRESH; (table-scoped, bypasses the interval). There is no catalog-wide equivalent; iterate via SHOW ICEBERG TABLES when a bulk refresh is needed.
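For the bulk case, one possible approach is to turn the rows returned by SHOW ICEBERG TABLES into one REFRESH statement per table. The sketch below only builds the SQL strings; executing them through a Snowflake client is left out. The IcebergTableRow shape and the column names database_name, schema_name and name are assumptions about the SHOW output, and buildRefreshStatements is a hypothetical helper, so check both against your account before relying on them.

```typescript
// Subset of SHOW ICEBERG TABLES columns this sketch relies on (assumed names).
interface IcebergTableRow {
  database_name: string;
  schema_name: string;
  name: string;
}

// Wrap an identifier in double quotes, doubling any embedded quotes.
function quoteIdent(ident: string): string {
  return `"${ident.replace(/"/g, '""')}"`;
}

// Build one table-scoped REFRESH statement per row.
function buildRefreshStatements(rows: IcebergTableRow[]): string[] {
  return rows.map((row) => {
    const fqn = [row.database_name, row.schema_name, row.name]
      .map(quoteIdent)
      .join(".");
    return `ALTER ICEBERG TABLE ${fqn} REFRESH;`;
  });
}

const statements = buildRefreshStatements([
  { database_name: "ANALYTICS", schema_name: "MY_DATASET", name: "ORDERS" },
]);
console.log(statements[0]);
// ALTER ICEBERG TABLE "ANALYTICS"."MY_DATASET"."ORDERS" REFRESH;
```

Each generated statement is synchronous, so running them sequentially gives a simple completion signal for the whole batch.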
To tune the interval:
```typescript
new depot.Dataset(this, 'MyDataset', {
  environment,
  name: 'my.dataset',
  location: myS3TablesLocation,
  package: myPackage,
  s3tables: {
    snowflake: { refreshIntervalSeconds: 30 }, // default: 60
  },
});
```
Shorter intervals reduce lag for Spark→Snowflake reads at the cost of more frequent Glue polling.
Values below 30 seconds are not recommended. See the
Snowflake CREATE CATALOG INTEGRATION reference
for the full parameter description.
Cost tagging
Depot automatically tags the namespace scope with both dataset tags (depot:dataset:id, depot:dataset:alias) and location tags (depot:location:id, depot:location:name, depot:location:type). These override the shared-bucket tags so cost attribution is accurate even when multiple locations share the same S3 Tables bucket.
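The override behaviour amounts to a merge in which the more specific tag set wins. The following sketch illustrates that precedence only; resolveNamespaceTags is a hypothetical name, the tag values are made up, and the relative order of location and dataset tags is an assumption (their keys do not overlap, so it does not affect the result here).

```typescript
type Tags = Record<string, string>;

// Later spreads win: bucket-level tags are the baseline,
// location and dataset tags replace any keys they share with it.
function resolveNamespaceTags(bucketTags: Tags, locationTags: Tags, datasetTags: Tags): Tags {
  return { ...bucketTags, ...locationTags, ...datasetTags };
}

const resolved = resolveNamespaceTags(
  { "depot:location:name": "shared-bucket-default", team: "platform" }, // hypothetical values
  { "depot:location:name": "analytics", "depot:location:type": "s3tables" },
  { "depot:dataset:alias": "my.dataset" },
);
console.log(resolved["depot:location:name"]); // analytics
```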
Removal policy
The namespace inherits the dataset's removalPolicy. The default is RETAIN — the namespace is not deleted when the CDK stack is destroyed. Use DESTROY for non-production environments.
Troubleshooting
Cannot derive a valid S3 Tables Namespace name from alias "…"
Depot derives the namespace name from the dataset alias at CDK synthesis time. This error fires when the alias produces a name that violates AWS naming rules — for example, the sanitised result is empty or starts with the reserved prefix aws.
Fix: supply an explicit namespaceName in the s3tables block:
```typescript
new depot.Dataset(this, 'MyDataset', {
  environment,
  name: 'aws.special-dataset', // derived name would be 'aws_special_dataset', which starts with the reserved prefix
  location: myS3TablesLocation,
  package: myPackage,
  s3tables: { namespaceName: 'special_dataset' },
});
```
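To see which aliases would trigger this error before running a synthesis, the check can be approximated locally. The sketch below is not Depot's validation code: deriveValidNamespaceName is a hypothetical helper that covers only the two failure cases named above (an empty result and the reserved aws prefix), and the full AWS naming rules may reject additional names.

```typescript
// Approximates the synth-time check: derive the name, then reject
// an empty result or one that starts with the reserved prefix "aws".
function deriveValidNamespaceName(alias: string): string {
  const name = alias.toLowerCase().replace(/[^a-z0-9]/g, "_");
  if (name.length === 0 || name.startsWith("aws")) {
    throw new Error(`Cannot derive a valid S3 Tables Namespace name from alias "${alias}"`);
  }
  return name;
}

console.log(deriveValidNamespaceName("my.dataset")); // my_dataset

try {
  deriveValidNamespaceName("aws.special-dataset");
} catch (err) {
  console.log((err as Error).message);
  // Cannot derive a valid S3 Tables Namespace name from alias "aws.special-dataset"
}
```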