Snowflake location

Overview

Snowflake storage supports:

Transactions
SQL views
Arbitrary queries and automatic query expansion using joins

The following configuration properties are supported:

warehouse - Name of the Snowflake Warehouse to use
credentialsSecretArn - The ARN of a secret containing the Snowflake connection username, password, and private key. Must be stored as JSON secret with 'username', 'password', and 'privatekey' fields.
accountId - Snowflake Account ID
region - Snowflake Region
roleArn - Snowflake Role ARN
createEntities - When true, Snowflake tables and views will be created automatically if they do not exist.
writable - When false, Snowflake will only allow writes through transaction api.
dataLocations - S3 locations for which to create Stages (specify without s3:// prefix)

When creating a AWS Secrets Manager JSON secret, you must set it up with three fields: username ,password and privatekey. These should contain your Snowflake credentials which Depot will use to connect to Snowflake. The Secret must be accessible to the Depot environment AWS account.

Snowflake entity creation

When you use a Depot Snowflake Location with a Dataset, under the hood, Depot will use a Custom CloudFormation Resource to talk to Snowflake and create various entities to support your Depot Dataset.

Database (a Snowflake database is created per-Dataset)
Snowflake database integration for s3 (per dataset)
Stages
- always creates a stage for default data bucket
- for each DataLocations in snowflake location config
Schemas (schemas matching your Depot Package(s) are created)
- Additional schemas created with suffix _HISTORY for any table that has history enabled(default)
Tables (tables that align with your Depot Package(s) are created)
- Additional tables created under schema with suffix _HISTORY for any table that has history enabled(default)
Views (views that align with your Depot Package(s) are created)
Materialized views
- that are formed from 1 target table are not created as part of a deployment
- that are formed from 2 or more tables are created but not populated as part of the deployment. To populate such materialized view you need to run a refresh transaction
UDFs
- JavaScript and SQL UDFs are created
UDTFs
- SQL UDTFs are created
Stored Procedures
A JSON File Format (per Database). The File Format is given the same name as the environment and sits in the PUBLIC schema.

For details on the naming conventions for the above entities, see the Snowflake Naming Convention documentation.

Creating Packages

The current Depot Snowflake Resource Provider will only create the entities describe above in the first instance that they do not already exist. (“CREATE IF NOT EXIST” style queries are run by the provider).

You may also notice when adding new schemas to an existing Package, or adding a new Package with removal of an old Package, that older entities persist in Snowflake. This happens because there is no current support for Update (on schemas) or Delete of other Snowflake entities.

Permissions

Don’t forget to grant permissions for your Snowflake database and warehouse to the role that your user belongs to. Permissions must be granted explicitly, even if you are able to query the data without them through the Snowflake UI. MFA should be disabled on the account used for access.

If you create a secondary (replica) Depot storage location using Snowflake, you will need to update your Snowflake dedicated IAM role (that Snowflake assumes) to allow read/write access to your Depot environment’s data bucket. If you are using the createEntities flag for your Snowflake Location and/or Datasets, then you can identify this IAM role by named convention:

sdp-{environment-id}-snowflake-db-integration-{dataset-id}

Replication

dynamodb -> snowflake replication is batched (small window) using DynamoDB streams and Snowflake Transactions

Step by step explanation:

data is written to dynamo through connector request
dynamoDb stream invokes sdp-<envId>-stream lambda which:
- writes history data to s3
event bridge rule scheduled based replicationWindow (default 60 min) starts rollup step function
rollup step function creates lock on dataset batch operations
checks previous rollup state and fails if previous run failed
calculates next delta
runs transaction
saves rollup state in dynamoDb
repeats until next delta matches upTo (or current hour by default)

Components involved in replication:

DynamoDB stream handler lambda named sdp-<envId>-stream
rollup delta lambda named sdp-<envId>-rollup-delta
EventBridge rule namedsdp-<envId>-<datasetId>-snowflake-rollup-schedule for scheduling the rollup step function
Snowflake rollup step function named sdp-<envId>-<datasetId>-snowflake-rollup
dataset state DynamoDB table sdp-<envId>-State
Transaction step function named sdp-<envId>-snowflake-transaction

Snowflake rollup

Overview​

Snowflake entity creation​

Permissions​

Replication​

Overview

Snowflake entity creation

Permissions

Replication