Skip to main content

Snowflake location

Overview

Snowflake storage supports:

  • Transactions
  • SQL views
  • Arbitrary queries and automatic query expansion using joins

The following configuration properties are supported:

  • warehouse - Name of the Snowflake Warehouse to use
  • credentialsSecretArn - The ARN of a secret containing the Snowflake connection username, password, and private key. Must be stored as JSON secret with 'username', 'password', and 'privatekey' fields.
  • accountId - Snowflake Account ID
  • region - Snowflake Region
  • roleArn - Snowflake Role ARN
  • createEntities - When true, Snowflake tables and views will be created automatically if they do not exist.
  • writable - When false, Snowflake will only allow writes through transaction api.
  • dataLocations - S3 locations for which to create Stages (specify without s3:// prefix)

When creating a AWS Secrets Manager JSON secret, you must set it up with three fields: username ,password and privatekey. These should contain your Snowflake credentials which Depot will use to connect to Snowflake. The Secret must be accessible to the Depot environment AWS account.

Snowflake entity creation

When you use a Depot Snowflake Location with a Dataset, under the hood, Depot will use a Custom CloudFormation Resource to talk to Snowflake and create various entities to support your Depot Dataset.

  • Database (a Snowflake database is created per-Dataset)
  • Snowflake database integration for s3 (per dataset)
  • Stages
    • always creates a stage for default data bucket
    • for each DataLocations in snowflake location config
  • Schemas (schemas matching your Depot Package(s) are created)
    • Additional schemas created with suffix _HISTORY for any table that has history enabled(default)
  • Tables (tables that align with your Depot Package(s) are created)
    • Additional tables created under schema with suffix _HISTORY for any table that has history enabled(default)
  • Views (views that align with your Depot Package(s) are created)
  • Materialized views
    • that are formed from 1 target table are not created as part of a deployment
    • that are formed from 2 or more tables are created but not populated as part of the deployment. To populate such materialized view you need to run a refresh transaction
  • UDFs
    • JavaScript and SQL UDFs are created
  • UDTFs
    • SQL UDTFs are created
  • Stored Procedures
  • A JSON File Format (per Database). The File Format is given the same name as the environment and sits in the PUBLIC schema.

For details on the naming conventions for the above entities, see the Snowflake Naming Convention documentation.

Creating Packages

The current Depot Snowflake Resource Provider will only create the entities describe above in the first instance that they do not already exist. (“CREATE IF NOT EXIST” style queries are run by the provider).

You may also notice when adding new schemas to an existing Package, or adding a new Package with removal of an old Package, that older entities persist in Snowflake. This happens because there is no current support for Update (on schemas) or Delete of other Snowflake entities.

Permissions

Don’t forget to grant permissions for your Snowflake database and warehouse to the role that your user belongs to. Permissions must be granted explicitly, even if you are able to query the data without them through the Snowflake UI. MFA should be disabled on the account used for access.

If you create a secondary (replica) Depot storage location using Snowflake, you will need to update your Snowflake dedicated IAM role (that Snowflake assumes) to allow read/write access to your Depot environment’s data bucket. If you are using the createEntities flag for your Snowflake Location and/or Datasets, then you can identify this IAM role by named convention:

sdp-{environment-id}-snowflake-db-integration-{dataset-id}

Replication

dynamodb -> snowflake replication is batched (small window) using DynamoDB streams and Snowflake Transactions

Step by step explanation:

  • data is written to dynamo through connector request
  • dynamoDb stream invokes sdp-<envId>-stream lambda which:
    • writes history data to s3
  • event bridge rule scheduled based replicationWindow (default 60 min) starts rollup step function
  • rollup step function creates lock on dataset batch operations
  • checks previous rollup state and fails if previous run failed
  • calculates next delta
  • runs transaction
  • saves rollup state in dynamoDb
  • repeats until next delta matches upTo (or current hour by default)

Components involved in replication:

  • DynamoDB stream handler lambda named sdp-<envId>-stream
  • rollup delta lambda named sdp-<envId>-rollup-delta
  • EventBridge rule namedsdp-<envId>-<datasetId>-snowflake-rollup-schedule for scheduling the rollup step function
  • Snowflake rollup step function named sdp-<envId>-<datasetId>-snowflake-rollup
  • dataset state DynamoDB table sdp-<envId>-State
  • Transaction step function named sdp-<envId>-snowflake-transaction

Snowflake rollup