Snowflake Naming Conventions
Snowflake Objects
In order to use Snowflake objects as Depot resources, we need to follow naming conventions to map from Depot Datasets and Schemas to the underlying Snowflake Databases, Schemas, Tables and Views.
Dataset
A Depot Dataset maps onto a Snowflake Database. Datasets are scoped to an Environment, whereas as Snowflake Databases are contained within an account, therefore we need to concatenate the Environment and Dataset names to create the Snowflake Database name
| DPT Environment | DPT Dataset | SF Database |
|---|---|---|
| poc | FooBar | POC_FOO_BAR |
| acc | BarRaz | ACC_BAR_RAZ |
Schema
A Depot Schema uses a fully qualified name, including a package name. The package groups related Depot Schemas, similar to how a Snowflake Schema groups related Snowflake Tables and Views.
| DPT Fully Qualified Name | SF Schema | SF Table/View |
|---|---|---|
| typed.xyz.mrb_10.Qq03 | TYPED_XYZ_MRB_10 | QQ03 |
| mp4_14_1.untyped.MediaFile | MP4_14_1_UNTYPED | MEDIA_FILE |
| UnqualifiedSchema | PUBLIC | UNQUALIFIED_SCHEMA |
For comparison, this is how we will convert the same fully qualified names to tables in a database that doesn’t have Schemas, for example, Athena.
| DPT Fully Qualified Name | Athena Table/View |
|---|---|
| typed.xyz.mrb_10.Qq03 | TYPED_XYZ_MRB_10__QQ03 |
| mp4_14_1.untyped.MediaFile | MP4_14_1_UNTYPED__MEDIA_FILE |
| UnqualifiedSchema | UNQUALIFIED_SCHEMA |
Views vs Tables
If a Snowflake View or Table
- Has an id (string) and version (int) field then create it as a Basestar Object
- Doesn’t have an id (string) and version (int) field then create it as a Basestar View
Functions
| DPT Fully Qualified Name | SF Schema | SF Function |
|---|---|---|
| typed.xyz.mrb_10.isRightType | TYPED_XYZ_MRB_10 | IS_RIGHT_TYPE |
| isRightType | PUBLIC | IS_RIGHT_TYPE |
Test Objects
When running Depot tests, all of the packages schemas, including any dependencies, are created in the same test database specified in depot.properties
The Snowflake schemas are created with a randomly generated test prefix in order to isolate the tests, for example, TEST_1637230633_ADF1_TYPED_ABC_DEF_10. These schemas are then deleted at the end of the test.
To aid testing and debugging, you can specify the SNOWFLAKE_SCHEMA_PREFIX=MY_TEST environment variable. When specified the schemas are created with the prefix, for example, MY_TEST_ADF1_TYPED_ABC_DEF_10. The schemas are not deleted at the end of the test so you can inspect the results of the test. However, they are deleted at the start of the next test run to make sure we always start with a clean set of test data. There is a further environment variable that can be specified to prevent the schemas being deleted at the start of the test, KEEP_SNOWFLAKE_SCHEMA=true.
To create schemas in your own database that exactly match the naming when
deployed to an environment, use SNOWFLAKE_SCHEMA_PREFIX=
Depot Stores
We need to consider if/how we’ll support the Depot stores in Snowflake. A solution is to make support for stores an environment option, and when enabled, the store ID is appended to the database name.
For example, POC_ABC_DEF_a43b17d would be the database for the current store, while POC_ABC_DEF would contain read-only views that would enable SQL clients, for example, Quicksight, to reference the data for the current store in a consistent way.
Snowflake Warehouse conventions for Development
- Always ensure Warehouses are named so they can be attributed to a specific environment, process, or developer.
- Always choose X_SMALL as the size for a Snowflake Warehouse and set auto-suspend to enabled with a suspend time of 5-10 minutes for small or dev environments where resume latency is not an issue.
-- create a x-small sized warehouse with low suspend time
CREATE WAREHOUSE MY_WAREHOUSE WITH WAREHOUSE_SIZE = 'XSMALL' WAREHOUSE_TYPE = 'STANDARD' AUTO_SUSPEND = 5 AUTO_RESUME = TRUE MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 2 SCALING_POLICY = 'STANDARD';
-- set a suitable statement timeout default (lower than the usual 48 hours)
ALTER WAREHOUSE MY_WAREHOUSE SET STATEMENT_TIMEOUT_IN_SECONDS = 1800;
Overriding the naming conventions
It is possible to override the naming conventions described above, by using the sql.table extension:
my.ObjectWithAVeryLongAndInconvenientName:
type: object
properties:
id:
type: string
primary: true
name:
type: string
extensions:
sql.table: MY_SCHEMA.MY_OBJECT
The sql.history.table extension can be used to override the history table name, if the default policy of
adding the _HISTORY suffix to the schema name is unsuitable.
The Aurora (Postgres) Locations also respond to sql.table and sql.history.table extensions.
If your schemas need to be deployable on both types (e.g. you use a "Test Locally on Postgres, Deploy on Snowflake" strategy),
then you may need to use different names.
You can use snowflake.sql.table and snowflake.sql.history.table as alternate extensions only Snowflake responds to.