Skip to main content

Export Transaction

An Export transaction is a bulk data unload from Snowflake or Postgres to S3.

It takes table or view transform data with a provided query, and unloads data to the path that is formed from the Stage and Export path.

To export to S3 you will need a named file format and Stage in your Dataset i.e.

my.CsvFormat:
type: format
format:
type: csv
skipHeader: 1
emptyFieldAsNull: true

my.CsvStage:
type: stage
url: s3://exampleBucket/exports/

You can use a query to transform data into the desired representation i.e.

my.CsvQuery:
type: query
sql: "SELECT * FROM @{src} WHERE INT = ${name}",
using:
src: my.someTable
arguments:
- name: name
type: string
properties:
target:
type: string
str:
type: string
int:
type: integer

To start an export you need to send a transaction request where the source is table/view/query and the target is a Stage,

To further configure the Export action use exportOptions:

  • path required. Adds the provided path to the Stage location.
  • formatSchema required, format the schema name from the same namespace. Uses file format configuration from a file
  • maxFileSize optional string, maps directly to Snowflake's max_file_size
  • partitionColumns optional array of column names, provided columns must be required, used to form PARTITION BY block in the query
  • staticPartitions optional map of key that matches column name in the list above and static value quoted with single quote
  • includeHeader optional boolean, maps directly to the Snowflake export parameter HEADER that adds a header to files
  • overwrite optional boolean, maps directly to the Snowflake export parameter OVERWRITE that overwrites files in the target path. This only works together with the single option, because snowflake can't guarantee that all old files will be deleted
  • single optional boolean, maps directly to the Snowflake export parameter SINGLE that creates a single file in the target path. While this enables the ability to control the output file name precisely and guarantees a single file, this can create performance issues on the reading end and it can hit S3 object size limits with AWS if the file is very large

Furthemore, the following options are available at the TransactionRequest.Action level:

  • columnFormat, may be one of TransactionRequest.ColumnFormat.{ AS_SPECIFIED, UPPER, LOWER, UPPER_SNAKE, LOWER_SNAKE, UPPER_CAMEL, LOWER_CAMEL } (see below)

example request

{
"operation": "EXPORT",
"source": "my.CsvQuery",
"target": "my.CsvStage",
"arguments": {
"name": "Nicolas Cage"
},
"exportOptions": {
"path": "/export/path/",
"formatSchema": "my.CsvFormat",
"maxFileSize": 500000,
"partitionColumns": ["someColumn", "id"],
"includeHeader": true
}
}

Partitioning

An Export can be partitioned by column values and/or values passed to the transaction request staticPartitions i.e:

  • if partitionColumns: ["name", "age"] the Export transaction would split the result first by name, and then by age /nameValue/ageValue
  • if partitionColumns: ["releaseDate"] and staticPartitions : {"releaseDate": "'2011-12-31'"} it would result in the single partition /2011-12-31
  • if partitionColumns: ["name", "releaseDate", "age"] and staticPartitions : {"releaseDate": "'2011-12-31'"} it would result in the partition /nameValue/2011-12-31/ageValue
note

Partitioning is not supported on Postgres

JSON export

So-called "JSON" export is actually in a related format called JSON Lines, with one object per line separated by newlines.

example

ns1.ColumnFormatCheck:
type: object
properties:
someField:
type: string!
someOtherField:
type: integer!
someRef:
type: ns1.TestExportLeaf
flattenedPoint:
type:
Point:
layout: FLATTENED
structuredPoint:
type:
Point:
layout: STRUCTURED
Point:
type: struct
properties:
x: number
y: number

ns1.JsonStage:
type: stage
format: ns1.JsonFormat
url: s3://....

ns1.JsonFormat:
type: format
format:
type: json
fileExtension: ".jsonl"
compression: "none"

with this in place, you can run an export:

const exportStep = depot.runTransaction(scope, 'Json Export sample', {
operation: TransactionRequest.Operation.EXPORT,
source: "ns1.ColumnFormatCheck",
target: "ns1.JsonStage",
columnFormat: TransactionRequest.ColumnFormat.AS_SPECIFIED,
});

The output data shape depends on columnFormat, as shown in the following examples:

columnFormatsample outputnotes
null (default)
{
"flattenedPoint_x": 1,
"flattenedPoint_y": 2,
"id": "1",
"someField": "some value",
"someOtherField": 42,
"someRef_id": "tel1",
"structuredPoint": {
"x": 3,
"y": 4
},
"version": 1
}
the field names generally reflect the original field names from the Depot schema definition; FLATTENED and STRUCTURED fields are applied; field names within struct fields or ref fields are separated by underscore.
ColumnFormat.AS_SPECIFIED
{
"id": "1",
"someField": "some value",
"someOtherField": 42,
"someRef": {
"id": "tel1"
},
"flattenedPoint": {
"x": 1,
"y": 2
},
"structuredPoint": {
"x": 3,
"y": 4
},
"version": 1
}
The field names reflect the field names in the Depot schema definition.

The layout (flattened/structured) from structs is ignored, and all objects are returned as proper objects.

tip

ColumnFormat.AS_SPECIFIED matches the input expected by Postgres in JSONL format.

ColumnFormat.UPPER_SNAKE
{
"FLATTENED_POINT_X": 1,
"FLATTENED_POINT_Y": 2,
"ID": "1",
"SOME_FIELD": "some value",
"SOME_OTHER_FIELD": 42,
"SOME_REF_ID": "tel1",
"STRUCTURED_POINT": {
"x": 3,
"y": 4
},
"VERSION": 1
}
This is the format that most closely matches the natural behaviour of Snowflake. Top-level fields are returned in uppercase, but nested fields from structured objects are returned using the natural (basestar) capitalisation.
ColumnFormat.LOWER_CAMEL
{
"flattenedPoint_x": 1,
"flattenedPoint_y": 2,
"id": "1",
"someField": "some value",
"someOtherField": 42,
"someRef_id": "tel1",
"structuredPoint": {
"x": 3,
"y": 4
},
"version": 1
}
In LOWER_CAMEL, top-level fields are returned in lower camel case. This happens even for fields which are defined (contrary to conventions) differently from camel case. Second-level fields are returned as they are.