Export Transaction

An Export transaction is a bulk data unload from Snowflake or Postgres to S3.

It takes table or view transform data with a provided query, and unloads data to the path that is formed from the Stage and Export path.

To export to S3 you will need a named file format and Stage in your Dataset i.e.

my.CsvFormat:
  type: format
  format:
    type: csv
    skipHeader: 1
    emptyFieldAsNull: true

my.CsvStage:
  type: stage
  url: s3://exampleBucket/exports/

You can use a query to transform data into the desired representation i.e.

my.CsvQuery:
  type: query
  sql: "SELECT * FROM @{src} WHERE INT = ${name}",
  using:
    src: my.someTable
  arguments:
    - name: name
      type: string
  properties:
    target:
      type: string
    str:
      type: string
    int:
      type: integer

To start an export you need to send a transaction request where the source is table/view/query and the target is a Stage,

To further configure the Export action use exportOptions:

path required. Adds the provided path to the Stage location.
formatSchema required, format the schema name from the same namespace. Uses file format configuration from a file
maxFileSize optional string, maps directly to Snowflake's max_file_size
partitionColumns optional array of column names, provided columns must be required, used to form PARTITION BY block in the query
staticPartitions optional map of key that matches column name in the list above and static value quoted with single quote
includeHeader optional boolean, maps directly to the Snowflake export parameter HEADER that adds a header to files
overwrite optional boolean, maps directly to the Snowflake export parameter OVERWRITE that overwrites files in the target path. This only works together with the single option, because snowflake can't guarantee that all old files will be deleted
single optional boolean, maps directly to the Snowflake export parameter SINGLE that creates a single file in the target path. While this enables the ability to control the output file name precisely and guarantees a single file, this can create performance issues on the reading end and it can hit S3 object size limits with AWS if the file is very large

Furthemore, the following options are available at the TransactionRequest.Action level:

columnFormat, may be one of TransactionRequest.ColumnFormat.{ AS_SPECIFIED, UPPER, LOWER, UPPER_SNAKE, LOWER_SNAKE, UPPER_CAMEL, LOWER_CAMEL } (see below)

example request

{
  "operation": "EXPORT",
  "source": "my.CsvQuery",
  "target": "my.CsvStage",
  "arguments": {
    "name": "Nicolas Cage"
  },
  "exportOptions": {
    "path": "/export/path/",
    "formatSchema": "my.CsvFormat",
    "maxFileSize": 500000,
    "partitionColumns": ["someColumn", "id"],
    "includeHeader": true
  }
}

Partitioning

An Export can be partitioned by column values and/or values passed to the transaction request staticPartitions i.e:

if partitionColumns: ["name", "age"] the Export transaction would split the result first by name, and then by age /nameValue/ageValue
if partitionColumns: ["releaseDate"] and staticPartitions : {"releaseDate": "'2011-12-31'"} it would result in the single partition /2011-12-31
if partitionColumns: ["name", "releaseDate", "age"] and staticPartitions : {"releaseDate": "'2011-12-31'"} it would result in the partition /nameValue/2011-12-31/ageValue

note

Partitioning is not supported on Postgres

JSON export

So-called "JSON" export is actually in a related format called JSON Lines, with one object per line separated by newlines.

example

ns1.ColumnFormatCheck:
  type: object
  properties:
    someField:
      type: string!
    someOtherField:
      type: integer!
    someRef:
      type: ns1.TestExportLeaf
    flattenedPoint:
      type: 
        Point:
          layout: FLATTENED
    structuredPoint:
      type:
        Point:
          layout: STRUCTURED
Point:
  type: struct
  properties:
    x: number
    y: number

ns1.JsonStage:
  type: stage
  format: ns1.JsonFormat
  url: s3://....

ns1.JsonFormat:
  type: format
  format:
    type: json
    fileExtension: ".jsonl"
    compression: "none"

with this in place, you can run an export:

  const exportStep = depot.runTransaction(scope, 'Json Export sample', {
    operation: TransactionRequest.Operation.EXPORT, 
    source: "ns1.ColumnFormatCheck",
    target: "ns1.JsonStage",
    columnFormat: TransactionRequest.ColumnFormat.AS_SPECIFIED,
  });

The output data shape depends on columnFormat, as shown in the following examples:

`columnFormat`	sample output	notes
`null` (default)	`{ "flattenedPoint_x": 1, "flattenedPoint_y": 2, "id": "1", "someField": "some value", "someOtherField": 42, "someRef_id": "tel1", "structuredPoint": { "x": 3, "y": 4 }, "version": 1 }`	the field names generally reflect the original field names from the Depot schema definition; FLATTENED and STRUCTURED fields are applied; field names within struct fields or ref fields are separated by underscore.
`ColumnFormat.AS_SPECIFIED`	`{ "id": "1", "someField": "some value", "someOtherField": 42, "someRef": { "id": "tel1" }, "flattenedPoint": { "x": 1, "y": 2 }, "structuredPoint": { "x": 3, "y": 4 }, "version": 1 }`	The field names reflect the field names in the Depot schema definition. The layout (flattened/structured) from structs is ignored, and all objects are returned as proper objects. tip `ColumnFormat.AS_SPECIFIED` matches the input expected by Postgres in JSONL format.
`ColumnFormat.UPPER_SNAKE`	`{ "FLATTENED_POINT_X": 1, "FLATTENED_POINT_Y": 2, "ID": "1", "SOME_FIELD": "some value", "SOME_OTHER_FIELD": 42, "SOME_REF_ID": "tel1", "STRUCTURED_POINT": { "x": 3, "y": 4 }, "VERSION": 1 }`	This is the format that most closely matches the natural behaviour of Snowflake. Top-level fields are returned in uppercase, but nested fields from structured objects are returned using the natural (basestar) capitalisation.
`ColumnFormat.LOWER_CAMEL`	`{ "flattenedPoint_x": 1, "flattenedPoint_y": 2, "id": "1", "someField": "some value", "someOtherField": 42, "someRef_id": "tel1", "structuredPoint": { "x": 3, "y": 4 }, "version": 1 }`	In LOWER_CAMEL, top-level fields are returned in lower camel case. This happens even for fields which are defined (contrary to conventions) differently from camel case. Second-level fields are returned as they are.

example request​

Partitioning​

JSON export​

example​

example request

Partitioning

JSON export

example