Export Transaction
An Export transaction is a bulk data unload from Snowflake or Postgres to S3.
It takes table or view transform data with a provided query, and unloads data to the path that is formed from the Stage and Export path.
To export to S3 you will need a named file format and Stage in your Dataset i.e.
my.CsvFormat:
type: format
format:
type: csv
skipHeader: 1
emptyFieldAsNull: true
my.CsvStage:
type: stage
url: s3://exampleBucket/exports/
You can use a query to transform data into the desired representation i.e.
my.CsvQuery:
type: query
sql: "SELECT * FROM @{src} WHERE INT = ${name}",
using:
src: my.someTable
arguments:
- name: name
type: string
properties:
target:
type: string
str:
type: string
int:
type: integer
To start an export you need to send a transaction request where the source is table/view/query and the target is a Stage,
To further configure the Export action use exportOptions:
pathrequired. Adds the provided path to the Stage location.formatSchemarequired, format the schema name from the same namespace. Uses file format configuration from a filemaxFileSizeoptional string, maps directly to Snowflake'smax_file_sizepartitionColumnsoptional array of column names, provided columns must be required, used to formPARTITION BYblock in the querystaticPartitionsoptional map ofkeythat matches column name in the list above and staticvaluequoted with single quoteincludeHeaderoptional boolean, maps directly to the Snowflake export parameterHEADERthat adds a header to filesoverwriteoptional boolean, maps directly to the Snowflake export parameterOVERWRITEthat overwrites files in the target path. This only works together with thesingleoption, because snowflake can't guarantee that all old files will be deletedsingleoptional boolean, maps directly to the Snowflake export parameterSINGLEthat creates a single file in the target path. While this enables the ability to control the output file name precisely and guarantees a single file, this can create performance issues on the reading end and it can hit S3 object size limits with AWS if the file is very large
Furthemore, the following options are available at the TransactionRequest.Action level:
columnFormat, may be one ofTransactionRequest.ColumnFormat.{ AS_SPECIFIED, UPPER, LOWER, UPPER_SNAKE, LOWER_SNAKE, UPPER_CAMEL, LOWER_CAMEL }(see below)
example request
{
"operation": "EXPORT",
"source": "my.CsvQuery",
"target": "my.CsvStage",
"arguments": {
"name": "Nicolas Cage"
},
"exportOptions": {
"path": "/export/path/",
"formatSchema": "my.CsvFormat",
"maxFileSize": 500000,
"partitionColumns": ["someColumn", "id"],
"includeHeader": true
}
}
Partitioning
An Export can be partitioned by column values and/or values passed to the transaction request staticPartitions
i.e:
- if
partitionColumns: ["name", "age"]the Export transaction would split the result first by name, and then by age/nameValue/ageValue - if
partitionColumns: ["releaseDate"]andstaticPartitions : {"releaseDate": "'2011-12-31'"}it would result in the single partition/2011-12-31 - if
partitionColumns: ["name", "releaseDate", "age"]andstaticPartitions : {"releaseDate": "'2011-12-31'"}it would result in the partition/nameValue/2011-12-31/ageValue
Partitioning is not supported on Postgres
JSON export
So-called "JSON" export is actually in a related format called JSON Lines, with one object per line separated by newlines.
example
ns1.ColumnFormatCheck:
type: object
properties:
someField:
type: string!
someOtherField:
type: integer!
someRef:
type: ns1.TestExportLeaf
flattenedPoint:
type:
Point:
layout: FLATTENED
structuredPoint:
type:
Point:
layout: STRUCTURED
Point:
type: struct
properties:
x: number
y: number
ns1.JsonStage:
type: stage
format: ns1.JsonFormat
url: s3://....
ns1.JsonFormat:
type: format
format:
type: json
fileExtension: ".jsonl"
compression: "none"
with this in place, you can run an export:
const exportStep = depot.runTransaction(scope, 'Json Export sample', {
operation: TransactionRequest.Operation.EXPORT,
source: "ns1.ColumnFormatCheck",
target: "ns1.JsonStage",
columnFormat: TransactionRequest.ColumnFormat.AS_SPECIFIED,
});
The output data shape depends on columnFormat, as shown in the following examples:
columnFormat | sample output | notes |
|---|---|---|
null (default) |
| the field names generally reflect the original field names from the Depot schema definition; FLATTENED and STRUCTURED fields are applied; field names within struct fields or ref fields are separated by underscore. |
ColumnFormat.AS_SPECIFIED
|
| The field names reflect the field names in the Depot schema definition.
The layout (flattened/structured) from structs is ignored, and all objects are returned as proper objects. tip
|
ColumnFormat.UPPER_SNAKE |
| This is the format that most closely matches the natural behaviour of Snowflake. Top-level fields are returned in uppercase, but nested fields from structured objects are returned using the natural (basestar) capitalisation. |
ColumnFormat.LOWER_CAMEL |
| In LOWER_CAMEL, top-level fields are returned in lower camel case. This happens even for fields which are defined (contrary to conventions) differently from camel case. Second-level fields are returned as they are. |