Executors
Overview and configuration
Depot Executors tell Depot which underlying infrastructure to use when executing operations against a Dataset (or set of Datasets). They can also designate the resources used for compute; for example, Snowflake Executors are backed by Snowflake Warehouses.
Executors can serve different data operation purposes on Datasets. For example, you might want to use specific Executors for API calls, Transactions, Deployment, or Replication operations. See the Dataset Executors section for details on assigning executors to each Dataset purpose.
Snowflake Executor
Snowflake Executors require Snowflake credentials to be assigned. The credentials enable them to:
- Self-provision if required (e.g. creating a Warehouse)
- Designate the credentials used for the purposes the Executor is assigned to. For example, if an Executor is designated for API operations against a Dataset, those API operations run with its credentials.
To define a Snowflake Executor that uses a specific existing Snowflake Warehouse and is configured with credentials:
```typescript
import { Executor, SnowflakeWarehouse } from "@stage-tech/depot-cdk";

new Executor.Snowflake(this, "SnowflakeExecutor", {
  environment: depotEnvironment,
  name: "snowflake-executor1",
  credentials: {
    credentialsSecretArn: "arn:aws:secretsmanager:eu-west-1:123456789012:secret:my-snowflake-credentials-QEJZcJ",
    accountId: "AB12345",
    region: "eu-west-1"
  },
  // Use a warehouse that already exists in the Snowflake account
  warehouse: SnowflakeWarehouse.existing("MY_EXISTING_WAREHOUSE")
});
```
The warehouse property can be configured manually using the SnowflakeWarehouse type, or with one of the convenience methods:
- SnowflakeWarehouse.existing()
- SnowflakeWarehouse.depotManaged()
Using depotManaged() means you want Depot to create the warehouse and manage it for you. Note that Depot creates and modifies the warehouse, but does not tear it down (delete it) when you remove the CDK definition.
You cannot have two Depot-managed warehouses with the same name.
Example:
```typescript
import { Executor, SnowflakeWarehouse } from "@stage-tech/depot-cdk";

const executor = new Executor.Snowflake(this, "SnowflakeExecutor", {
  environment: depotEnvironment,
  name: "snowflake-executor1",
  // Depot creates and modifies this warehouse; it is not deleted if this definition is removed
  warehouse: SnowflakeWarehouse.depotManaged({
    name: "MANAGED_WAREHOUSE",
    size: "SMALL",
    minClusterCount: 1,
    maxClusterCount: 1,
    scalingPolicy: "ECONOMY",
    maxConcurrencyLevel: 8,
    statementTimeoutSeconds: 120,
    statementQueuedTimeoutSeconds: 180
  }),
  credentials: {
    credentialsSecretArn: "arn:aws:secretsmanager:eu-west-1:123456789012:secret:my-snowflake-credentials-QEJZcJ",
    accountId: "AB12345",
    region: "eu-west-1"
  }
});
```
The named warehouse overrides any default warehouse set at the Snowflake user level for the user represented by the credentials.
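Because two Depot-managed warehouses cannot share a name, a CDK app that defines several executors may want to catch clashes before deployment. The following is a minimal sketch under stated assumptions: ManagedWarehouseSpec and duplicateWarehouseNames are hypothetical (not part of @stage-tech/depot-cdk), and the case-insensitive comparison assumes the names end up as unquoted Snowflake identifiers, which Snowflake stores in upper case.

```typescript
// Hypothetical subset of the settings passed to SnowflakeWarehouse.depotManaged().
interface ManagedWarehouseSpec {
  name: string;
  size: string;
}

// Returns each name used by more than one spec. Names are compared in upper
// case (assumption: unquoted identifiers, so "wh" and "WH" would collide).
function duplicateWarehouseNames(specs: ManagedWarehouseSpec[]): string[] {
  const seen = new Set<string>();
  const duplicates: string[] = [];
  for (const spec of specs) {
    const key = spec.name.toUpperCase();
    if (seen.has(key) && duplicates.indexOf(key) === -1) {
      duplicates.push(key);
    }
    seen.add(key);
  }
  return duplicates;
}

// Usage: report clashes before the CDK app is synthesised.
const clashes = duplicateWarehouseNames([
  { name: "MANAGED_WAREHOUSE", size: "SMALL" },
  { name: "REPORTING_WAREHOUSE", size: "MEDIUM" },
  { name: "managed_warehouse", size: "LARGE" },
]);
if (clashes.length > 0) {
  console.error(`Duplicate Depot-managed warehouse names: ${clashes.join(", ")}`);
}
```

In a real app you would more likely throw instead of logging, so that synthesis fails early.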
The secret referenced by credentialsSecretArn must be a JSON object with three fields: username, password, and privatekey. The privatekey value is the private key associated with the Snowflake user, with the key header and footer removed, as a single concatenated line (i.e. no line breaks or carriage returns).
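The secret value can be prepared programmatically. The sketch below uses hypothetical helpers (not part of @stage-tech/depot-cdk) that flatten a PEM-encoded private key into the single-line form described above and serialise the three required fields. The user name, password and key body in the usage lines are dummy placeholders, and how you store the resulting string in Secrets Manager is up to you.

```typescript
// Fields the secret must contain, per the text above.
interface SnowflakeCredentialsSecret {
  username: string;
  password: string;
  privatekey: string;
}

// Drops the "-----BEGIN ...-----" / "-----END ...-----" lines and every
// whitespace character, leaving the key body as one continuous line.
function toSingleLinePrivateKey(pem: string): string {
  return pem
    .split(/\r?\n/)
    .filter((line) => !line.trim().startsWith("-----"))
    .join("")
    .replace(/\s+/g, "");
}

// Serialises the JSON secret value with the three required fields.
function buildSnowflakeSecretString(
  username: string,
  password: string,
  pemPrivateKey: string,
): string {
  const secret: SnowflakeCredentialsSecret = {
    username,
    password,
    privatekey: toSingleLinePrivateKey(pemPrivateKey),
  };
  return JSON.stringify(secret);
}

// Usage with a dummy, truncated key body (not a real key).
const secretString = buildSnowflakeSecretString(
  "DEPOT_SERVICE_USER",
  "example-password",
  "-----BEGIN PRIVATE KEY-----\nMIIEvQIBADANBgkq\nhkiG9w0BAQEFAASC\n-----END PRIVATE KEY-----\n",
);
console.log(secretString);
```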
The accountId can be either a short Snowflake account ID or an account alias.
If a short account ID (e.g. AB12345) is used, region must also be provided; the two are concatenated as accountId.region in Snowflake URLs.
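As an illustration of the concatenation rule only, the hypothetical helper below models the two accepted forms of accountId as a TypeScript union, so a short account ID cannot be written without a region. Depot performs this concatenation itself; the type and function names are assumptions made for the sketch, and the alias value is a placeholder.

```typescript
// A short account ID needs a region; an account alias is used as-is.
type SnowflakeAccount =
  | { kind: "shortId"; accountId: string; region: string }
  | { kind: "alias"; accountId: string };

// Hypothetical helper reproducing the "accountId.region" rule described above.
function snowflakeAccountSegment(account: SnowflakeAccount): string {
  if (account.kind === "alias") {
    return account.accountId;
  }
  if (account.region.trim() === "") {
    throw new Error(`region is required for short account ID ${account.accountId}`);
  }
  return `${account.accountId}.${account.region}`;
}

console.log(snowflakeAccountSegment({ kind: "shortId", accountId: "AB12345", region: "eu-west-1" }));
```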
EMR Spark Serverless Executor
An EMR Spark Serverless Executor is used for Backup and Restore functionality. It has no configuration options beyond a name and a reference to the environment.
```typescript
import { Executor } from "@stage-tech/depot-cdk";

new Executor.EmrServerless(this, "EmrServerlessExecutor", {
  name: "emr-serverless-spark-1",
  environment: depotEnvironment
});
```
Fargate Executor
A Fargate executor is used only for API requests. When one is configured for a dataset (or set of datasets), API operations for those datasets are processed by the Fargate executor instead of the default configured Fargate infrastructure.
The following interfaces support dataset-level Fargate executors:
- Lambda gateway requests
- IAM/public API requests for a dataset using path mapping, e.g. https://{env url}/{dataset id}
- WebSocket requests
At present, the following requests are still served by the default executor:
- IAM/public API requests for a dataset using subdomain mapping, e.g. https://{dataset id}.{env url}
In the future, requests through all supported interfaces may use the configured Fargate executor.
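The routing rules above for IAM/public API requests can be summarised as a small decision function. This is a hypothetical model for reasoning about which executor serves a request, not Depot's implementation; apiExecutorFor, its parameters, and the example host names are assumptions, and Lambda gateway and WebSocket requests are not modelled.

```typescript
type ApiExecutor = "dataset fargate executor" | "default executor";

// envHost is the environment URL's host name; datasetsWithFargate holds the
// IDs of datasets that have a Fargate executor configured (both hypothetical).
function apiExecutorFor(
  host: string,
  path: string,
  envHost: string,
  datasetsWithFargate: ReadonlySet<string>,
): ApiExecutor {
  const h = host.toLowerCase();
  const env = envHost.toLowerCase();
  // Subdomain mapping, https://{dataset id}.{env url}: default executor for now.
  if (h !== env && h.endsWith(`.${env}`)) {
    return "default executor";
  }
  // Path mapping, https://{env url}/{dataset id}: first path segment is the dataset ID.
  const segments = path.split("/").filter((s) => s !== "");
  const datasetId = segments.length > 0 ? segments[0] : undefined;
  return datasetId !== undefined && datasetsWithFargate.has(datasetId)
    ? "dataset fargate executor"
    : "default executor";
}

const fargateDatasets = new Set(["sales"]);
console.log(apiExecutorFor("depot.example.com", "/sales/query", "depot.example.com", fargateDatasets));
console.log(apiExecutorFor("sales.depot.example.com", "/query", "depot.example.com", fargateDatasets));
```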
The following configuration properties are supported:
```typescript
export interface FargateExecutorProps extends ExecutorProps {
  /** Size of the task (similar to an EC2 instance type) */
  readonly taskSize?: StageDepotApplicationTaskSize;
  /** Scaling configuration, in the same form as the environment frontend scaling configuration */
  readonly scaling?: StageDepotFrontendScalingProps;
  /** Whether to use the ONDEMAND or SPOT capacity mode */
  readonly preferredCapacityProvider?: ApplicationFargateCapacityType;
}
```
When additional Fargate executors are configured, the environment dashboard shows additional charts with metrics from their Fargate tasks.
Long-Running Aurora Executor
With a regular Aurora executor, a single transaction is limited to 15 minutes. If a transaction is expected to take longer than this, use a long-running executor.
Use this executor only when necessary: container and JVM start-up add around 90 seconds to the duration of every transaction.
Here is an example of how to configure a long-running executor:
```typescript
Aurora.longRunning(this, 'LongRunningExecutor', {
  name: 'long-running-executor-1',
  environment: depotEnvironment,
  instance: { // optional configuration for the job instance
    vCpus: 1,   // default 0.25
    memory: 4,  // in GB, default 0.5
  }
});
```
A long-running executor is automatically available to all datasets and can be shared between them.
Once the long-running executor is configured, you can use it in a transaction by referencing its name as the executor id:
```typescript
{
  dataset: {
    id: 'SomeDataset'
  },
  transaction: 'AUTO',
  actions: [...],
  executor: {
    id: 'long-running-executor-1' // name of the long-running executor defined above
  },
  ...
}
```
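When the expected duration of a transaction is known up front, the choice between the regular and the long-running executor can be automated on the client side. This is a sketch under stated assumptions: TransactionRequest is a hypothetical typing of the request shown above (with actions left opaque), and withExecutorForDuration simply applies the 15-minute rule from this section.

```typescript
// Regular Aurora executor limit quoted above: 15 minutes per transaction.
const REGULAR_AURORA_LIMIT_SECONDS = 15 * 60;

// Hypothetical typing of the transaction request shape.
interface TransactionRequest {
  dataset: { id: string };
  transaction: string;
  actions: unknown[];
  executor?: { id: string };
}

// Adds the long-running executor only when the expected duration exceeds the
// regular limit, so short transactions avoid the ~90 s start-up cost.
function withExecutorForDuration(
  request: TransactionRequest,
  expectedSeconds: number,
  longRunningExecutorId: string,
): TransactionRequest {
  if (expectedSeconds <= REGULAR_AURORA_LIMIT_SECONDS) {
    return request;
  }
  return { ...request, executor: { id: longRunningExecutorId } };
}

const nightlyLoad = withExecutorForDuration(
  { dataset: { id: "SomeDataset" }, transaction: "AUTO", actions: [] },
  40 * 60, // expected to run for about 40 minutes
  "long-running-executor-1",
);
console.log(JSON.stringify(nightlyLoad.executor));
```

Because durations are estimates, you may prefer to switch to the long-running executor somewhat before the 15-minute mark rather than exactly at it.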
How is the long-running executor different from the regular executor?
The long-running executor is a separate service designed for transactions that take longer than 15 minutes. It is not meant for shorter transactions, because it takes around 90 seconds to start up for each transaction. It is currently implemented using AWS Batch jobs.
Debugging long-running transactions
When using the long-running executor, you can view the logs in the CloudWatch dashboard or in the AWS Batch console.
The job name has the format aurora-txn-<dataset-id>-<transaction-id>. You can find the transaction ID in the
transaction logs in the Depot console.
Alternatively, find the Aurora transaction execution (i.e. <prefix>-aurora-transaction-run) in
Step Functions and navigate to the logs from the Run Transaction on AWS Batch step.
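When searching the AWS Batch console, it can help to build the job name from the two IDs rather than typing it by hand. A trivial sketch; the helper name and the example IDs are hypothetical, and only the name format comes from the text above.

```typescript
// Builds the AWS Batch job name format aurora-txn-<dataset-id>-<transaction-id>.
function auroraBatchJobName(datasetId: string, transactionId: string): string {
  return `aurora-txn-${datasetId}-${transactionId}`;
}

// Hypothetical IDs; copy the real transaction ID from the Depot console logs.
console.log(auroraBatchJobName("SomeDataset", "1234abcd"));
```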
Advanced parameters
The property bag passed to Aurora.longRunning() accepts the following extra properties, for the rare cases where the defaults are not suitable:
bulkInsertProps
The bulkInsertProps property is a nested property bag controlling the implementation of the
"bulk inserter", the component that drives mass delivery of data into Postgres.
| Property | Type | Default value | Description |
|---|---|---|---|
| stallTimeout | cdk.Duration | 60 seconds | If no progress is made for this long, the transaction is terminated. |
| stallWarningThreshold | cdk.Duration | undefined | If defined, a warning is printed when this much time passes without any progress. Must be strictly smaller than stallTimeout. |
| sanitizationChunkLength | integer | 50 | Number of rows processed at once in the sanitization step, using all available CPU cores. |
| copyChunkLength | integer | 100000 | Number of rows delivered to Postgres at once in a single COPY statement. |
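The constraint on stallWarningThreshold and the effect of copyChunkLength can be checked with a small model. In this hypothetical sketch the two durations are plain numbers of seconds rather than cdk.Duration values, so it runs without aws-cdk-lib; the interface and function names are assumptions, and only the defaults and the strictly-smaller rule come from the table.

```typescript
// Hypothetical plain-number mirror of bulkInsertProps.
interface BulkInsertSettings {
  stallTimeoutSeconds: number;
  stallWarningThresholdSeconds?: number;
  sanitizationChunkLength: number;
  copyChunkLength: number;
}

// Defaults from the table (no stall warning threshold by default).
const BULK_INSERT_DEFAULTS: BulkInsertSettings = {
  stallTimeoutSeconds: 60,
  sanitizationChunkLength: 50,
  copyChunkLength: 100000,
};

// Merges overrides onto the defaults and enforces the documented constraint.
function resolveBulkInsertSettings(
  overrides: Partial<BulkInsertSettings> = {},
): BulkInsertSettings {
  const settings: BulkInsertSettings = { ...BULK_INSERT_DEFAULTS, ...overrides };
  const warn = settings.stallWarningThresholdSeconds;
  if (warn !== undefined && warn >= settings.stallTimeoutSeconds) {
    throw new Error("stallWarningThreshold must be strictly smaller than stallTimeout");
  }
  return settings;
}

// Number of COPY statements needed to deliver `rows` rows.
function copyStatementCount(rows: number, settings: BulkInsertSettings): number {
  return Math.ceil(rows / settings.copyChunkLength);
}

const bulkDefaults = resolveBulkInsertSettings();
console.log(copyStatementCount(1000000, bulkDefaults)); // 10
```

With the default copyChunkLength of 100000, delivering 1,000,000 rows takes 10 COPY statements.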
Limitations
Currently, the long-running executor can run at most 10 transactions concurrently. If needed, we will expose a way to increase this limit when configuring the executor.
Cross location executor
Overview
The Cross Location Executor lets you use non-default dataset executors for a cross-location transaction, for example the long-running executor on the Aurora side and a specific executor on the Snowflake side. Once created, a cross-location executor can be provided in a transaction like any other executor. It is not required for running cross-location transactions; it is only a way to supply multiple executors at once. If you only need to provide an executor for one side of the transaction, you can still provide it as before and Depot will assign it to the right location.
Example Configuration
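```typescript
import * as depot from "@stage-tech/depot-cdk";

new depot.Executor.CrossLocation(this, 'CrossLocationExecutorTest', {
  name: 'x-location-test',
  environment: depotEnvironment,
  executors: {
    // executor used on the Snowflake side of the transaction
    [ExecutorType.SNOWFLAKE]: someSnowflakeExecutor.id,
    // executor used on the Aurora side (here, a long-running executor)
    [ExecutorType.AURORA]: auroraLongRunningExecutor.id,
  }
});
```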
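The assignment rules described in the overview can be modelled as follows: a cross-location executor supplies an executor per side, a single executor applies only to its own location, and any side without one falls back to the dataset's default executor. This is a hypothetical sketch for reasoning about the behaviour; the types, the executorPerSide function and the fallback label are assumptions, not Depot's API.

```typescript
type ExecutorLocation = "SNOWFLAKE" | "AURORA";

// Hypothetical model of a transaction's executor: a single executor (Depot
// assigns it to its location) or a cross-location executor.
type ProvidedExecutor =
  | { kind: "single"; location: ExecutorLocation; id: string }
  | { kind: "crossLocation"; executors: Partial<Record<ExecutorLocation, string>> };

const DATASET_DEFAULT = "dataset default";

// Resolves the executor used on each side; sides with nothing provided fall
// back to the dataset's default executor.
function executorPerSide(
  provided: ProvidedExecutor | undefined,
  sides: ExecutorLocation[],
): Record<string, string> {
  const plan: Record<string, string> = {};
  for (const side of sides) {
    let id: string | undefined = undefined;
    if (provided !== undefined) {
      if (provided.kind === "single") {
        id = provided.location === side ? provided.id : undefined;
      } else {
        id = provided.executors[side];
      }
    }
    plan[side] = id !== undefined ? id : DATASET_DEFAULT;
  }
  return plan;
}

console.log(executorPerSide(
  { kind: "crossLocation", executors: { SNOWFLAKE: "snowflake-executor1", AURORA: "long-running-executor-1" } },
  ["AURORA", "SNOWFLAKE"],
));
console.log(executorPerSide({ kind: "single", location: "AURORA", id: "long-running-executor-1" }, ["AURORA", "SNOWFLAKE"]));
```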