Cloud Infrastructure
So what sort of infrastructure does a Depot environment consist of? The detailed answer is “it depends” (because Depot is a modular system). Here are some examples to give you an idea:
Default Depot Environment
A default Depot environment, with no extras such as Locations, Gateways, or Datasets added, consists of the following AWS resources (among others):
- AWS VPC - A VPC is created with public and private subnets. Depot component infrastructure such as the API, Elasticsearch (AWS OpenSearch), EMR, and internally used Lambda functions is deployed across the private subnets in the environment’s dedicated VPC. This keeps all resources isolated, in networking terms, from other environments or infrastructure in the same AWS account. The VPC is created with components such as:
  - NAT Gateway - Routing allows infrastructure in the private subnets to reach the Internet through it. The benefit is that the internal IP addresses of the Depot infrastructure are hidden from the external systems it talks to; those systems see the NAT Gateway’s IP address as the origin of the traffic instead.
  - Internet Gateway - Routing allows infrastructure in the public subnets to reach the Internet through it.
  - Subnets - Both private and public, split across AWS Availability Zones. These provide the IP ranges that infrastructure uses for addressing.
  - Route tables - Routing is configured to allow resources to talk to each other within subnets, and to tell resources how to reach other networks, e.g. the wider Internet.
- AWS Fargate - A Fargate cluster, with a Service and one or more Tasks depending on the autoscaling configuration, is deployed to run the Depot Connector API. It runs the Docker image that the Depot Connector API is built into. (The image is stored in AWS ECR by our build / CI process.)
- AWS Network Load Balancer - An NLB is created along with the Fargate cluster. A VPC link is also attached here, which allows the Fargate task running in the VPC to receive external network traffic over HTTPS from one of the environment’s API Gateways.
- S3 and DynamoDB VPC Endpoints - these are created to allow Depot services inside the VPC to talk to S3 and DynamoDB without the need to traverse the Internet Gateway or NAT Gateway. In other words, they help save egress bandwidth costs, and also reduce the amount of exposure the VPC has to the public internet.
- S3 Bucket - An S3 “Data” bucket is created for each Depot environment. Data stored against your Datasets is persisted here. For example, if you use a DynamoDB Location, data is stored in tables there, but could also potentially be ‘exported’ to this environment-level Data bucket using our streaming feature. The data is persisted into S3 as Parquet (Avro) file objects.
- SQS - Event queues and DLQs are created for Depot environment internal events. These are also used for the Subscriptions feature.
- SNS - SNS topics are created that handle Depot environment event fan-out and propagation. An SNS topic is also created that assists with incoming Transactions.
- KMS - The environment creates a dedicated KMS encryption key by default that is used to encrypt AWS resources where possible.
- Secrets Manager - This is used when the Connector needs Storage Location credentials, for example SQL usernames and passwords or a JDBC connection string (e.g. if you use Snowflake or Aurora Locations).
- Lambda Functions - Many Lambda functions are deployed to handle internal Depot environment operations. However, the one of most interest to developers is probably the Lambda Gateway function that is created by default. This Lambda function allows the Depot CLI to work with a target Depot environment. You can also invoke it directly to query and operate on a Depot environment for Administration and Data operations.
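As a rough illustration of invoking the Lambda Gateway function directly, the sketch below wraps the standard AWS Lambda `Invoke` call (as exposed by a boto3 Lambda client). It is a minimal sketch, not the Depot API itself: the helper name `invoke_gateway`, the function name you pass in, and the shape of the request dictionary are all assumptions, because this page does not describe the request format that the gateway expects.

```python
import json
from typing import Any, Dict


def invoke_gateway(lambda_client: Any, function_name: str,
                   request: Dict[str, Any]) -> Dict[str, Any]:
    """Synchronously invoke a Lambda function and decode its JSON reply.

    `lambda_client` is expected to behave like boto3.client("lambda").
    `function_name` and the contents of `request` are placeholders; use
    the values that apply to your own Depot environment.
    """
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the function's result
        Payload=json.dumps(request).encode("utf-8"),
    )
    # The response payload is a file-like stream of bytes.
    body = response["Payload"].read()
    # Lambda sets "FunctionError" when the function itself raised an error.
    if response.get("FunctionError"):
        raise RuntimeError(f"{function_name} failed: {body.decode('utf-8')}")
    return json.loads(body) if body else {}
```

With boto3 installed and AWS credentials configured, a call would look something like `invoke_gateway(boto3.client("lambda"), "<your-gateway-function-name>", {...})`, where the request body follows whatever format your Depot environment's gateway accepts. Passing the client in as an argument keeps the helper easy to exercise with a stub, without touching AWS.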
Going Further
Once you start constructing data pipelines by creating Datasets, additional infrastructure is created depending on the storage configurations (Locations) for your Datasets. You can expect to see infrastructure deployed such as:
- S3 buckets
- Glue / Athena
- Snowflake (Databases, Storage Integrations, Warehouses)