Batch Jobs
A batch job is a compute resource designed to run a containerized task until it completes. The execution is triggered by an event, such as an HTTP request, a message in a queue, or an object uploaded to a bucket.
A key feature of batch jobs is the ability to use spot instances, which can reduce compute costs by up to 90%.
Like other Stacktape compute resources, batch jobs are serverless, meaning you don't need to manage the underlying infrastructure. Stacktape handles server provisioning, scaling, and security for you. You can also equip your batch job's environment with a GPU in addition to CPU and RAM.
Under the hood
Stacktape uses a combination of AWS services to provide a seamless experience for running containerized jobs:
- AWS Batch: Provisions the virtual machines where your job runs and manages the execution.
- AWS Step Functions: Manages the job's lifecycle, including retries and timeouts, using a serverless state machine.
- AWS Lambda: A trigger function that connects the event source to the batch job and starts its execution.
The execution flow is as follows:
- An event from an integration (like an API Gateway) invokes the trigger function.
- The trigger function starts the batch job state machine.
- The state machine queues the job in AWS Batch.
- AWS Batch provisions the necessary resources (like a VM) and runs your containerized job.
When to use
Batch jobs are ideal for long-running, resource-intensive tasks like data processing, ETL pipelines, or machine learning model training.
If you're unsure which compute resource to use, this table provides a comparison of container-based resources in Stacktape:
| Resource type | Description | Use-cases |
|---|---|---|
| web-service | continuously running container with public endpoint and URL | public APIs, websites |
| private-service | continuously running container with private endpoint | private APIs, services |
| worker-service | continuously running container not accessible from outside | continuous processing |
| multi-container-workload | custom multi container workload - you can customize accessibility for each container | more complex use-cases requiring customization |
| batch-job | simple container job - container is destroyed after job is done | one-off/scheduled processing jobs |
Advantages
- Pay-per-use: You only pay for the compute time your job consumes.
- Resource flexibility: The environment automatically scales to provide the CPU, memory, and GPU your job needs.
- Time flexibility: Batch jobs can run for as long as needed.
- Secure by default: The underlying environment is securely managed by AWS.
- Easy integration: Can be triggered by a wide variety of event sources.
Disadvantages
- Slow start time: After a job is triggered, it's placed in a queue and can take anywhere from a few seconds to a few minutes to start.
Basic usage
```yml
resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: schedule
          properties:
            scheduleRate: cron(0 14 * * ? *) # every day at 14:00 UTC
```
```ts
(async () => {
  const event = JSON.parse(process.env.STP_TRIGGER_EVENT_DATA);
  // process the event
})();
```
Container
Your code for a batch job runs inside a Docker container. You can configure its properties:
Image
A Docker container is a running instance of a Docker image. You can provide an image in four ways:
- Images built using stacktape-image-buildpack
- Images built using external-buildpack
- Images built from a custom-dockerfile
- prebuilt-images
Environment variables
A list of environment variables to pass to the script or command.
Values can be:
- A static string, number, or boolean.
- The result of a custom directive.
- A reference to another resource's parameter using the `$ResourceParam` directive.
- A value from a secret using the `$Secret` directive.
```yml
environment:
  - name: STATIC_ENV_VAR
    value: my-env-var
  - name: DYNAMICALLY_SET_ENV_VAR
    value: $MyCustomDirective('input-for-my-directive')
  - name: DB_HOST
    value: $ResourceParam('myDatabase', 'host')
  - name: DB_PASSWORD
    value: $Secret('dbSecret.password')
```
Pre-set environment variables
Stacktape pre-sets the following environment variables for your job:
| Name | Value |
|---|---|
| STP_TRIGGER_EVENT_DATA | Contains JSON stringified event from an event integration that triggered this batch job. |
| STP_MAXIMUM_ATTEMPTS | The total number of attempts for this job before it is marked as failed. |
| STP_CURRENT_ATTEMPT | The current attempt number. |
Logging
Any output from your code to stdout or stderr is captured and stored in an AWS CloudWatch log group.
You can view logs in two ways:
- AWS CloudWatch Console: Get a direct link from the Stacktape Console or by using the `stacktape stack-info` command.
- Stacktape CLI: Use the `stacktape logs` command to stream logs directly in your terminal.
Log storage can incur costs, so you can configure retentionDays to automatically delete old logs.
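For example, a minimal sketch of limiting log retention (this assumes retentionDays is nested under a logging property of the batch job; check the configuration reference for the exact shape):

```yml
resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      # assumption: log retention is configured under logging.retentionDays
      logging:
        retentionDays: 30
```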
Forwarding logs
You can forward logs to third-party services. See Log Forwarding for more details.
Computing resources
You can specify the amount of CPU, memory, and GPU for your batch job. AWS Batch selects the most cost-effective instance type that fits your job's requirements. To learn more about GPU instances, refer to the AWS Docs.
Important: AWS instances require a small amount of memory for their own management processes. If you request memory in exact powers of 2 (e.g., 8192 MB for 8 GiB), a larger instance may be provisioned than you expect.
Recommendation: To ensure efficient instance usage, consider requesting slightly less memory (e.g., 7680 MB instead of 8192 MB). This allows the job to fit on a standard 8 GiB instance without needing to scale up.
For more details, see the AWS documentation on memory management.
Specifying a GPU will ensure the job runs on a GPU-accelerated instance. Supported families include NVIDIA A100 (for deep learning) and A10G (for graphics and ML inference). If omitted, a CPU-only instance is used.
```yml
resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: batch-jobs/js-batch-job.js
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: schedule
          properties:
            scheduleRate: 'cron(0 14 * * ? *)' # every day at 14:00 UTC
```
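A sketch of a GPU-enabled job follows. The resource names and the `gpu` property placement are assumptions made for illustration; verify the exact property name in the configuration reference:

```yml
resources:
  myGpuBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/training-job.ts
      resources:
        cpu: 4
        memory: 15000
        # assumption: the number of GPUs is requested via the gpu property
        gpu: 1
```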
Spot instances
Benefits:
- Save up to 90% compared to on-demand pricing by using spare AWS capacity.
Important Considerations:
- Spot Instances can be interrupted at any time. Your container will receive a `SIGTERM` signal and has 120 seconds to save its state and shut down gracefully.
- Your application should be designed to be fault-tolerant. This can be achieved by implementing checkpointing or by making the job idempotent (safe to restart from the beginning).
For more information, see the AWS Spot Instance Advisor for interruption rates and best practices.
```yml
resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      useSpotInstances: true
```
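To shut down gracefully on an interruption, your job code can listen for `SIGTERM` and persist its progress before exiting. A minimal sketch (the `saveCheckpoint` helper and the work loop are hypothetical placeholders for your own logic):

```ts
// Hypothetical checkpointing helper - replace with your own persistence logic
// (e.g. writing progress to a bucket or database).
const saveCheckpoint = async () => {
  /* persist current progress */
};

let interrupted = false;

// On a Spot interruption, the container receives SIGTERM and has ~120 seconds to finish.
process.on('SIGTERM', async () => {
  interrupted = true;
  await saveCheckpoint();
  process.exit(0);
});

(async () => {
  const workItems = [1, 2, 3 /* ... */];
  for (const item of workItems) {
    if (interrupted) break; // stop picking up new work once interrupted
    // process the item ...
  }
})();
```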
Retries
If a job fails (e.g., non-zero exit code, timeout, Spot Instance interruption), it can be automatically retried.
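Your code can use the pre-set `STP_CURRENT_ATTEMPT` and `STP_MAXIMUM_ATTEMPTS` environment variables to adapt its behavior when it is re-run. A minimal sketch:

```ts
(async () => {
  const currentAttempt = Number(process.env.STP_CURRENT_ATTEMPT);
  const maximumAttempts = Number(process.env.STP_MAXIMUM_ATTEMPTS);

  if (currentAttempt > 1) {
    // this is a retry - e.g. resume from a checkpoint instead of starting over
    console.log(`Retry ${currentAttempt} of ${maximumAttempts}`);
  }

  // process the job ...
})();
```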
Timeout
You can limit how long the job is allowed to run using the timeout property (in seconds). If the job exceeds this timeout, it is stopped. If retries are configured, the job will be re-run.
```yml
resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      timeout: 1200
```
Storage
Each batch job has its own ephemeral storage with a fixed size of 20GB. This storage is temporary and is deleted after the job completes or fails. To store data permanently, use Buckets.
Trigger events
Batch jobs are invoked in response to events from various integrations. A single job can have multiple triggers. The data payload from the trigger is available in the STP_TRIGGER_EVENT_DATA environment variable as a JSON string.
Be cautious when configuring event integrations. A high volume of events can trigger a large number of batch jobs, leading to unexpected costs. For example, 1000 HTTP requests to a connected API Gateway will result in 1000 job invocations.
HTTP Api event
Triggers the job in response to a request to a specified HTTP API Gateway. Routes are matched based on the most specific path. For more details, see the AWS Docs.
```yml
resources:
  myHttpApi:
    type: http-api-gateway

  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: http-api-gateway
          properties:
            httpApiGatewayName: myHttpApi
            path: /hello
            method: GET
```
Batch job connected to an HTTP API Gateway "myHttpApi"
Cognito authorizer
Restricts access to users authenticated with a User Pool. The request must include an access token. If authorized, the job receives user claims in its payload.
```yml
resources:
  myGateway:
    type: http-api-gateway

  myUserPool:
    type: user-auth-pool
    properties:
      userVerificationType: email-code

  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: http-api-gateway
          properties:
            httpApiGatewayName: myGateway
            path: /some-path
            method: '*'
            authorizer:
              type: cognito
              properties:
                userPoolName: myUserPool
```
Example cognito authorizer
```ts
import { CognitoIdentityProvider } from '@aws-sdk/client-cognito-identity-provider';

const cognito = new CognitoIdentityProvider({});

(async () => {
  const event = JSON.parse(process.env.STP_TRIGGER_EVENT_DATA);
  const userData = await cognito.getUser({ AccessToken: event.headers.authorization });
  // do something with your user data
})();
```
Example batch job that fetches user data from Cognito
Lambda authorizer
Uses a dedicated Lambda function to decide if a request is authorized. The authorizer function returns a policy document or a simple boolean response. You can configure identitySources to specify which parts of the request are used for authorization. To learn more, see the AWS Docs.
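A minimal sketch of an authorizer function using the HTTP API "simple response" format follows. It assumes the authorizer is a Stacktape function with a default-exported handler; the header name, the `EXPECTED_TOKEN` variable, and the token check are illustrative only:

```ts
// Simple-response authorizer for an HTTP API Gateway.
// The event contains the configured identity sources (e.g. a header value).
export default async (event: { headers?: Record<string, string> }) => {
  const token = event.headers?.authorization;

  return {
    // allow the request only if the (illustrative) token check passes
    isAuthorized: token === process.env.EXPECTED_TOKEN,
    // optional context passed to the integration
    context: { invokedBy: 'lambda-authorizer' }
  };
};
```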
Schedule event
Triggers the job on a defined schedule using either a fixed rate (e.g., every 5 minutes) or a cron expression.
```yml
resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        # invoke the job every two hours
        - type: schedule
          properties:
            scheduleRate: rate(2 hours)
        # invoke the job at 10:00 UTC every day
        - type: schedule
          properties:
            scheduleRate: cron(0 10 * * ? *)
```
Event Bus event
Triggers the job when a matching event is received by a specified event bus. You can use the default AWS event bus or a custom event bus.
```yml
resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: event-bus
          properties:
            useDefaultBus: true
            eventPattern:
              source:
                - 'aws.autoscaling'
              region:
                - 'us-west-2'
```
Batch job connected to the default event bus
```yml
resources:
  myEventBus:
    type: event-bus

  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: event-bus
          properties:
            eventBusName: myEventBus
            eventPattern:
              source:
                - 'mycustomsource'
```
Batch job connected to a custom event bus
SNS event
Triggers the job when a message is published to an SNS topic.
```yml
resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: sns
          properties:
            topicName: mySnsTopic

  mySnsTopic:
    type: sns-topic
```
SQS event
Triggers the job when messages are available in an SQS queue. Messages are processed in batches. If the job fails to start, messages return to the queue after the visibility timeout. If the job starts but then fails, the messages are considered processed.
A single queue should be consumed by a single compute resource. If you need a fan-out pattern, consider using an SNS or EventBus integration.
```yml
resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: sqs
          properties:
            sqsQueueName: mySqsQueue

  mySqsQueue:
    type: sqs-queue
```
Kinesis event
Triggers the job when records are available in a Kinesis Data Stream. It's similar to SQS but designed for real-time data streaming. You can add a Kinesis stream using CloudFormation resources.
```yml
resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: kinesis-stream
          properties:
            autoCreateConsumer: true
            maxBatchWindowSeconds: 30
            batchSize: 200
            streamArn: $CfResourceParam('myKinesisStream', 'Arn')
            onFailure:
              arn: $CfResourceParam('myOnFailureSqsQueue', 'Arn')
              type: sqs

cloudformationResources:
  myKinesisStream:
    Type: AWS::Kinesis::Stream
    Properties:
      ShardCount: 1
  myOnFailureSqsQueue:
    Type: AWS::SQS::Queue
```
DynamoDB event
Triggers the job in response to item-level changes in a DynamoDB table. You must enable DynamoDB Streams on your table.
```yml
resources:
  myDynamoDbTable:
    type: dynamo-db-table
    properties:
      primaryKey:
        partitionKey:
          name: id
          type: string
      streamType: NEW_AND_OLD_IMAGES

  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: dynamo-db-stream
          properties:
            streamArn: $ResourceParam('myDynamoDbTable', 'streamArn')
            batchSize: 200
```
S3 event
Triggers the job when a specific event (like object created) occurs in an S3 bucket.
```yml
resources:
  myBucket:
    type: bucket

  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: s3
          properties:
            bucketArn: $ResourceParam('myBucket', 'arn')
            s3EventType: 's3:ObjectCreated:*'
            filterRule:
              prefix: order-
              suffix: .jpg
```
Cloudwatch Log event
Triggers the job when a log record is added to a specified CloudWatch log group. The event payload is BASE64 encoded and GZIP compressed.
```yml
resources:
  myLogProducingLambda:
    type: function
    properties:
      packaging:
        type: stacktape-lambda-buildpack
        properties:
          entryfilePath: lambdas/log-producer.ts

  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: cloudwatch-log
          properties:
            logGroupArn: $ResourceParam('myLogProducingLambda', 'arn')
```
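Because the payload is gzipped and base64 encoded, your job needs to decode it before use. A sketch, assuming the trigger event carries the standard CloudWatch Logs subscription payload under `awslogs.data` (the exact shape of the forwarded event is an assumption):

```ts
import { gunzipSync } from 'zlib';

(async () => {
  const event = JSON.parse(process.env.STP_TRIGGER_EVENT_DATA!);

  // assumption: the payload follows the CloudWatch Logs subscription format
  const decoded = gunzipSync(Buffer.from(event.awslogs.data, 'base64')).toString('utf-8');
  const logData = JSON.parse(decoded);

  for (const logEvent of logData.logEvents) {
    // process each log record
    console.log(logEvent.message);
  }
})();
```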
Application Load Balancer event
Triggers the job when an Application Load Balancer receives an HTTP request matching specified conditions (e.g., path, headers, method).
```yml
resources:
  # load balancer which routes traffic to the batch job
  myLoadBalancer:
    type: application-load-balancer

  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: application-load-balancer
          properties:
            # referencing load balancer defined above
            loadBalancerName: myLoadBalancer
            priority: 1
            paths:
              - /invoke-my-job
              - /another-path
```
Accessing other resources
By default, AWS resources cannot communicate with each other. Access must be granted explicitly using IAM permissions. Stacktape handles most of this automatically, but for resource-to-resource communication, you need to configure permissions.
Relational Databases are an exception, as they use their own connection-string-based access control.
There are two ways to grant permissions:
Using connectTo
The connectTo property is a simplified way to grant basic access to other Stacktape-managed resources. It automatically configures the necessary IAM permissions and injects environment variables with connection details into your batch job.
```yml
resources:
  photosBucket:
    type: bucket

  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      connectTo:
        # access to the bucket
        - photosBucket
        # access to AWS SES
        - aws:ses
```
Configures access to other resources in your stack and AWS services. By specifying resources here, Stacktape automatically:
- Configures IAM role permissions.
- Sets up security group rules to allow network traffic.
- Injects environment variables with connection details into the compute resource.
Environment variables are named STP_[RESOURCE_NAME]_[VARIABLE_NAME] (e.g., STP_MY_DATABASE_CONNECTION_STRING).
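For example, the photosBucket from the configuration above could be used from your job code roughly like this. This is a sketch; the injected variable name (`STP_PHOTOS_BUCKET_NAME`) and the object key are assumptions for illustration:

```ts
import { S3 } from '@aws-sdk/client-s3';

const s3 = new S3({});

(async () => {
  // assumption: connectTo injects the bucket name as STP_PHOTOS_BUCKET_NAME
  const bucketName = process.env.STP_PHOTOS_BUCKET_NAME!;

  await s3.putObject({
    Bucket: bucketName,
    Key: 'results/output.json',
    Body: JSON.stringify({ done: true })
  });
})();
```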
Using iamRoleStatements
For fine-grained control, you can provide raw IAM role statements. This allows you to define custom permissions to any AWS resource.
```yml
resources:
  myBatchJob:
    type: batch-job
    properties:
      resources:
        cpu: 2
        memory: 1800
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      iamRoleStatements:
        - Resource:
            - $CfResourceParam('NotificationTopic', 'Arn')
          Effect: Allow
          Action:
            - 'sns:Publish'

cloudformationResources:
  NotificationTopic:
    Type: AWS::SNS::Topic
```
Default VPC connection
Certain resources, like Relational Databases, must be placed within a VPC. If your stack contains such resources, Stacktape automatically creates a default VPC and connects them to it.
Batch jobs are connected to this VPC by default, allowing them to communicate with other VPC-enabled resources without extra configuration. To learn more, see our guide on VPCs.
Referenceable parameters
The following parameters can be easily referenced using the `$ResourceParam` directive.
To learn more about referencing parameters, refer to referencing parameters.
Arn of the job definition resource
- Usage: `$ResourceParam('<<resource-name>>', 'jobDefinitionArn')`
Arn of the state machine controlling the execution flow of the batch job
- Usage: `$ResourceParam('<<resource-name>>', 'stateMachineArn')`
Arn of the log group aggregating logs from the batch job
- Usage: `$ResourceParam('<<resource-name>>', 'logGroupArn')`
Pricing
You are charged for:
- The compute instances running your batch jobs.
- A negligible amount for the Lambda functions and Step Functions that manage the job's execution.
Pricing depends on the instance type and region. You can significantly reduce costs (by up to 90%) by using spot instances.