Batch Jobs



A batch job is a compute resource designed to run a containerized task until it completes. The execution is triggered by an event, such as an HTTP request, a message in a queue, or an object uploaded to a bucket.

A key feature of batch jobs is the ability to use spot instances, which can reduce compute costs by up to 90%.

Like other Stacktape compute resources, batch jobs are serverless, meaning you don't need to manage the underlying infrastructure. Stacktape handles server provisioning, scaling, and security for you. You can also equip your batch job's environment with a GPU in addition to CPU and RAM.

Under the hood

Stacktape uses a combination of AWS services to provide a seamless experience for running containerized jobs:

  • AWS Batch: Provisions the virtual machines where your job runs and manages the execution.
  • AWS Step Functions: Manages the job's lifecycle, including retries and timeouts, using a serverless state machine.
  • AWS Lambda: A trigger function that connects the event source to the batch job and starts its execution.

The execution flow is as follows:

  1. An event from an integration (like an API Gateway) invokes the trigger function.
  2. The trigger function starts the batch job state machine.
  3. The state machine queues the job in AWS Batch.
  4. AWS Batch provisions the necessary resources (like a VM) and runs your containerized job.

When to use

Batch jobs are ideal for long-running, resource-intensive tasks like data processing, ETL pipelines, or machine learning model training.

If you're unsure which compute resource to use, the following comparison of container-based resources in Stacktape can help:

  • web-service: continuously running container with a public endpoint and URL. Use-cases: public APIs, websites.
  • private-service: continuously running container with a private endpoint. Use-cases: private APIs, services.
  • worker-service: continuously running container not accessible from the outside. Use-cases: continuous processing.
  • multi-container-workload: custom multi-container workload where you can customize accessibility for each container. Use-cases: more complex use-cases requiring customization.
  • batch-job: simple container job; the container is destroyed after the job is done. Use-cases: one-off or scheduled processing jobs.

Advantages

  • Pay-per-use: You only pay for the compute time your job consumes.
  • Resource flexibility: The environment automatically scales to provide the CPU, memory, and GPU your job needs.
  • Time flexibility: Batch jobs can run for as long as needed.
  • Secure by default: The underlying environment is securely managed by AWS.
  • Easy integration: Can be triggered by a wide variety of event sources.

Disadvantages

  • Slow start time: After a job is triggered, it's placed in a queue and can take anywhere from a few seconds to a few minutes to start.

Basic usage

resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: schedule
          properties:
            scheduleRate: cron(0 14 * * ? *) # every day at 14:00 UTC
(async () => {
  // event data from the integration that triggered the job is passed as a JSON string
  const event = JSON.parse(process.env.STP_TRIGGER_EVENT_DATA);
  // process the event
})();

Example code of the batch job (path/to/my/batch-job.ts)

Container

Your code for a batch job runs inside a Docker container. You can configure its properties:

BatchJobContainer  API reference
packaging
Required
environment

Image

A Docker container is a running instance of a Docker image. You can provide an image in four ways:

Environment variables

A list of environment variables to pass to the script or command.

Values can be:

  • a static value
  • the result of a custom directive
  • a parameter of another resource (referenced using the $ResourceParam directive)
  • the value of a secret (referenced using the $Secret directive)

environment:
  - name: STATIC_ENV_VAR
    value: my-env-var
  - name: DYNAMICALLY_SET_ENV_VAR
    value: $MyCustomDirective('input-for-my-directive')
  - name: DB_HOST
    value: $ResourceParam('myDatabase', 'host')
  - name: DB_PASSWORD
    value: $Secret('dbSecret.password')

Pre-set environment variables

Stacktape pre-sets the following environment variables for your job:

  • STP_TRIGGER_EVENT_DATA: JSON-stringified event from the event integration that triggered this batch job.
  • STP_MAXIMUM_ATTEMPTS: the total number of attempts for this job before it is marked as failed.
  • STP_CURRENT_ATTEMPT: the current attempt number.
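
Your job can read these variables to adjust its behavior per attempt. A minimal Node.js sketch using the variables listed above:

// read the pre-set environment variables inside the job
const event = JSON.parse(process.env.STP_TRIGGER_EVENT_DATA ?? '{}');
const currentAttempt = Number(process.env.STP_CURRENT_ATTEMPT);
const maximumAttempts = Number(process.env.STP_MAXIMUM_ATTEMPTS);

// example: enable extra logging on the final attempt
if (currentAttempt === maximumAttempts) {
  console.log('Final attempt, enabling verbose logging');
}

console.log('Processing event', event);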

Logging

Any output from your code to stdout or stderr is captured and stored in an AWS CloudWatch log group.

You can view logs in two ways:

  • AWS CloudWatch Console: Get a direct link from the Stacktape Console or by using the stacktape stack-info command.
  • Stacktape CLI: Use the stacktape logs command to stream logs directly in your terminal.

Log storage can incur costs, so you can configure retentionDays to automatically delete old logs.

BatchJobLogging  API reference
Parent:BatchJob
disabled
retentionDays
Default: 90
logForwarding

Forwarding logs

You can forward logs to third-party services. See Log Forwarding for more details.

Computing resources

You can specify the amount of CPU, memory, and GPU for your batch job. AWS Batch selects the most cost-effective instance type that fits your job's requirements. To learn more about GPU instances, refer to the AWS Docs.

Important: AWS instances require a small amount of memory for their own management processes. If you request memory in exact powers of 2 (e.g., 8192 MB for 8 GiB), a larger instance may be provisioned than you expect.

Recommendation: To ensure efficient instance usage, consider requesting slightly less memory (e.g., 7680 MB instead of 8192 MB). This allows the job to fit on a standard 8 GiB instance without needing to scale up.

For more details, see the AWS documentation on memory management.

Specifying the gpu property ensures the job runs on a GPU-accelerated instance. Supported families include NVIDIA A100 (for deep learning) and A10G (for graphics and ML inference). If omitted, a CPU-only instance is used.
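
For illustration, a resources configuration requesting a GPU might look like this (a sketch; gpu takes the number of GPUs to allocate to the job):

resources:
  cpu: 4
  memory: 15000
  # number of GPUs to allocate; ensures a GPU-accelerated instance is used
  gpu: 1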

BatchJobResources  API reference
Parent:BatchJob
cpu
Required
memory
Required
gpu
resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: batch-jobs/js-batch-job.js
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: schedule
          properties:
            scheduleRate: 'cron(0 14 * * ? *)' # every day at 14:00 UTC

Spot instances

Benefits:

  • Save up to 90% compared to on-demand pricing by using spare AWS capacity.

Important Considerations:

  • Spot Instances can be interrupted at any time. Your container will receive a SIGTERM signal and has 120 seconds to save its state and shut down gracefully.
  • Your application should be designed to be fault-tolerant. This can be achieved by implementing checkpointing or by making the job idempotent (safe to restart from the beginning).

For more information, see the AWS Spot Instance Advisor for interruption rates and best practices.

resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      useSpotInstances: true
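
Inside the job, you can listen for the SIGTERM signal and persist a checkpoint before the instance is reclaimed. A minimal sketch (where and how the checkpoint is stored, for example in a bucket, is up to you):

// state describing how far the job has progressed
let checkpoint = { processedRecords: 0 };

// a spot interruption delivers SIGTERM roughly 120 seconds before shutdown
process.on('SIGTERM', async () => {
  console.log('Spot interruption received, saving checkpoint');
  await saveCheckpoint(checkpoint);
  process.exit(0);
});

// hypothetical helper: persist the state so a retried job can resume instead of starting over
async function saveCheckpoint(state: { processedRecords: number }) {
  console.log('Checkpoint saved', state);
}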

Retries

If a job fails (e.g., non-zero exit code, timeout, Spot Instance interruption), it can be automatically retried.

BatchJobRetryConfiguration  API reference
Parent:BatchJob
attempts
Default: 1
retryIntervalSeconds
retryIntervalMultiplier
Default: 1
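
For example, a job attempted up to 3 times with an exponentially growing delay between attempts could be configured like this (a sketch using the properties listed above):

resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      retryConfig:
        attempts: 3
        retryIntervalSeconds: 60
        retryIntervalMultiplier: 2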

Timeout

If the job exceeds the configured timeout, it is stopped. If retries are configured, the job is re-run.

resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      timeout: 1200

Storage

Each batch job has its own ephemeral storage with a fixed size of 20GB. This storage is temporary and is deleted after the job completes or fails. To store data permanently, use Buckets.

Trigger events

Batch jobs are invoked in response to events from various integrations. A single job can have multiple triggers. The data payload from the trigger is available in the STP_TRIGGER_EVENT_DATA environment variable as a JSON string.

Be cautious when configuring event integrations. A high volume of events can trigger a large number of batch jobs, leading to unexpected costs. For example, 1000 HTTP requests to a connected API Gateway will result in 1000 job invocations.


HTTP Api event

Triggers the job in response to a request to a specified HTTP API Gateway. Routes are matched based on the most specific path. For more details, see the AWS Docs.

resources:
  myHttpApi:
    type: http-api-gateway

  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: http-api-gateway
          properties:
            httpApiGatewayName: myHttpApi
            path: /hello
            method: GET

Batch job connected to an HTTP API Gateway "myHttpApi"

HttpApiIntegration  API reference
Parent:BatchJob
type
Required
properties.httpApiGatewayName
Required
properties.method
Required
properties.path
Required
properties.authorizer
properties.payloadFormat
Default: '1.0'

Cognito authorizer

Restricts access to users authenticated with a User Pool. The request must include an access token. If authorized, the job receives user claims in its payload.

resources:
  myGateway:
    type: http-api-gateway

  myUserPool:
    type: user-auth-pool
    properties:
      userVerificationType: email-code

  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: http-api-gateway
          properties:
            httpApiGatewayName: myGateway
            path: /some-path
            method: '*'
            authorizer:
              type: cognito
              properties:
                userPoolName: myUserPool

Example cognito authorizer

import { CognitoIdentityProvider } from '@aws-sdk/client-cognito-identity-provider';

const cognito = new CognitoIdentityProvider({});

(async () => {
  const event = JSON.parse(process.env.STP_TRIGGER_EVENT_DATA);
  const userData = await cognito.getUser({ AccessToken: event.headers.authorization });
  // do something with your user data
})();

Example batch job code that fetches user data from Cognito

CognitoAuthorizer  API reference
type
Required
properties.userPoolName
Required
properties.identitySources

Lambda authorizer

Uses a dedicated Lambda function to decide if a request is authorized. The authorizer function returns a policy document or a simple boolean response. You can configure identitySources to specify which parts of the request are used for authorization. To learn more, see the AWS Docs.
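
For illustration, an authorizer function using the simple (boolean) response format could look like this (a sketch; the header name and token check are placeholders for your own logic):

// Lambda authorizer returning the simple (boolean) response format
export const handler = async (event: { headers?: Record<string, string> }) => {
  const token = event.headers?.authorization;

  // replace with real validation, e.g. verifying a JWT signature
  const isAuthorized = token === 'my-secret-token';

  return {
    isAuthorized,
    // optional context made available to the triggered job's payload
    context: { tokenChecked: true }
  };
};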

LambdaAuthorizer  API reference
type
Required
properties.functionName
Required
properties.iamResponse
properties.identitySources
properties.cacheResultSeconds

Schedule event

Triggers the job on a defined schedule using either a fixed rate (e.g., every 5 minutes) or a cron expression.

resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        # invoke the job every two hours
        - type: schedule
          properties:
            scheduleRate: rate(2 hours)
        # invoke the job at 10:00 UTC every day
        - type: schedule
          properties:
            scheduleRate: cron(0 10 * * ? *)
ScheduleIntegration  API reference
Parent:BatchJob
type
Required
properties.scheduleRate
Required
properties.input
properties.inputPath
properties.inputTransformer

Event Bus event

Triggers the job when a matching event is received by a specified event bus. You can use the default AWS event bus or a custom event bus.

resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: event-bus
          properties:
            useDefaultBus: true
            eventPattern:
              source:
                - 'aws.autoscaling'
              region:
                - 'us-west-2'

Batch job connected to the default event bus

resources:
  myEventBus:
    type: event-bus

  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: event-bus
          properties:
            eventBusName: myEventBus
            eventPattern:
              source:
                - 'mycustomsource'

Batch job connected to a custom event bus

EventBusIntegration  API reference
Parent:BatchJob
type
Required
properties.eventPattern
Required
properties.eventBusArn
properties.eventBusName
properties.useDefaultBus
properties.onDeliveryFailure
properties.input
properties.inputPath
properties.inputTransformer

SNS event

Triggers the job when a message is published to an SNS topic.

resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: sns
          properties:
            snsTopicName: mySnsTopic

  mySnsTopic:
    type: sns-topic
SnsIntegration  API reference
Parent:BatchJob
type
Required
properties.snsTopicName
properties.snsTopicArn
properties.filterPolicy
properties.onDeliveryFailure

SQS event

Triggers the job when messages are available in an SQS queue. Messages are processed in batches. If the job fails to start, messages return to the queue after the visibility timeout. If the job starts but then fails, the messages are considered processed.

A single queue should be consumed by a single compute resource. If you need a fan-out pattern, consider using an SNS or EventBus integration.

resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: sqs
          properties:
            sqsQueueName: mySqsQueue

  mySqsQueue:
    type: sqs-queue
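
Inside the job, the delivered messages are available via the STP_TRIGGER_EVENT_DATA environment variable. A sketch, assuming the payload follows the standard SQS event shape with a Records array:

(async () => {
  const event = JSON.parse(process.env.STP_TRIGGER_EVENT_DATA ?? '{}');

  // each record carries one SQS message; the message payload is in its body
  for (const record of event.Records ?? []) {
    const message = JSON.parse(record.body);
    console.log('Processing message', message);
  }
})();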
SqsIntegration  API reference
Parent:BatchJob
type
Required
properties.sqsQueueName
properties.sqsQueueArn
properties.batchSize
Default: 10
properties.maxBatchWindowSeconds

Kinesis event

Triggers the job when records are available in a Kinesis Data Stream. It's similar to SQS but designed for real-time data streaming. You can add a Kinesis stream using CloudFormation resources.

resources:
  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: kinesis-stream
          properties:
            autoCreateConsumer: true
            maxBatchWindowSeconds: 30
            batchSize: 200
            streamArn: $CfResourceParam('myKinesisStream', 'Arn')
            onFailure:
              arn: $CfResourceParam('myOnFailureSqsQueue', 'Arn')
              type: sqs

cloudformationResources:
  myKinesisStream:
    Type: AWS::Kinesis::Stream
    Properties:
      ShardCount: 1
  myOnFailureSqsQueue:
    Type: AWS::SQS::Queue
KinesisIntegration  API reference
Parent:BatchJob
type
Required
properties.streamArn
Required
properties.consumerArn
properties.autoCreateConsumer
properties.maxBatchWindowSeconds
properties.batchSize
Default: 10
properties.startingPosition
Default: TRIM_HORIZON
properties.maximumRetryAttempts
properties.onFailure
properties.parallelizationFactor
properties.bisectBatchOnFunctionError

DynamoDB event

Triggers the job in response to item-level changes in a DynamoDB table. You must enable DynamoDB Streams on your table.

resources:
  myDynamoDbTable:
    type: dynamo-db-table
    properties:
      primaryKey:
        partitionKey:
          name: id
          type: string
      streamType: NEW_AND_OLD_IMAGES

  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: dynamo-db-stream
          properties:
            streamArn: $ResourceParam('myDynamoDbTable', 'streamArn')
            batchSize: 200
DynamoDbIntegration  API reference
Parent:BatchJob
type
Required
properties.streamArn
Required
properties.maxBatchWindowSeconds
properties.batchSize
Default: 100
properties.startingPosition
Default: TRIM_HORIZON
properties.maximumRetryAttempts
properties.onFailure
properties.parallelizationFactor
properties.bisectBatchOnFunctionError

S3 event

Triggers the job when a specific event (like object created) occurs in an S3 bucket.

resources:
  myBucket:
    type: bucket

  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: s3
          properties:
            bucketArn: $ResourceParam('myBucket', 'arn')
            s3EventType: 's3:ObjectCreated:*'
            filterRule:
              prefix: order-
              suffix: .jpg
S3Integration  API reference
Parent:BatchJob
type
Required
properties.bucketArn
Required
properties.s3EventType
Required
properties.filterRule
S3FilterRule  API reference
prefix
suffix

Cloudwatch Log event

Triggers the job when a log record is added to a specified CloudWatch log group. The event payload is BASE64 encoded and GZIP compressed.
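
To work with the payload inside the job, decode and decompress it first. A sketch, assuming the payload follows the standard CloudWatch Logs subscription format with an awslogs.data field:

import { gunzipSync } from 'zlib';

(async () => {
  const raw = JSON.parse(process.env.STP_TRIGGER_EVENT_DATA ?? '{}');

  // the log data is BASE64 encoded and GZIP compressed
  const decoded = gunzipSync(Buffer.from(raw.awslogs.data, 'base64')).toString('utf-8');
  const logData = JSON.parse(decoded);

  // logEvents contains the individual log records
  console.log('Received log events', logData.logEvents);
})();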

resources:
  myLogProducingLambda:
    type: function
    properties:
      packaging:
        type: stacktape-lambda-buildpack
        properties:
          entryfilePath: lambdas/log-producer.ts

  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: cloudwatch-log
          properties:
            logGroupArn: $ResourceParam('myLogProducingLambda', 'logGroupArn')
CloudwatchLogIntegration  API reference
Parent:BatchJob
type
Required
properties.logGroupArn
Required
properties.filter

Application Load Balancer event

Triggers the job when an Application Load Balancer receives an HTTP request matching specified conditions (e.g., path, headers, method).

resources:
  # load balancer which routes traffic to the batch job
  myLoadBalancer:
    type: application-load-balancer

  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      events:
        - type: application-load-balancer
          properties:
            # referencing load balancer defined above
            loadBalancerName: myLoadBalancer
            priority: 1
            paths:
              - /invoke-my-job
              - /another-path
ApplicationLoadBalancerIntegration  API reference
Parent:BatchJob
type
Required
properties.loadBalancerName
Required
properties.priority
Required
properties.listenerPort
properties.paths
properties.methods
properties.hosts
properties.headers
properties.queryParams
properties.sourceIps

Accessing other resources

By default, AWS resources cannot communicate with each other. Access must be granted explicitly using IAM permissions. Stacktape handles most of this automatically, but for resource-to-resource communication, you need to configure permissions.

Relational Databases are an exception, as they use their own connection-string-based access control.

There are two ways to grant permissions:

Using connectTo

The connectTo property is a simplified way to grant basic access to other Stacktape-managed resources. It automatically configures the necessary IAM permissions and injects environment variables with connection details into your batch job.

resources:
  photosBucket:
    type: bucket

  myBatchJob:
    type: batch-job
    properties:
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      resources:
        cpu: 2
        memory: 1800
      connectTo:
        # access to the bucket
        - photosBucket
        # access to AWS SES
        - aws:ses

Configures access to other resources in your stack and AWS services. By specifying resources here, Stacktape automatically:

  • Configures IAM role permissions.
  • Sets up security group rules to allow network traffic.
  • Injects environment variables with connection details into the compute resource.

Environment variables are named STP_[RESOURCE_NAME]_[VARIABLE_NAME] (e.g., STP_MY_DATABASE_CONNECTION_STRING).
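
For example, if the job connects to a database named myDatabase, it can read the injected connection string like this (the exact set of injected variables depends on the connected resource type):

// connection string injected by connectTo for a resource named "myDatabase"
const connectionString = process.env.STP_MY_DATABASE_CONNECTION_STRING;

if (!connectionString) {
  throw new Error('Expected STP_MY_DATABASE_CONNECTION_STRING to be injected by connectTo');
}

console.log('Connecting to database at', connectionString);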

Using iamRoleStatements

For fine-grained control, you can provide raw IAM role statements. This allows you to define custom permissions to any AWS resource.

resources:
  myBatchJob:
    type: batch-job
    properties:
      resources:
        cpu: 2
        memory: 1800
      container:
        packaging:
          type: stacktape-image-buildpack
          properties:
            entryfilePath: path/to/my/batch-job.ts
      iamRoleStatements:
        - Resource:
            - $CfResourceParam('NotificationTopic', 'Arn')
          Effect: Allow
          Action:
            - 'sns:Publish'

cloudformationResources:
  NotificationTopic:
    Type: AWS::SNS::Topic

Default VPC connection

Certain resources, like Relational Databases, must be placed within a VPC. If your stack contains such resources, Stacktape automatically creates a default VPC and connects them to it.

Batch jobs are connected to this VPC by default, allowing them to communicate with other VPC-enabled resources without extra configuration. To learn more, see our guide on VPCs.

Referenceable parameters

The following parameters can be referenced using the $ResourceParam directive.

To learn more, refer to the referencing parameters documentation.

jobDefinitionArn
  • Arn of the job definition resource

  • Usage: $ResourceParam('<<resource-name>>', 'jobDefinitionArn')
stateMachineArn
  • Arn of the state machine controlling the execution flow of the batch job

  • Usage: $ResourceParam('<<resource-name>>', 'stateMachineArn')
logGroupArn
  • Arn of the log group aggregating logs from the batch job

  • Usage: $ResourceParam('<<resource-name>>', 'logGroupArn')

Pricing

You are charged for:

  • The compute instances running your batch jobs.
  • A negligible amount for the Lambda functions and Step Functions that manage the job's execution.

Pricing depends on the instance type and region. You can significantly reduce costs (by up to 90%) by using spot instances.

API reference

BatchJob  API reference
type
Required
properties.container
Required
properties.resources
Required
properties.timeout
properties.useSpotInstances
properties.logging
properties.retryConfig
properties.events
properties.connectTo
properties.iamRoleStatements
overrides
CognitoAuthorizer  API reference
type
Required
properties.userPoolName
Required
properties.identitySources
LambdaAuthorizer  API reference
type
Required
properties.functionName
Required
properties.iamResponse
properties.identitySources
properties.cacheResultSeconds
EventInputTransformer  API reference
Parent:EventBusIntegrationorScheduleIntegration
inputTemplate
Required
inputPathsMap
EventBusIntegrationPattern  API reference
version
detail-type
source
account
region
resources
detail
replay-name
SnsOnDeliveryFailure  API reference
sqsQueueArn
sqsQueueName
DestinationOnFailure  API reference
arn
Required
type
Required
S3FilterRule  API reference
prefix
suffix
LbHeaderCondition  API reference
headerName
Required
values
Required
LbQueryParamCondition  API reference
paramName
Required
values
Required
KinesisIntegration  API reference
Parent:BatchJob
type
Required
properties.streamArn
Required
properties.consumerArn
properties.autoCreateConsumer
properties.maxBatchWindowSeconds
properties.batchSize
Default: 10
properties.startingPosition
Default: TRIM_HORIZON
properties.maximumRetryAttempts
properties.onFailure
properties.parallelizationFactor
properties.bisectBatchOnFunctionError
EnvironmentVar  API reference
name
Required
value
Required
StpIamRoleStatement  API reference
Parent:BatchJob
Resource
Required
Sid
Effect
Default: Allow
Action
Condition
