Stacktape

Alarms

Overview

Alarms are an easy way to keep your infrastructure in check and to receive prompt notifications when your resources become overwhelmed or faulty.

An alarm always monitors a specified metric of a specified resource type. When the metric breaches its threshold, the configured action is taken.

Under the hood

Under the hood, alarms are implemented as AWS CloudWatch alarms.

Usage

Alarms can be created in two ways:

  • Global alarms - created in the Stacktape console. These alarms are applied to all resources of a certain type managed by Stacktape.
  • In-config alarms - specified directly on a resource in your Stacktape config file.

Global Alarms

Global alarms are created in the Stacktape console. Creating an alarm in the console creates only a template (blueprint) of the alarm. The actual alarms are created for the eligible resources when you next deploy (create or update) a stack whose serviceName and stage match the global alarm.


When configuring a global alarm, you can specify:

  • the type of resource and the metric you wish to monitor
  • thresholds for the metric
  • automatic Slack or email notifications
  • which stacks the alarm applies to (serviceName, stage)

Global alarms are applied only to resources in stacks that are created or updated after the global alarm was created in the console. In other words, after creating a global alarm, you need to run the deploy command to apply it to the resources in your stacks.

Creating Global Alarm

  1. Go to the alarms page in the Stacktape console and click the Create new alarm button.

Alarm page

  2. Configure the alarm according to your needs. In this example, we create an alarm that monitors the error rate of Lambda functions, and we limit it to the prod stage (used for production environments).

Creating alarm

  3. After creating the alarm, run the deploy command on your stack to create the actual alarms in the stack.

In-config Alarms

You can specify an alarm directly in the Stacktape config file, as a property of the resource that should be monitored.


An alarm consists of three parts:

  1. Trigger - The type of the trigger specifies which metric is monitored. Depending on the type, further trigger properties can be specified.
  2. Notifications (OPTIONAL) - Specifies where to send a notification when the alarm is triggered.
  3. Evaluation (OPTIONAL) - Configures the evaluation period (interval) for the monitored metric.


resources:
  myLambdaFunction:
    type: function
    properties:
      packaging:
        type: stacktape-lambda-buildpack
        properties:
          entryfilePath: 'lambdas/js-lambda.js'
      alarms:
        # fires when more than 5% of invocations end with an error
        - trigger:
            type: lambda-error-rate
            properties:
              thresholdPercent: 5
          # where to send a notification when the alarm is triggered
          notificationTargets:
            - type: slack
              properties:
                conversationId: C038XXXXXX
                accessToken: $Secret('slack-access-token')
Lambda function with error rate alarm

Trigger types

New trigger types are added continuously. If you need a specific trigger, do not hesitate to open a GitHub issue.

Lambda Error Rate

  • The error rate is calculated as a percentage (invocations that ended with an error / total invocation count) over the evaluation period (1 minute by default).

LambdaErrorRateTrigger API reference:

  • type (required)
  • properties.thresholdPercent (required)
  • properties.comparisonOperator

Lambda Duration

  • By default, the trigger fires when the average (avg) execution duration during the evaluation period (1 minute by default) is greater than the threshold.
  • You can optionally customize the trigger behaviour with the statistic and comparisonOperator properties.

LambdaDurationTrigger API reference:

  • type (required)
  • properties.thresholdMilliseconds (required)
  • properties.comparisonOperator
  • properties.statistic (default: avg)
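
A minimal sketch of a duration alarm, modelled on the error-rate example in the In-config Alarms section. The type value lambda-duration is an assumption inferred from the naming of lambda-error-rate, and the 3000 ms threshold is purely illustrative; check the LambdaDurationTrigger API reference for the exact accepted values.

```yaml
resources:
  myLambdaFunction:
    type: function
    properties:
      packaging:
        type: stacktape-lambda-buildpack
        properties:
          entryfilePath: 'lambdas/js-lambda.js'
      alarms:
        - trigger:
            # ASSUMPTION: type name inferred by analogy with lambda-error-rate
            type: lambda-duration
            properties:
              # illustrative threshold: average duration above 3000 ms
              thresholdMilliseconds: 3000
              # avg is the documented default, written out here for clarity
              statistic: avg
```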

Database Read Latency

  • By default, the trigger fires when the average (avg) read latency during the evaluation period (1 minute by default) is greater than the threshold.
  • You can optionally customize the trigger behaviour with the statistic and comparisonOperator properties.

RelationalDatabaseReadLatencyTrigger API reference:

  • type (required)
  • properties.thresholdSeconds (required)
  • properties.comparisonOperator
  • properties.statistic (default: avg)

Database Write Latency

  • By default, the trigger fires when the average (avg) write latency during the evaluation period (1 minute by default) is greater than the threshold.
  • You can optionally customize the trigger behaviour with the statistic and comparisonOperator properties.

RelationalDatabaseWriteLatencyTrigger API reference:

  • type (required)
  • properties.thresholdSeconds (required)
  • properties.comparisonOperator
  • properties.statistic (default: avg)

Database CPU Utilization

  • By default, the trigger fires when the average (avg) CPU utilization during the evaluation period (1 minute by default) is greater than the threshold.
  • You can optionally customize the trigger behaviour with the statistic and comparisonOperator properties.

RelationalDatabaseCPUUtilizationTrigger API reference:

  • type (required)
  • properties.thresholdPercent (required)
  • properties.comparisonOperator
  • properties.statistic (default: avg)
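
As a rough illustration, a CPU utilization alarm on a database might look like the fragment below. It is only the alarms part of a relational database resource, not a complete config. The type value database-cpu-utilization is an assumption based on the naming pattern of lambda-error-rate, and the 80% threshold is an arbitrary example.

```yaml
# Fragment: belongs under the properties of a relational database resource.
alarms:
  - trigger:
      # ASSUMPTION: type name inferred from the lambda-error-rate naming pattern
      type: database-cpu-utilization
      properties:
        # illustrative threshold: average CPU utilization above 80%
        thresholdPercent: 80
```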

Database Free Storage

  • By default, the trigger fires when the minimum (min) free storage space during the evaluation period (1 minute by default) is lower than the threshold.
  • You can optionally customize the trigger behaviour with the statistic and comparisonOperator properties.

RelationalDatabaseFreeStorageTrigger API reference:

  • type (required)
  • properties.thresholdMB (required)
  • properties.comparisonOperator
  • properties.statistic (default: avg)
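
A possible free storage alarm, again shown only as the alarms fragment of a relational database resource. The type value database-free-storage is an assumption based on the naming pattern of lambda-error-rate, and the 5120 MB threshold is illustrative. The statistic is set explicitly to min (the value named in the description above), so the alarm does not depend on which default applies.

```yaml
# Fragment: belongs under the properties of a relational database resource.
alarms:
  - trigger:
      # ASSUMPTION: type name inferred from the lambda-error-rate naming pattern
      type: database-free-storage
      properties:
        # illustrative threshold: free storage below 5120 MB (5 GB)
        thresholdMB: 5120
        # set explicitly rather than relying on the default
        statistic: min
```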

Database Free Memory

  • By default, the trigger fires when the average (avg) free memory during the evaluation period (1 minute by default) is lower than the threshold.
  • You can optionally customize the trigger behaviour with the statistic and comparisonOperator properties.

RelationalDatabaseFreeMemoryTrigger API reference:

  • type (required)
  • properties.thresholdMB (required)
  • properties.comparisonOperator
  • properties.statistic (default: avg)

Database Connection Count

  • By default, the trigger fires when the average (avg) number of connections during the evaluation period (1 minute by default) is greater than the threshold.
  • You can optionally customize the trigger behaviour with the statistic and comparisonOperator properties.

RelationalDatabaseConnectionCountTrigger API reference:

  • type (required)
  • properties.thresholdCount (required)
  • properties.comparisonOperator
  • properties.statistic (default: avg)

HTTP API Gateway Error Rate

  • The error rate is calculated as a percentage (4xx and 5xx response count / total response count) over the evaluation period (1 minute by default).

HttpApiGatewayErrorRateTrigger API reference:

  • type (required)
  • properties.thresholdPercent (required)
  • properties.comparisonOperator

HTTP API Gateway Latency

  • By default, the trigger fires when the average (avg) latency during the evaluation period (1 minute by default) is greater than the threshold.
  • Latency is the time between API Gateway receiving a request from a client and returning a response to that client.
  • You can optionally customize the trigger behaviour with the statistic and comparisonOperator properties.

HttpApiGatewayLatencyTrigger API reference:

  • type (required)
  • properties.thresholdMilliseconds (required)
  • properties.comparisonOperator
  • properties.statistic (default: avg)
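
A sketch of a latency alarm, shown as the alarms fragment of an HTTP API Gateway resource. The type value http-api-gateway-latency is an assumption based on the naming pattern of lambda-error-rate, and the 500 ms threshold is illustrative.

```yaml
# Fragment: belongs under the properties of an HTTP API Gateway resource.
alarms:
  - trigger:
      # ASSUMPTION: type name inferred from the lambda-error-rate naming pattern
      type: http-api-gateway-latency
      properties:
        # illustrative threshold: average latency above 500 ms
        thresholdMilliseconds: 500
```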

Application Load Balancer Error Rate

  • The error rate is calculated as a percentage (4xx and 5xx response count / total response count) over the evaluation period (1 minute by default).

ApplicationLoadBalancerErrorRateTrigger API reference:

  • type (required)
  • properties.thresholdPercent (required)
  • properties.comparisonOperator

Need help? Ask a question on Discord or email info@stacktape.com.