
Nightly Scheduler

Lambda functions for automatically stopping and starting infrastructure resources during off-hours to reduce costs.

Why This Exists

Development and staging environments don't need to run 24/7. Without automation:

  • Engineers forget to stop resources
  • Dev environments run all night and weekend
  • Monthly AWS bills are 3-4x higher than necessary

The nightly scheduler solves this by automatically stopping resources at 2 AM CT and starting them again at 5 PM CT. Resources are therefore down 15 of every 24 hours, which saves roughly 60-70% on compute costs.

How It Works

EventBridge Scheduler
|
+-- 7 AM UTC (Stop) --> stop_resources Lambda
|     |
|     +-- Discovery Mode?
|     |     |
|     |     +-- Yes: Query AWS APIs for resources
|     |     +-- No: Use explicit resource lists
|     |
|     +-- For each resource:
|           |
|           +-- Check NightlyTeardown tag
|           +-- Skip if tag = "skip"
|           +-- Otherwise: Stop resource
|
+-- 10 PM UTC (Start) --> start_resources Lambda
      |
      +-- Same discovery/skip logic
      +-- Start resources without skip tag
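
The per-resource skip check in the flow above can be sketched as a small pure function. This is a minimal illustration, not the handlers' actual code: `should_skip`, `plan_actions`, and the resource dictionaries are hypothetical names, and it assumes the tag match is exact and case-sensitive.

```python
def should_skip(tags, skip_key="NightlyTeardown", skip_value="skip"):
    """Return True when the resource carries the skip tag (exact match assumed)."""
    return tags.get(skip_key) == skip_value


def plan_actions(resources, skip_key="NightlyTeardown", skip_value="skip"):
    """Split resources into (ids to act on, ids skipped because of the tag)."""
    to_act, skipped = [], []
    for res in resources:
        if should_skip(res.get("tags", {}), skip_key, skip_value):
            skipped.append(res["id"])
        else:
            to_act.append(res["id"])
    return to_act, skipped


# Hypothetical inventory: one tagged service, two untagged resources.
resources = [
    {"id": "api", "tags": {"NightlyTeardown": "skip"}},
    {"id": "worker", "tags": {"Team": "platform"}},
    {"id": "dev-db", "tags": {}},
]
to_stop, skipped = plan_actions(resources)
print(to_stop)  # ['worker', 'dev-db']
print(skipped)  # ['api']
```

Keeping the decision separate from the AWS calls makes the same check reusable by both the stop and the start handler.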

Functions

| Function | Schedule | Action |
|---|---|---|
| stop_resources.py | 7 AM UTC (2 AM CT) | Sets ECS desired count to 0, stops RDS/EC2 |
| start_resources.py | 10 PM UTC (5 PM CT) | Restores ECS count from tag, starts RDS/EC2 |
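
To illustrate how an ECS desired count could be saved to a tag on stop and restored on start, here is a hedged sketch. The tag key `NightlyOriginalDesiredCount`, the function names, and the fallback `default_count` are assumptions made for this example; the real handlers may store the count differently. The `ecs` argument is expected to behave like `boto3.client("ecs")`.

```python
ORIGINAL_COUNT_TAG = "NightlyOriginalDesiredCount"  # hypothetical tag key


def stop_ecs_service(ecs, cluster, service):
    """Save the current desired count in a tag, then scale the service to 0."""
    svc = ecs.describe_services(cluster=cluster, services=[service])["services"][0]
    count = svc["desiredCount"]
    if count > 0:  # a repeat run must not overwrite the saved count with 0
        ecs.tag_resource(
            resourceArn=svc["serviceArn"],
            tags=[{"key": ORIGINAL_COUNT_TAG, "value": str(count)}],
        )
        ecs.update_service(cluster=cluster, service=service, desiredCount=0)
    return count


def start_ecs_service(ecs, cluster, service, default_count=1):
    """Restore the saved desired count; fall back to default_count if no tag."""
    svc = ecs.describe_services(cluster=cluster, services=[service])["services"][0]
    tags = ecs.list_tags_for_resource(resourceArn=svc["serviceArn"])["tags"]
    saved = {t["key"]: t["value"] for t in tags}.get(ORIGINAL_COUNT_TAG)
    count = int(saved) if saved is not None else default_count
    ecs.update_service(cluster=cluster, service=service, desiredCount=count)
    return count


# Usage with a real client (requires AWS credentials):
#   import boto3
#   ecs = boto3.client("ecs")
#   stop_ecs_service(ecs, "docustack-dev", "api")
```

Skipping the tag write when the count is already 0 keeps a second stop run from erasing the value the start handler needs.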

Discovery Mode vs Explicit Mode

| Mode | When to Use | Configuration |
|---|---|---|
| Discovery | Dynamic environments, many resources | Set ENABLE_DISCOVERY=true and ECS_CLUSTER_NAMES |
| Explicit | Static environments, specific resources | Set ECS_SERVICES, RDS_INSTANCES, EC2_INSTANCE_IDS |

Discovery mode is recommended because it automatically finds new resources without configuration changes.
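
As a rough sketch of the difference between the two modes for ECS, discovery could list every service in the configured clusters, while explicit mode uses the configured list as-is. The function names below are hypothetical; `ecs` is assumed to behave like `boto3.client("ecs")`, whose `list_services` paginator returns pages containing `serviceArns`.

```python
def discover_ecs_services(ecs, cluster_names):
    """Return {cluster_name, service_name} entries for every service found."""
    found = []
    paginator = ecs.get_paginator("list_services")
    for cluster in cluster_names:
        for page in paginator.paginate(cluster=cluster):
            for arn in page["serviceArns"]:
                # Service ARNs end in ".../<service-name>".
                found.append(
                    {"cluster_name": cluster, "service_name": arn.rsplit("/", 1)[-1]}
                )
    return found


def resolve_ecs_targets(ecs, enable_discovery, cluster_names, explicit_services):
    """Pick the target list according to the configured mode."""
    if enable_discovery:
        return discover_ecs_services(ecs, cluster_names)
    return list(explicit_services)
```

The output uses the same `{cluster_name, service_name}` shape as the explicit `ECS_SERVICES` list, so the rest of the handler would not need to know which mode produced it.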

Excluding Resources

Tag resources with NightlyTeardown=skip to exclude them from the scheduler:

AWS CLI Examples

# Tag an ECS Service
aws ecs tag-resource \
  --resource-arn arn:aws:ecs:us-east-1:123456789012:service/my-cluster/my-service \
  --tags key=NightlyTeardown,value=skip

# Tag an RDS Instance
aws rds add-tags-to-resource \
  --resource-name arn:aws:rds:us-east-1:123456789012:db:my-database \
  --tags Key=NightlyTeardown,Value=skip

# Tag an EC2 Instance
aws ec2 create-tags \
  --resources i-1234567890abcdef0 \
  --tags Key=NightlyTeardown,Value=skip
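
As the CLI examples show, ECS spells tag fields `key`/`value` while RDS and EC2 use `Key`/`Value`. Code that reads the skip tag across all three services has to handle both spellings. A minimal sketch of one way to do that follows; `tags_to_dict` is a hypothetical helper, not necessarily what the handlers use.

```python
def tags_to_dict(tag_list):
    """Convert an AWS tag list in either casing into a plain {key: value} dict."""
    result = {}
    for tag in tag_list or []:
        key = tag.get("Key", tag.get("key"))
        value = tag.get("Value", tag.get("value"))
        if key is not None:
            result[key] = value
    return result


ecs_tags = [{"key": "NightlyTeardown", "value": "skip"}]  # ECS style
rds_tags = [{"Key": "NightlyTeardown", "Value": "skip"}]  # RDS/EC2 style
print(tags_to_dict(ecs_tags) == tags_to_dict(rds_tags))  # True
```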

Terraform Example

resource "aws_ecs_service" "critical_service" {
  name = "critical-service"
  # ... other configuration ...

  tags = {
    NightlyTeardown = "skip"
  }
}

resource "aws_db_instance" "production" {
  identifier = "production-db"
  # ... other configuration ...

  tags = {
    NightlyTeardown = "skip"
  }
}

What to Tag

| Resource Type | Tag with skip? |
|---|---|
| Production resources | Always |
| Critical dev/staging resources | If needed 24/7 |
| Bastion hosts | If needed for emergency access |
| Shared infrastructure | If used across environments |

Environment Variables

| Variable | Description | Default |
|---|---|---|
| ENABLE_DISCOVERY | Enable auto-discovery of resources | false |
| MANAGE_ECS | Enable management of ECS services | true |
| MANAGE_RDS | Enable management of RDS instances | true |
| MANAGE_EC2 | Enable management of EC2 instances | false |
| ECS_CLUSTER_NAMES | JSON array of cluster names (discovery mode) | [] |
| ECS_SERVICES | JSON array of {cluster_name, service_name} (explicit mode) | [] |
| RDS_INSTANCES | JSON array of RDS instance identifiers (explicit mode) | [] |
| EC2_INSTANCE_IDS | JSON array of EC2 instance IDs (explicit mode) | [] |
| SKIP_TAG_KEY | Tag key to check for skipping | NightlyTeardown |
| SKIP_TAG_VALUE | Tag value that indicates skip | skip |
| SNS_TOPIC_ARN | Optional SNS topic for email notifications | "" |
| SLACK_WEBHOOK_SECRET_NAME | Secrets Manager secret for Slack webhook | "" |
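
Since the list variables are JSON arrays and the flags are strings, a handler has to parse them before use. The sketch below shows one plausible way to load them with the defaults from the table. `SchedulerConfig`, `load_config`, and the rule that only the string "true" enables a flag are assumptions for illustration.

```python
import json
import os
from dataclasses import dataclass


@dataclass
class SchedulerConfig:
    enable_discovery: bool
    manage_ecs: bool
    manage_rds: bool
    manage_ec2: bool
    ecs_cluster_names: list
    ecs_services: list
    rds_instances: list
    ec2_instance_ids: list
    skip_tag_key: str
    skip_tag_value: str


def _flag(env, name, default):
    """Assumption: only the string "true" (any case) enables a flag."""
    return env.get(name, default).strip().lower() == "true"


def _json_list(env, name):
    """Parse a JSON array variable; unset or empty is treated as []."""
    value = json.loads(env.get(name) or "[]")
    if not isinstance(value, list):
        raise ValueError(f"{name} must be a JSON array")
    return value


def load_config(env=None):
    """Build the config from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return SchedulerConfig(
        enable_discovery=_flag(env, "ENABLE_DISCOVERY", "false"),
        manage_ecs=_flag(env, "MANAGE_ECS", "true"),
        manage_rds=_flag(env, "MANAGE_RDS", "true"),
        manage_ec2=_flag(env, "MANAGE_EC2", "false"),
        ecs_cluster_names=_json_list(env, "ECS_CLUSTER_NAMES"),
        ecs_services=_json_list(env, "ECS_SERVICES"),
        rds_instances=_json_list(env, "RDS_INSTANCES"),
        ec2_instance_ids=_json_list(env, "EC2_INSTANCE_IDS"),
        skip_tag_key=env.get("SKIP_TAG_KEY", "NightlyTeardown"),
        skip_tag_value=env.get("SKIP_TAG_VALUE", "skip"),
    )


config = load_config({"ENABLE_DISCOVERY": "true", "ECS_CLUSTER_NAMES": '["my-cluster"]'})
print(config.ecs_cluster_names)  # ['my-cluster']
```

Accepting a plain mapping instead of reading `os.environ` directly makes the parser easy to exercise in a unit test.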

Notifications

Slack

inputs = {
  slack_webhook_secret_name = "docustack/slack-webhook"
}

Store your Slack webhook URL in the Secrets Manager secret named above. The Lambda functions then send rich notifications when a start or stop run completes.
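
The notification step might look roughly like the sketch below. It assumes the secret string holds the raw webhook URL and uses only the plain `text` field of a Slack incoming-webhook payload; the actual message format and the function names here are illustrative, not taken from the handlers.

```python
import json
import urllib.request


def build_slack_payload(action, succeeded, skipped, failed):
    """Build a simple summary message for a start or stop run."""
    status = "completed with errors" if failed else "completed"
    lines = [
        f"*Nightly scheduler {action} {status}*",
        f"Processed: {len(succeeded)}",
        f"Skipped: {len(skipped)}",
        f"Failed: {len(failed)}",
    ]
    return {"text": "\n".join(lines)}


def get_webhook_url(secrets_client, secret_name):
    """Assumption: the secret string is the webhook URL itself."""
    return secrets_client.get_secret_value(SecretId=secret_name)["SecretString"]


def post_to_slack(webhook_url, payload, timeout=5):
    """POST the JSON payload to the webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


payload = build_slack_payload("stop", ["api", "dev-db"], ["bastion"], [])
print(payload["text"].splitlines()[0])  # *Nightly scheduler stop completed*
```

Here `secrets_client` is expected to behave like `boto3.client("secretsmanager")`, which is why the Lambda role needs `secretsmanager:GetSecretValue`.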

SNS (Email)

inputs = {
  enable_sns_notifications = true
  sns_notification_emails  = ["ops@example.com"]
}

Deployment

Discovery Mode

module "nightly_scheduler" {
  source = "../../modules/nightly-scheduler"

  name = "docustack-dev"

  enable_discovery  = true
  ecs_cluster_names = ["docustack-dev", "docustack-staging"]
  manage_ecs        = true
  manage_rds        = true
  manage_ec2        = true
}

Explicit Mode

module "nightly_scheduler" {
  source = "../../modules/nightly-scheduler"

  name = "docustack-dev"

  ecs_services_to_stop = [
    {
      cluster_name = "docustack-dev"
      service_name = "api"
    }
  ]

  rds_instances_to_stop = ["docustack-dev-db"]
}

Development Workflow

Local Testing

cd docustack-mono/services/lambdas/nightly-scheduler

# Install dependencies
pip install boto3

# Set environment variables for discovery mode
export ENABLE_DISCOVERY=true
export ECS_CLUSTER_NAMES='["my-cluster"]'
export MANAGE_ECS=true
export MANAGE_RDS=true
export LOG_LEVEL=DEBUG

# Test locally (requires AWS credentials)
python -c "from stop_resources import lambda_handler; lambda_handler({}, None)"

Testing in AWS

# Invoke stop function
aws lambda invoke \
  --function-name docustack-dev-nightly-stop \
  --region us-east-1 \
  /tmp/response.json

cat /tmp/response.json

# Check logs
aws logs tail /aws/lambda/docustack-dev-nightly-stop --since 10m

IAM Permissions

Discovery mode requires broader permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:ListServices",
        "ecs:DescribeServices",
        "ecs:UpdateService",
        "ecs:ListTagsForResource",
        "ecs:TagResource"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "rds:DescribeDBInstances",
        "rds:StopDBInstance",
        "rds:StartDBInstance",
        "rds:ListTagsForResource"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:StopInstances",
        "ec2:StartInstances",
        "ec2:DescribeTags"
      ],
      "Resource": "*"
    }
  ]
}

Troubleshooting

Resources not stopping

  1. Check CloudWatch logs for errors
  2. Verify Lambda has correct IAM permissions
  3. Check if resource has NightlyTeardown=skip tag
  4. Verify ENABLE_DISCOVERY=true if using discovery mode

Resources not starting

  1. Check if original desired count was stored in tag
  2. Verify RDS instance is in stopped state (not stopping)
  3. Check for AWS service limits
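
The RDS state check in step 2 matters because an instance can only be started once it has reached the `stopped` state. A start handler could guard the call as in this sketch; `start_rds_if_stopped` is a hypothetical helper, and `rds` is assumed to behave like `boto3.client("rds")`.

```python
def start_rds_if_stopped(rds, instance_id):
    """Start the instance only when it is fully stopped.

    Returns (started, status) so the caller can log why a start was skipped.
    """
    response = rds.describe_db_instances(DBInstanceIdentifier=instance_id)
    status = response["DBInstances"][0]["DBInstanceStatus"]
    if status != "stopped":
        # For example "stopping" or "available": leave the instance alone.
        return False, status
    rds.start_db_instance(DBInstanceIdentifier=instance_id)
    return True, status
```

Logging the returned status makes it easier to tell from CloudWatch whether an instance was still stopping when the start run fired.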

Notifications not sending

  1. Verify Slack webhook URL in Secrets Manager
  2. Check Lambda has secretsmanager:GetSecretValue permission
  3. Review CloudWatch logs for notification errors

Code Location

docustack-mono/services/lambdas/nightly-scheduler/
├── stop_resources.py # Stop handler
├── start_resources.py # Start handler
└── README.md