Nightly Scheduler
Lambda functions for automatically stopping and starting infrastructure resources during off-hours to reduce costs.
Why This Exists
Development and staging environments don't need to run 24/7. Without automation:
- Engineers forget to stop resources
- Dev environments run all night and weekend
- Monthly AWS bills are 3-4x higher than necessary
The nightly scheduler solves this by automatically stopping resources at 2 AM CT (7 AM UTC) and starting them at 5 PM CT (10 PM UTC), saving 60-70% on compute costs. The CT times shown correspond to Central Daylight Time (UTC-5).
How It Works
EventBridge Scheduler
    |
    +-- 7 AM UTC (Stop) --> stop_resources Lambda
    |       |
    |       +-- Discovery Mode?
    |       |       |
    |       |       +-- Yes: Query AWS APIs for resources
    |       |       +-- No: Use explicit resource lists
    |       |
    |       +-- For each resource:
    |               |
    |               +-- Check NightlyTeardown tag
    |               +-- Skip if tag = "skip"
    |               +-- Otherwise: Stop resource
    |
    +-- 10 PM UTC (Start) --> start_resources Lambda
            |
            +-- Same discovery/skip logic
            +-- Start resources without skip tag
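The per-resource branch in the flow above can be sketched in a few lines of Python. This is a minimal illustration rather than the code in stop_resources.py: the helper names normalize_tags, should_skip, and partition are hypothetical, and the exact-match comparison against the tag value is an assumption.

```python
def normalize_tags(raw_tags):
    """Flatten an AWS tag list into a plain dict.

    ECS returns tags as {'key': ..., 'value': ...}, while RDS and EC2 use
    {'Key': ..., 'Value': ...}, so both spellings are accepted here.
    """
    tags = {}
    for tag in raw_tags or []:
        key = tag.get("key", tag.get("Key"))
        if key is not None:
            tags[key] = tag.get("value", tag.get("Value"))
    return tags


def should_skip(tags, skip_key="NightlyTeardown", skip_value="skip"):
    """Return True when the resource carries the skip tag (exact match assumed)."""
    return tags.get(skip_key) == skip_value


def partition(resources, skip_key="NightlyTeardown", skip_value="skip"):
    """Split (resource_id, raw_tags) pairs into (to_process, skipped) id lists."""
    to_process, skipped = [], []
    for resource_id, raw_tags in resources:
        if should_skip(normalize_tags(raw_tags), skip_key, skip_value):
            skipped.append(resource_id)
        else:
            to_process.append(resource_id)
    return to_process, skipped
```

Keeping the decision in a pure function like this makes it easy to unit test without AWS credentials; the same check would apply to both the stop and start handlers.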
Functions
| Function | Schedule | Action |
|---|---|---|
| stop_resources.py | 7 AM UTC (2 AM CT) | Sets ECS desired count to 0, stops RDS/EC2 |
| start_resources.py | 10 PM UTC (5 PM CT) | Restores ECS count from tag, starts RDS/EC2 |
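The ECS behaviour in the table (scale to 0, then restore the count from a tag) could look roughly like the sketch below. It assumes a boto3 ECS client is passed in as ecs. The tag key NightlyOriginalDesiredCount and both function names are hypothetical; the actual handlers may store the count under a different key or handle a missing tag differently.

```python
# Hypothetical tag key used to remember the pre-stop desired count.
DESIRED_COUNT_TAG = "NightlyOriginalDesiredCount"


def _describe(ecs, cluster, service):
    """Return the service description or raise if ECS does not know it."""
    services = ecs.describe_services(cluster=cluster, services=[service])["services"]
    if not services:
        raise LookupError(f"service {service!r} not found in cluster {cluster!r}")
    return services[0]


def stop_ecs_service(ecs, cluster, service):
    """Save the current desired count in a tag, then scale the service to 0."""
    svc = _describe(ecs, cluster, service)
    current = svc["desiredCount"]
    if current > 0:
        # Only record the count while the service is running, so a repeated
        # stop run does not overwrite the saved value with 0.
        ecs.tag_resource(
            resourceArn=svc["serviceArn"],
            tags=[{"key": DESIRED_COUNT_TAG, "value": str(current)}],
        )
        ecs.update_service(cluster=cluster, service=service, desiredCount=0)
    return current


def start_ecs_service(ecs, cluster, service):
    """Restore the saved desired count; return None if no count was stored."""
    svc = _describe(ecs, cluster, service)
    tags = ecs.list_tags_for_resource(resourceArn=svc["serviceArn"]).get("tags", [])
    stored = next((t["value"] for t in tags if t["key"] == DESIRED_COUNT_TAG), None)
    if stored is None:
        return None  # nothing to restore; leave the service untouched
    ecs.update_service(cluster=cluster, service=service, desiredCount=int(stored))
    return int(stored)
```

Storing the count on the service itself means the start function needs no external state, which is presumably why the IAM policy includes ecs:TagResource.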
Discovery Mode vs Explicit Mode
| Mode | When to Use | Configuration |
|---|---|---|
| Discovery | Dynamic environments, many resources | Set ENABLE_DISCOVERY=true and ECS_CLUSTER_NAMES |
| Explicit | Static environments, specific resources | Set ECS_SERVICES, RDS_INSTANCES, EC2_INSTANCE_IDS |
Discovery mode is recommended - it automatically finds new resources without configuration changes.
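As a rough illustration of how the two modes differ for ECS, the sketch below resolves the list of services to act on. It assumes a boto3 ECS client is passed in as ecs; the function name resolve_ecs_targets and its parameters are hypothetical and do not necessarily match the real handlers.

```python
def resolve_ecs_targets(ecs, enable_discovery, cluster_names, explicit_services):
    """Return (cluster, service) pairs for the scheduler to act on.

    ecs is expected to be a boto3 ECS client; it is only used in discovery mode.
    """
    if not enable_discovery:
        # Explicit mode: use the configured {cluster_name, service_name} list as-is.
        return [(s["cluster_name"], s["service_name"]) for s in explicit_services]

    targets = []
    paginator = ecs.get_paginator("list_services")
    for cluster in cluster_names:
        # list_services is paginated, so walk every page of each cluster.
        for page in paginator.paginate(cluster=cluster):
            targets.extend((cluster, arn) for arn in page["serviceArns"])
    return targets
```

In discovery mode the second element of each pair is a service ARN as returned by list_services, so a service added to a listed cluster is picked up on the next run without touching the configuration.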
Excluding Resources
Tag resources with NightlyTeardown=skip to exclude them from the scheduler:
AWS CLI Examples
# Tag an ECS Service
aws ecs tag-resource \
  --resource-arn arn:aws:ecs:us-east-1:123456789012:service/my-cluster/my-service \
  --tags key=NightlyTeardown,value=skip

# Tag an RDS Instance
aws rds add-tags-to-resource \
  --resource-name arn:aws:rds:us-east-1:123456789012:db:my-database \
  --tags Key=NightlyTeardown,Value=skip

# Tag an EC2 Instance
aws ec2 create-tags \
  --resources i-1234567890abcdef0 \
  --tags Key=NightlyTeardown,Value=skip
Terraform Example
resource "aws_ecs_service" "critical_service" {
  name = "critical-service"
  # ... other configuration ...
  tags = {
    NightlyTeardown = "skip"
  }
}

resource "aws_db_instance" "production" {
  identifier = "production-db"
  # ... other configuration ...
  tags = {
    NightlyTeardown = "skip"
  }
}
What to Tag
| Resource Type | Tag with skip? |
|---|---|
| Production resources | Always |
| Critical dev/staging resources | If needed 24/7 |
| Bastion hosts | If needed for emergency access |
| Shared infrastructure | If used across environments |
Environment Variables
| Variable | Description | Default |
|---|---|---|
| ENABLE_DISCOVERY | Enable auto-discovery of resources | false |
| MANAGE_ECS | Enable management of ECS services | true |
| MANAGE_RDS | Enable management of RDS instances | true |
| MANAGE_EC2 | Enable management of EC2 instances | false |
| ECS_CLUSTER_NAMES | JSON array of cluster names (discovery mode) | [] |
| ECS_SERVICES | JSON array of {cluster_name, service_name} (explicit mode) | [] |
| RDS_INSTANCES | JSON array of RDS instance identifiers (explicit mode) | [] |
| EC2_INSTANCE_IDS | JSON array of EC2 instance IDs (explicit mode) | [] |
| SKIP_TAG_KEY | Tag key to check for skipping | NightlyTeardown |
| SKIP_TAG_VALUE | Tag value that indicates skip | skip |
| SNS_TOPIC_ARN | Optional SNS topic for email notifications | "" |
| SLACK_WEBHOOK_SECRET_NAME | Secrets Manager secret for Slack webhook | "" |
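The variables in the table could be read into a single configuration object along these lines. This is a sketch under assumptions: the SchedulerConfig class and load_config function are hypothetical names, and treating only the string "true" (in any letter case) as enabled is a guess at how the handlers parse booleans. The defaults mirror the table.

```python
import json
import os
from dataclasses import dataclass


def _flag(env, name, default):
    """Read a boolean variable; only the string 'true' (any case) enables it."""
    return env.get(name, default).strip().lower() == "true"


def _json_list(env, name):
    """Read a JSON array variable, treating unset or empty as []."""
    value = json.loads(env.get(name) or "[]")
    if not isinstance(value, list):
        raise ValueError(f"{name} must be a JSON array")
    return value


@dataclass(frozen=True)
class SchedulerConfig:
    enable_discovery: bool
    manage_ecs: bool
    manage_rds: bool
    manage_ec2: bool
    ecs_cluster_names: list
    ecs_services: list
    rds_instances: list
    ec2_instance_ids: list
    skip_tag_key: str
    skip_tag_value: str
    sns_topic_arn: str
    slack_webhook_secret_name: str


def load_config(env=None):
    """Build a SchedulerConfig from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return SchedulerConfig(
        enable_discovery=_flag(env, "ENABLE_DISCOVERY", "false"),
        manage_ecs=_flag(env, "MANAGE_ECS", "true"),
        manage_rds=_flag(env, "MANAGE_RDS", "true"),
        manage_ec2=_flag(env, "MANAGE_EC2", "false"),
        ecs_cluster_names=_json_list(env, "ECS_CLUSTER_NAMES"),
        ecs_services=_json_list(env, "ECS_SERVICES"),
        rds_instances=_json_list(env, "RDS_INSTANCES"),
        ec2_instance_ids=_json_list(env, "EC2_INSTANCE_IDS"),
        skip_tag_key=env.get("SKIP_TAG_KEY", "NightlyTeardown"),
        skip_tag_value=env.get("SKIP_TAG_VALUE", "skip"),
        sns_topic_arn=env.get("SNS_TOPIC_ARN", ""),
        slack_webhook_secret_name=env.get("SLACK_WEBHOOK_SECRET_NAME", ""),
    )
```

Validating the JSON arrays up front means a malformed ECS_SERVICES value fails at cold start rather than halfway through a stop run.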
Notifications
Slack (Recommended)
inputs = {
  slack_webhook_secret_name = "docustack/slack-webhook"
}
Store your Slack webhook URL in Secrets Manager, and the Lambda will send rich notifications on start/stop completion.
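For orientation, the Slack path might be sketched as follows. Several details here are assumptions: that the secret holds the raw webhook URL as a plain string (not a JSON document), that a boto3 Secrets Manager client is passed in as secrets, and that a simple text payload is sent (the real Lambda is described as sending richer notifications). The function names are hypothetical.

```python
import json
import urllib.request


def build_payload(action, processed, skipped):
    """Build a plain-text Slack payload summarising one scheduler run."""
    text = (
        f"Nightly scheduler {action}: "
        f"{len(processed)} processed, {len(skipped)} skipped"
    )
    return {"text": text}


def send_slack_notification(secrets, secret_name, payload, opener=urllib.request.urlopen):
    """Look up the webhook URL in Secrets Manager and POST the payload to it.

    secrets is expected to be a boto3 Secrets Manager client. The opener
    argument exists so tests can replace the real HTTP call.
    """
    # Assumption: the secret value is the webhook URL itself.
    webhook_url = secrets.get_secret_value(SecretId=secret_name)["SecretString"]
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request, timeout=10) as response:
        return response.status
```

Fetching the URL at send time keeps it out of the Lambda's environment variables, which is the point of storing it in Secrets Manager.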
SNS (Email)
inputs = {
  enable_sns_notifications = true
  sns_notification_emails  = ["ops@example.com"]
}
Deployment
Discovery Mode (Recommended)
module "nightly_scheduler" {
  source = "../../modules/nightly-scheduler"
  name   = "docustack-dev"

  enable_discovery  = true
  ecs_cluster_names = ["docustack-dev", "docustack-staging"]

  manage_ecs = true
  manage_rds = true
  manage_ec2 = true
}
Explicit Mode
module "nightly_scheduler" {
  source = "../../modules/nightly-scheduler"
  name   = "docustack-dev"

  ecs_services_to_stop = [
    {
      cluster_name = "docustack-dev"
      service_name = "api"
    }
  ]

  rds_instances_to_stop = ["docustack-dev-db"]
}
Development Workflow
Local Testing
cd docustack-mono/services/lambdas/nightly-scheduler
# Install dependencies
pip install boto3
# Set environment variables for discovery mode
export ENABLE_DISCOVERY=true
export ECS_CLUSTER_NAMES='["my-cluster"]'
export MANAGE_ECS=true
export MANAGE_RDS=true
export LOG_LEVEL=DEBUG
# Test locally (requires AWS credentials)
python -c "from stop_resources import lambda_handler; lambda_handler({}, None)"
Testing in AWS
# Invoke stop function
aws lambda invoke \
  --function-name docustack-dev-nightly-stop \
  --region us-east-1 \
  /tmp/response.json
cat /tmp/response.json
# Check logs
aws logs tail /aws/lambda/docustack-dev-nightly-stop --since 10m
IAM Permissions
Discovery mode requires broader permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:ListServices",
        "ecs:DescribeServices",
        "ecs:UpdateService",
        "ecs:ListTagsForResource",
        "ecs:TagResource"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "rds:DescribeDBInstances",
        "rds:StopDBInstance",
        "rds:StartDBInstance",
        "rds:ListTagsForResource"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:StopInstances",
        "ec2:StartInstances",
        "ec2:DescribeTags"
      ],
      "Resource": "*"
    }
  ]
}
Troubleshooting
Resources not stopping
- Check CloudWatch logs for errors
- Verify Lambda has correct IAM permissions
- Check if resource has NightlyTeardown=skip tag
- Verify ENABLE_DISCOVERY=true if using discovery mode
Resources not starting
- Check if original desired count was stored in tag
- Verify RDS instance is in stopped state (not stopping)
- Check for AWS service limits
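The RDS state check matters because an instance can only be started once it has fully reached the stopped state. A guard of the following shape illustrates the idea; it assumes a boto3 RDS client is passed in as rds, and the function name start_rds_if_stopped is hypothetical.

```python
def start_rds_if_stopped(rds, identifier):
    """Start an RDS instance only when it is fully stopped.

    rds is expected to be a boto3 RDS client. Returns (started, status),
    where status is the state observed before any start request.
    """
    response = rds.describe_db_instances(DBInstanceIdentifier=identifier)
    status = response["DBInstances"][0]["DBInstanceStatus"]
    if status != "stopped":
        # For example "stopping" or "available": do not attempt a start.
        return False, status
    rds.start_db_instance(DBInstanceIdentifier=identifier)
    return True, status
```

Logging the returned status when started is False makes it clear from CloudWatch whether an instance was skipped because it was still stopping.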
Notifications not sending
- Verify Slack webhook URL in Secrets Manager
- Check Lambda has secretsmanager:GetSecretValue permission
- Review CloudWatch logs for notification errors
Code Location
docustack-mono/services/lambdas/nightly-scheduler/
├── stop_resources.py # Stop handler
├── start_resources.py # Start handler
└── README.md