Cost Management Modules

These modules automate cost optimization by stopping non-critical resources during off-hours and providing unified infrastructure lifecycle management.

Overview

DocuStack uses a two-tier approach to cost management:

Tier	Method	Duration	Savings	Use Case
Tier 1	Lambda stop/start	~30 seconds	60-70%	Daily off-hours
Tier 2	Terrateam teardown	~10-15 minutes	100%	Weekends, extended periods

Nightly Scheduler

EventBridge-scheduled Lambda functions for automated stop/start of non-critical AWS resources.

Why This Module?

Running dev/staging environments 24/7 wastes money:

ECS tasks running overnight with no users
RDS instances idle during off-hours
EC2 instances sitting unused

This module automatically stops resources at night and restarts them in the morning.

Architecture

┌─────────────────────────────────────────────────────────────┐
│            Nightly Scheduler Architecture                    │
├─────────────────────────────────────────────────────────────┤
│   ┌──────────────────────┐                                  │
│   │ EventBridge Schedule │                                  │
│   │ (7 AM UTC daily)     │                                  │
│   └──────────┬───────────┘                                  │
│              │ trigger                                       │
│              v                                               │
│   ┌──────────────────────┐         ┌──────────────────┐    │
│   │  Stop Lambda         │────────►│  ECS Services    │    │
│   │                      │         │  (desired=0)     │    │
│   │  - Find resources    │         └──────────────────┘    │
│   │  - Save state        │                                  │
│   │  - Stop resources    │         ┌──────────────────┐    │
│   │  - Send notifications│────────►│  RDS Instances   │    │
│   └──────────────────────┘         │  (stopped)       │    │
│                                    └──────────────────┘    │
│   ┌──────────────────────┐                                  │
│   │ EventBridge Schedule │                                  │
│   │ (10 PM UTC daily)    │                                  │
│   └──────────┬───────────┘                                  │
│              │ trigger                                       │
│              v                                               │
│   ┌──────────────────────┐                                  │
│   │  Start Lambda        │                                  │
│   │                      │                                  │
│   │  - Find resources    │                                  │
│   │  - Restore state     │                                  │
│   │  - Start resources   │                                  │
│   └──────────────────────┘                                  │
└─────────────────────────────────────────────────────────────┘

Usage

module "nightly_scheduler" {
  source = "git::git@github.com:docustackapp/docustack-infrastructure-modules.git//modules/nightly-scheduler?ref=v1.0.0"

  name = "docustack-dev"

  # ECS services to stop/start
  ecs_services_to_stop = [
    {
      cluster_name = "docustack-dev"
      service_name = "temporal-dev"
    },
    {
      cluster_name = "docustack-dev"
      service_name = "slack-bot-dev"
    }
  ]

  # RDS instances to stop/start
  rds_instances_to_stop = [
    "docustack-dev-postgres"
  ]

  # Schedule (UTC times)
  schedule_stop  = "cron(0 7 ? * * *)"   # 7 AM UTC = 2 AM CDT (1 AM CST)
  schedule_start = "cron(0 22 ? * * *)"  # 10 PM UTC = 5 PM CDT (4 PM CST)

  lambda_source_dir = "${path.root}/../../../docustack-mono/services/lambdas/nightly-scheduler"

  # Optional: Manual trigger URLs
  enable_manual_trigger = true
}
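Because the schedule expressions are in UTC, the local stop and start times shift by an hour when daylight saving time changes. A minimal sketch for checking the conversion, assuming US Central time with fixed offsets (the helper name `utc_hour_to_local_hour` is hypothetical and not part of the module):

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for US Central time; which one applies depends on the date.
CDT = timezone(timedelta(hours=-5), "CDT")  # daylight time
CST = timezone(timedelta(hours=-6), "CST")  # standard time


def utc_hour_to_local_hour(hour_utc: int, tz: timezone) -> int:
    """Return the local hour (0-23) that corresponds to a given UTC hour."""
    moment = datetime(2024, 1, 1, hour_utc, 0, tzinfo=timezone.utc)
    return moment.astimezone(tz).hour


# Hours taken from schedule_stop and schedule_start in the usage example.
for label, hour in (("schedule_stop", 7), ("schedule_start", 22)):
    print(label, "CDT:", utc_hour_to_local_hour(hour, CDT),
          "CST:", utc_hour_to_local_hour(hour, CST))
```

If the local times matter more than the UTC times, the cron hours need adjusting twice a year under this model.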

Tag-Based Discovery

Instead of explicit lists, discover resources by tags:

module "nightly_scheduler" {
  # ... basic config ...

  enable_discovery_mode = true
  ecs_cluster_names     = ["docustack-dev"]

  manage_ecs = true
  manage_rds = true
  manage_ec2 = true

  # Skip resources with this tag
  skip_tag_key   = "NightlyTeardown"
  skip_tag_value = "skip"
}

Tag resources to exclude:

resource "aws_ecs_service" "critical" {
  tags = {
    NightlyTeardown = "skip"  # Won't be stopped
  }
}
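The skip check itself could look roughly like the sketch below. This is an illustration rather than the module's actual Lambda code: the function names are hypothetical, and exact, case-sensitive matching of the tag value is an assumption. It accounts for the fact that ECS returns tags with lowercase `key`/`value` fields while RDS and EC2 use `Key`/`Value`.

```python
def normalize_tags(tags):
    """Flatten an AWS tag list into a plain dict.

    ECS uses lowercase 'key'/'value'; RDS and EC2 use 'Key'/'Value'.
    """
    flat = {}
    for tag in tags:
        key = tag.get("key", tag.get("Key"))
        value = tag.get("value", tag.get("Value"))
        if key is not None:
            flat[key] = value
    return flat


def should_skip(tags, skip_tag_key="NightlyTeardown", skip_tag_value="skip"):
    """Return True when the resource carries the configured skip tag."""
    # Assumption: the value must match exactly (case-sensitive).
    return normalize_tags(tags).get(skip_tag_key) == skip_tag_value


# ECS-style and EC2/RDS-style tag lists
print(should_skip([{"key": "NightlyTeardown", "value": "skip"}]))
print(should_skip([{"Key": "Name", "Value": "docustack-dev-postgres"}]))
```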

State Preservation

The stop Lambda saves ECS desired counts before stopping:

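import boto3

ecs = boto3.client("ecs")
cluster, service = "docustack-dev", "temporal-dev"

# Before stopping: read the current desired count
svc = ecs.describe_services(cluster=cluster, services=[service])["services"][0]
desired_count = svc["desiredCount"]

# Tag the service with its original count
ecs.tag_resource(
    resourceArn=svc["serviceArn"],
    tags=[{"key": "OriginalDesiredCount", "value": str(desired_count)}],
)

# Stop the service by scaling it to zero tasks
ecs.update_service(cluster=cluster, service=service, desiredCount=0)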

The start Lambda restores the original count.
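A minimal sketch of the restore step, assuming the count was saved in the `OriginalDesiredCount` tag as described above. The function name and the fallback of one task when the tag is missing are assumptions; the actual start Lambda may behave differently. The ECS client is passed in so the logic can be exercised without AWS access.

```python
def restore_desired_count(ecs, cluster, service, default=1):
    """Scale an ECS service back to the count saved in its tag.

    `ecs` is a boto3 ECS client (or any object with the same methods).
    """
    described = ecs.describe_services(
        cluster=cluster, services=[service], include=["TAGS"]
    )
    tags = described["services"][0].get("tags", [])

    # ECS tags use lowercase 'key' and 'value' fields.
    saved = next(
        (t["value"] for t in tags if t["key"] == "OriginalDesiredCount"), None
    )
    # Assumption: fall back to a single task if no count was saved.
    count = int(saved) if saved is not None else default

    ecs.update_service(cluster=cluster, service=service, desiredCount=count)
    return count
```

With real credentials this could be called as `restore_desired_count(boto3.client("ecs"), "docustack-dev", "temporal-dev")`.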

RDS Auto-Restart Handling

AWS automatically restarts stopped RDS instances after 7 days. The module handles this:

rds_auto_restart_check_enabled = true  # Default

A daily check re-stops any RDS instances that auto-started.
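The re-stop check could be sketched as follows. This is a hypothetical illustration, not the module's code: the function name is made up, and it assumes the check only runs while the instances are supposed to be stopped, so any instance found in the `available` state is treated as auto-started.

```python
def restop_auto_started(rds, instance_ids):
    """Stop any listed RDS instance that is running again.

    `rds` is a boto3 RDS client (or any object with the same methods).
    Returns the identifiers that were re-stopped.
    """
    restopped = []
    for instance_id in instance_ids:
        response = rds.describe_db_instances(DBInstanceIdentifier=instance_id)
        status = response["DBInstances"][0]["DBInstanceStatus"]
        # Assumption: 'available' during the stopped window means AWS
        # restarted the instance after its 7-day limit.
        if status == "available":
            rds.stop_db_instance(DBInstanceIdentifier=instance_id)
            restopped.append(instance_id)
    return restopped
```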

Manual Triggering

# Via Lambda Function URLs
STOP_URL=$(terragrunt output -raw manual_stop_url)
curl -X POST "$STOP_URL"

# Via AWS CLI
aws lambda invoke --function-name docustack-dev-stop-resources /tmp/response.json

# Via Slack
/infra stop dev

Cost Savings

Example: Dev Environment

Resource	24/7 Cost	8 hrs/day Cost	Savings
ECS Fargate (1 task)	~$25/month	~$8/month	68%
RDS db.t3.medium	~$60/month	~$20/month	67%
Total	~$85/month	~$28/month	67%

Annual Savings: ~$684/year per environment
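The arithmetic behind these figures can be reproduced with a short sketch. It assumes cost scales linearly with running hours, which is a simplification (for example, RDS storage is still billed while an instance is stopped); the variable and function names are illustrative only.

```python
HOURS_RUNNING_PER_DAY = 8

# Monthly 24/7 costs in USD, taken from the table above.
full_cost = {"ECS Fargate (1 task)": 25, "RDS db.t3.medium": 60}


def scheduled_cost(monthly_cost, hours_running=HOURS_RUNNING_PER_DAY):
    """Monthly cost when the resource runs only part of each day."""
    return monthly_cost * hours_running / 24


total_full = sum(full_cost.values())
total_scheduled = round(sum(scheduled_cost(c) for c in full_cost.values()))
monthly_savings = total_full - total_scheduled
annual_savings = monthly_savings * 12

print(f"24/7: ${total_full}/month, scheduled: ${total_scheduled}/month")
print(f"Savings: ${monthly_savings}/month, ${annual_savings}/year")
```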

Lambda Code Location

Source: docustack-mono/services/lambdas/nightly-scheduler/

Infra Orchestrator

Lambda function that orchestrates infrastructure teardown and spinup operations, routing between Tier 1 (Lambda) and Tier 2 (Terrateam).

Why This Module?

Different situations call for different approaches:

Daily off-hours: Quick stop/start (Tier 1)
Weekends: Complete teardown (Tier 2)
Cost emergencies: Immediate teardown (Tier 2)

This module provides a unified interface that routes to the appropriate tier.

Architecture

┌─────────────────────────────────────────────────────────────┐
│           Infrastructure Orchestrator Architecture           │
├─────────────────────────────────────────────────────────────┤
│   Slack Bot / API                                            │
│      │                                                       │
│      │ /infra stop dev      → Tier 1 (Lambda)               │
│      │ /infra start dev     → Tier 1 (Lambda)               │
│      │ /infra teardown dev  → Tier 2 (Terrateam)            │
│      │ /infra spinup dev    → Tier 2 (Terrateam)            │
│      v                                                       │
│   ┌──────────────────────┐                                  │
│   │ Infra Orchestrator   │                                  │
│   │ Lambda               │                                  │
│   │                      │                                  │
│   │ Routes to:           │                                  │
│   │  - Stop Lambda       │──────► ECS/RDS stop              │
│   │  - Start Lambda      │──────► ECS/RDS start             │
│   │  - Terrateam API     │──────► Full teardown/spinup      │
│   └──────────────────────┘                                  │
└─────────────────────────────────────────────────────────────┘
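The routing shown in the diagram can be sketched as a lookup from action to tier. This is a hypothetical illustration: the names `parse_command` and `route` and the returned backend labels are assumptions, and the `status` command is not covered here.

```python
# Action-to-tier mapping taken from the diagram above.
TIER_BY_ACTION = {"stop": 1, "start": 1, "teardown": 2, "spinup": 2}
BACKEND_BY_TIER = {1: "lambda", 2: "terrateam"}


def parse_command(text):
    """Split '/infra <action> <environment>' into (action, environment)."""
    parts = text.split()
    if len(parts) != 3 or parts[0] != "/infra":
        raise ValueError(f"expected '/infra <action> <environment>', got {text!r}")
    return parts[1], parts[2]


def route(action):
    """Return the backend that handles the given action."""
    if action not in TIER_BY_ACTION:
        raise ValueError(f"unsupported action: {action!r}")
    return BACKEND_BY_TIER[TIER_BY_ACTION[action]]


action, environment = parse_command("/infra teardown dev")
print(action, environment, "->", route(action))
```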

Usage

module "infra_orchestrator" {
  source = "git::git@github.com:docustackapp/docustack-infrastructure-modules.git//modules/infra-orchestrator?ref=v1.0.0"

  name        = "docustack-dev"
  environment = "dev"

  # Tier 1: Lambda stop/start
  stop_lambda_arn  = module.nightly_scheduler.lambda_stop_arn
  start_lambda_arn = module.nightly_scheduler.lambda_start_arn

  # Tier 2: Terrateam Cloud
  terrateam_secret_name     = "terrateam/api-token"
  slack_webhook_secret_name = "slack/infra-alerts-webhook"
  github_repo               = "docustackapp/docustack-infrastructure-live"

  lambda_source_dir = "${path.root}/../../../docustack-mono/services/lambdas/infra-orchestrator"
}

Operations

Tier 1: Stop/Start (Fast)

Duration: ~30 seconds
What happens: ECS services set to 0 tasks, RDS instances stopped

# Via Slack
/infra stop dev
/infra start dev

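# Via AWS CLI (v2 needs --cli-binary-format to accept a raw JSON payload)
aws lambda invoke \
  --function-name docustack-dev-infra-orchestrator \
  --cli-binary-format raw-in-base64-out \
  --payload '{"action":"stop","environment":"dev"}' \
  /tmp/response.json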

Tier 2: Teardown/Spinup (Complete)

Duration: ~10-15 minutes
What happens: All Terraform resources destroyed/recreated via Terrateam

# Via Slack
/infra teardown dev
/infra spinup dev

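# Via AWS CLI (v2 needs --cli-binary-format to accept a raw JSON payload)
aws lambda invoke \
  --function-name docustack-dev-infra-orchestrator \
  --cli-binary-format raw-in-base64-out \
  --payload '{"action":"teardown","environment":"dev"}' \
  /tmp/response.json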
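The same invocation can be made from Python. The sketch below uses the payload shape from the CLI examples; the function name is hypothetical, and it assumes the orchestrator returns a JSON body, which the source does not confirm. The Lambda client is passed in so the function can be tested without AWS access.

```python
import json


def invoke_orchestrator(
    lambda_client,
    action,
    environment,
    function_name="docustack-dev-infra-orchestrator",
):
    """Invoke the orchestrator Lambda and return its decoded response.

    `lambda_client` is a boto3 Lambda client (or a compatible object).
    """
    payload = json.dumps({"action": action, "environment": environment})
    response = lambda_client.invoke(
        FunctionName=function_name,
        Payload=payload.encode("utf-8"),
    )
    # Assumption: the function responds with a JSON document.
    return json.loads(response["Payload"].read())
```

With real credentials this could be called as `invoke_orchestrator(boto3.client("lambda"), "teardown", "dev")`.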

Status Check

/infra status dev

Returns the current state of ECS services, RDS instances, and EC2 instances.

Secrets Configuration

# Terrateam API token
aws secretsmanager create-secret \
  --name terrateam/api-token \
  --secret-string "ttc_your_api_token_here"

# Slack webhook
aws secretsmanager create-secret \
  --name slack/infra-alerts-webhook \
  --secret-string "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
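At runtime the orchestrator has to read these secrets back. A minimal sketch of that lookup, assuming both secrets are stored as plain strings as in the commands above (the function name and the returned dictionary keys are hypothetical):

```python
def load_orchestrator_secrets(
    secrets_client,
    terrateam_secret_name="terrateam/api-token",
    slack_webhook_secret_name="slack/infra-alerts-webhook",
):
    """Fetch the Terrateam token and Slack webhook URL.

    `secrets_client` is a boto3 Secrets Manager client (or compatible).
    """

    def read(name):
        # get_secret_value returns string secrets under 'SecretString'.
        return secrets_client.get_secret_value(SecretId=name)["SecretString"]

    return {
        "terrateam_token": read(terrateam_secret_name),
        "slack_webhook_url": read(slack_webhook_secret_name),
    }
```

The default names match the `terrateam_secret_name` and `slack_webhook_secret_name` values in the usage example.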

Lambda Code Location

Source: docustack-mono/services/lambdas/infra-orchestrator/

Cost Comparison

Tier 1 vs Tier 2

Aspect	Tier 1 (Stop/Start)	Tier 2 (Teardown)
Duration	~30 seconds	~10-15 minutes
Savings	60-70%	100%
Data preserved	Yes	No (recreated)
Load balancers	Still running	Destroyed
Best for	Daily off-hours	Weekends, extended

Monthly Cost Breakdown

Dev Environment (Tier 1 - 16 hrs stopped daily):

Resource	Full Cost	With Scheduler	Savings
ECS Fargate	$25	$8	$17
RDS	$60	$20	$40
ALB	$16	$16	$0
NLB	$16	$16	$0
Total	$117	$60	$57 (49%)

Dev Environment (Tier 2 - Weekend teardown):

Resource	Full Cost	With Teardown	Savings
ECS Fargate	$25	$18	$7
RDS	$60	$43	$17
ALB	$16	$11	$5
NLB	$16	$11	$5
Total	$117	$83	$34 (29%)
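The weekend teardown figures follow from paying for five days out of seven. A short sketch of that calculation, assuming cost scales linearly with the days the environment exists (names are illustrative only):

```python
DAYS_RUNNING_PER_WEEK = 5

# Full monthly costs in USD, taken from the table above.
full_cost = {"ECS Fargate": 25, "RDS": 60, "ALB": 16, "NLB": 16}


def weekend_teardown_cost(monthly_cost, days_running=DAYS_RUNNING_PER_WEEK):
    """Monthly cost when the resource exists only on weekdays."""
    return monthly_cost * days_running / 7


with_teardown = {
    name: round(weekend_teardown_cost(cost)) for name, cost in full_cost.items()
}
total_full = sum(full_cost.values())
total_with_teardown = sum(with_teardown.values())
savings = total_full - total_with_teardown
savings_pct = round(100 * savings / total_full)

print(with_teardown)
print(f"Total: ${total_full} -> ${total_with_teardown}, "
      f"savings ${savings} ({savings_pct}%)")
```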

Combined (Tier 1 + Tier 2):

Weekdays: Tier 1 (16 hrs stopped)
Weekends: Tier 2 (full teardown)
Total savings: ~63% based on the table figures above ($117/month down to about $43/month)

Best Practices

Use Tier 1 for daily operations - Faster, preserves data
Use Tier 2 for weekends - Maximum savings
Tag critical resources - Exclude from nightly scheduler
Monitor costs - Set up AWS Budgets alerts
Review skip tags - Ensure nothing critical is being stopped
Test spinup - Verify Tier 2 spinup works before relying on it

Troubleshooting

Resources Not Stopping

# Check schedule status
aws scheduler get-schedule --name stop-resources --group-name docustack-dev

# Check Lambda logs
aws logs tail /aws/lambda/docustack-dev-stop-resources --since 1h

# Check for skip tags
aws ecs list-tags-for-resource --resource-arn $SERVICE_ARN

RDS Auto-Restarted

# Check RDS status
aws rds describe-db-instances \
  --db-instance-identifier docustack-dev-postgres \
  --query 'DBInstances[0].DBInstanceStatus'

# Check auto-restart check logs
aws logs tail /aws/lambda/docustack-dev-stop-resources --filter-pattern "auto-restart"

Terrateam Workflow Not Triggered

# Check orchestrator logs
aws logs tail /aws/lambda/docustack-dev-infra-orchestrator --since 10m

# Verify the Terrateam API token (note: this prints the secret value to the terminal)
aws secretsmanager get-secret-value --secret-id terrateam/api-token
