Tax Practice AI - Cloud Deployment Plan
Version: 1.0
Created: 2024-12-27
Status: Planning
Table of Contents
- Executive Summary
- Local Development Guarantee
- Current State Assessment
- Target Architecture
- Backlog Items for Deployment
- Infrastructure as Code Strategy
- Phase 1: Foundation Infrastructure
- Phase 2: Database and Storage
- Phase 3: Compute and Networking
- Phase 4: Frontend Deployment
- Phase 5: Orchestration and Background Jobs
- Phase 6: External Integrations
- Testing Requirements
- Best Practices Checklist
- Security Considerations
- Cloud Service Security Configurations
- Monitoring and Observability
- Disaster Recovery
- Cost Estimates
- Rollback Strategy
1. Executive Summary
This document outlines the plan to deploy Tax Practice AI to AWS cloud infrastructure. The deployment will use Infrastructure as Code (IaC) via Terraform to ensure reproducibility, version control, and automated provisioning.
Key Decisions
| Decision | Choice | Rationale |
| --- | --- | --- |
| IaC Tool | Terraform | Multi-cloud capable, mature ecosystem, state management |
| Container Orchestration | ECS Fargate | Serverless containers, no EC2 management, cost-effective for our scale |
| Database | Aurora PostgreSQL Serverless v2 | Auto-scaling, PostgreSQL-compatible, cost-effective for variable loads |
| Frontend Hosting | CloudFront + S3 | Global CDN, low latency, cost-effective for static assets |
| Orchestration | Self-hosted Airflow on EC2 | Full control, low cost (~$23/mo), Python DAGs |
| Secrets | AWS Secrets Manager | Native integration, automatic rotation |
| Monitoring | CloudWatch + Sentry | AWS-native metrics, application error tracking |
Deployment Environments
| Environment | Purpose | Database | Domain |
| --- | --- | --- | --- |
| Development | Feature development | Local PostgreSQL (Docker) | localhost |
| Staging | Pre-production testing | Aurora Serverless v2 (separate cluster) | staging.taxpractice.ai |
| Production | Live system | Aurora Serverless v2 (dedicated cluster) | app.taxpractice.ai |
2. Local Development Guarantee
Core Principle
Local development MUST remain fully functional. Cloud deployment is additive - it does NOT replace or remove local development capabilities.
What Stays in Place
| Component | Local Tool | Cloud Equivalent | Guarantee |
| --- | --- | --- | --- |
| Database | PostgreSQL 15 (Docker) | Aurora PostgreSQL | docker-compose.yml preserved and maintained |
| Object Storage | LocalStack S3 | AWS S3 | LocalStack container continues to work |
| Secrets | .env file | Secrets Manager | .env.example always current |
| AI/LLM | Anthropic API (direct) | AWS Bedrock | ANTHROPIC_API_KEY continues to work |
| Frontend | Vite dev server | CloudFront + S3 | npm run dev works offline |
| API | uvicorn (local) | ECS Fargate | python -m uvicorn works |
Environment Switching
The existing config.yaml already supports environment-based switching:
# config.yaml - Works for BOTH local and cloud
database:
# host: localhost for local, Aurora endpoint for cloud
host: ${DB_HOST:-localhost}
# port: 5433 for local Docker, 5432 for Aurora
port: ${DB_PORT:-5433}
aws:
s3:
# endpoint_url: LocalStack for local, empty for real AWS
endpoint_url: ${S3_ENDPOINT_URL:-}
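The `${VAR:-default}` placeholders above are not native YAML; a loader has to expand them before parsing. A minimal sketch of that expansion, assuming a regex-based substitution pass (the function and pattern names here are illustrative, not the project's actual loader):

```python
import os
import re

# Matches ${VAR} or ${VAR:-default} (pattern name is illustrative)
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def expand_env(text: str) -> str:
    """Replace ${VAR:-default} placeholders with environment values."""
    def _sub(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        return os.environ.get(name, default if default is not None else "")
    return _PLACEHOLDER.sub(_sub, text)

# Local default applies when the env var is unset
print(expand_env("host: ${DB_HOST:-localhost}"))  # host: localhost

# Cloud value wins once the env var is set (endpoint is a made-up example)
os.environ["DB_HOST"] = "aurora.example.internal"
print(expand_env("host: ${DB_HOST:-localhost}"))  # host: aurora.example.internal
```

The same pass handles the empty-default case (`${S3_ENDPOINT_URL:-}` expands to an empty string), which is what lets the cloud configuration fall through to real AWS endpoints.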
Local Development Commands (Unchanged)
# Start local services (PostgreSQL + LocalStack)
docker compose up -d
# Run API locally
python -m uvicorn src.api.main:app --reload --port 8000
# Run frontend locally
cd frontend && pnpm dev
# Run tests locally
pytest tests/
# Run E2E tests with LocalStack
S3_ENDPOINT_URL=http://localhost:4566 pytest tests/e2e/
Verification Checklist
Before any cloud deployment PR is merged, verify:
- docker compose up -d still starts PostgreSQL and LocalStack cleanly
- pytest tests/ passes against local services
- python -m uvicorn src.api.main:app --reload starts with local config
- cd frontend && pnpm dev serves the frontends without cloud credentials
- .env.example documents every required variable
Why This Matters
- Developer productivity: No cloud credentials needed to write code
- Cost control: No cloud charges during development
- Offline capability: Can develop without internet
- Fast iteration: No deployment delays for testing changes
- CI/CD reliability: Tests run against local services (fast, deterministic)
3. Current State Assessment
What Exists
| Component | Status | Notes |
| --- | --- | --- |
| Python Backend (FastAPI) | Complete | 1,522 tests passing |
| React Frontend (2 apps) | Complete | Staff App + Client Portal |
| Local Development | Complete | docker-compose.yml with PostgreSQL + LocalStack |
| CI/CD Pipeline | Complete | GitHub Actions (lint, unit, integration, E2E) |
| Configuration | Complete | config.yaml with env var substitution |
What's Missing for Cloud Deployment
| Component | Status | Priority |
| --- | --- | --- |
| Terraform IaC | Not Started | P0 |
| Production Dockerfiles | Not Started | P0 |
| ECS Task Definitions | Not Started | P0 |
| VPC and Networking | Not Started | P0 |
| Aurora RDS Setup | Not Started | P0 |
| S3 Buckets (with policies) | Not Started | P0 |
| CloudFront Distributions | Not Started | P0 |
| WAF Configuration | Not Started | P1 |
| Secrets Manager Setup | Not Started | P0 |
| CloudWatch Dashboards | Not Started | P1 |
| Route 53 DNS | Not Started | P0 |
| ACM Certificates | Not Started | P0 |
| IAM Roles/Policies | Not Started | P0 |
| Database Migration Scripts | Not Started | P0 |
| CD Pipeline (Deploy) | Not Started | P0 |
4. Target Architecture
┌─────────────────────────────────────────────────────────────────────────────────┐
│ TAX PRACTICE AI - AWS ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────────┤
│ │
│ INTERNET │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │ ROUTE 53 (DNS) │ │
│ │ portal.taxpractice.ai │ app.taxpractice.ai │ api.taxpractice.ai │ │
│ └─────────────┬────────────────────┬────────────────────┬──────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ AWS WAF │ │
│ │ (Rate limiting, SQL injection, XSS protection) │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ CLOUDFRONT (CDN) │ │ CLOUDFRONT (CDN) │ │ APPLICATION LB │ │
│ │ Client Portal │ │ Staff App │ │ (HTTPS/443) │ │
│ └──────────┬──────────┘ └──────────┬──────────┘ └──────────┬──────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ S3 BUCKET │ │ S3 BUCKET │ │ ECS FARGATE │ │
│ │ (Static Assets) │ │ (Static Assets) │ │ (FastAPI) │ │
│ │ client-portal/* │ │ staff-app/* │ │ 2-4 tasks │ │
│ └─────────────────────┘ └─────────────────────┘ └─────────┬───────────┘ │
│ │ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ VPC (10.0.0.0/16) │ │
│ │ ┌──────────────────────────────────────────────────────────────────┐ │ │
│ │ │ PRIVATE SUBNETS │ │ │
│ │ │ │ │ │
│ │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ │
│ │ │ │ ECS Fargate │ │ EC2 Airflow │ │ Lambda │ │ │ │
│ │ │ │ (API) │ │ (t3.medium) │ │ Functions │ │ │ │
│ │ │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ │ │
│ │ │ │ │ │ │ │ │
│ │ │ └────────────────┼────────────────┘ │ │ │
│ │ │ │ │ │ │
│ │ │ ▼ │ │ │
│ │ │ ┌─────────────────────────────────────────────────────────┐ │ │ │
│ │ │ │ DATA LAYER │ │ │ │
│ │ │ │ │ │ │ │
│ │ │ │ ┌─────────────────┐ ┌─────────────────┐ │ │ │ │
│ │ │ │ │ Aurora │ │ S3 Documents │ │ │ │ │
│ │ │ │ │ PostgreSQL │ │ (KMS Encrypted) │ │ │ │ │
│ │ │ │ │ Serverless v2 │ │ │ │ │ │ │
│ │ │ │ └─────────────────┘ └─────────────────┘ │ │ │ │
│ │ │ │ │ │ │ │
│ │ │ │ ┌─────────────────┐ ┌─────────────────┐ │ │ │ │
│ │ │ │ │ Secrets Manager │ │ ElastiCache │ │ │ │ │
│ │ │ │ │ (Credentials) │ │ (Redis - opt) │ │ │ │ │
│ │ │ │ └─────────────────┘ └─────────────────┘ │ │ │ │
│ │ │ │ │ │ │ │
│ │ │ └─────────────────────────────────────────────────────────┘ │ │ │
│ │ │ │ │ │
│ │ └──────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
│ │
│ EXTERNAL SERVICES │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Anthropic │ │ Stripe │ │ Persona │ │ SmartVault │ │ SurePrep │ │
│ │ (Bedrock) │ │ (Payments) │ │ (KYC) │ │ (Portal) │ │ (OCR) │ │
│ └────────────┘ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────────────────┘
5. Backlog Items for Deployment
P0: Deployment Blockers
These must be completed before cloud deployment:
| ID | Item | Status | Description |
| --- | --- | --- | --- |
| DEPLOY-001 | Terraform Foundation | Not Started | VPC, subnets, security groups |
| DEPLOY-002 | Aurora RDS Module | Not Started | Database cluster with encryption |
| DEPLOY-003 | S3 Module | Not Started | Document buckets with policies |
| DEPLOY-004 | ECS Fargate Module | Not Started | Container orchestration |
| DEPLOY-005 | CloudFront Module | Not Started | CDN for frontends |
| DEPLOY-006 | Backend Dockerfile | Not Started | Production container image |
| DEPLOY-007 | Frontend Build Pipeline | Not Started | Build and deploy to S3 |
| DEPLOY-008 | Database Migration | Not Started | Schema deployment strategy |
| DEPLOY-009 | Secrets Manager Setup | Not Started | All credentials in Secrets Manager |
| DEPLOY-010 | CD Pipeline | Not Started | GitHub Actions deploy workflow |
P1: Production Readiness (from backlog.md)
These production blockers from TD-006 must be addressed:
| Phase | Services | Status | Notes |
| --- | --- | --- | --- |
| Phase 1 | EmailService + SMSService | Not Started | Requires API credentials |
| Phase 2 | PersonaService | Not Started | Requires API credentials |
| Phase 3 | SmartVaultService | Not Started | Requires API credentials |
| Phase 4 | SurePrepService | Not Started | Requires API credentials |
| Phase 5 | GoogleService | Not Started | Requires API credentials |
| Phase 6 | Webhook Security | Not Started | HMAC verification |
P2: Operational Readiness
| ID | Item | Status | Description |
| --- | --- | --- | --- |
| TD-004 | UAT Scripts | Not Started | User acceptance testing |
| TD-001 | Java Build Config | Not Started | Maven/Gradle for Java components |
| OPS-001 | CloudWatch Dashboards | Not Started | Monitoring dashboards |
| OPS-002 | Alerting Rules | Not Started | PagerDuty/SNS integration |
| OPS-003 | Log Aggregation | Not Started | CloudWatch Logs Insights |
| OPS-004 | Backup Verification | Not Started | Automated backup testing |
6. Infrastructure as Code Strategy
Why Terraform over alternatives:
| Factor | Terraform | AWS CDK | CloudFormation |
| --- | --- | --- | --- |
| Multi-cloud | Yes | No | No |
| State Management | Built-in | Via CFN | Via CFN |
| Language | HCL (declarative) | TypeScript/Python | YAML/JSON |
| Community Modules | Extensive | Growing | Limited |
| Learning Curve | Medium | Higher | Lower |
| Drift Detection | Yes | Limited | Limited |
Decision: Terraform with AWS provider for:
- Declarative infrastructure definition
- Version-controlled state (S3 + DynamoDB locking)
- Modular, reusable components
- Community modules for common patterns
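Pinning the provider version supports that reproducibility goal. A minimal sketch of the shared provider configuration (version constraints and default tag values are illustrative, not the project's actual settings):

```hcl
terraform {
  required_version = ">= 1.6"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region

  # default_tags: applied to every taggable resource the provider creates
  default_tags {
    tags = {
      Project   = "tax-practice"
      ManagedBy = "terraform"
    }
  }
}
```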
Repository Structure
infrastructure/
├── terraform/
│ ├── environments/
│ │ ├── staging/
│ │ │ ├── main.tf # Environment entry point
│ │ │ ├── variables.tf # Environment-specific vars
│ │ │ ├── terraform.tfvars # Variable values
│ │ │ └── backend.tf # S3 backend config
│ │ │
│ │ └── production/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── terraform.tfvars
│ │ └── backend.tf
│ │
│ ├── modules/
│ │ ├── vpc/ # VPC, subnets, NAT
│ │ │ ├── main.tf
│ │ │ ├── variables.tf
│ │ │ ├── outputs.tf
│ │ │ └── README.md
│ │ │
│ │ ├── aurora/ # Aurora PostgreSQL
│ │ │ ├── main.tf
│ │ │ ├── variables.tf
│ │ │ ├── outputs.tf
│ │ │ └── README.md
│ │ │
│ │ ├── ecs/ # ECS Fargate cluster
│ │ │ ├── main.tf
│ │ │ ├── variables.tf
│ │ │ ├── outputs.tf
│ │ │ └── README.md
│ │ │
│ │ ├── s3/ # S3 buckets
│ │ │ ├── main.tf
│ │ │ ├── variables.tf
│ │ │ ├── outputs.tf
│ │ │ └── README.md
│ │ │
│ │ ├── cloudfront/ # CDN distributions
│ │ │ ├── main.tf
│ │ │ ├── variables.tf
│ │ │ ├── outputs.tf
│ │ │ └── README.md
│ │ │
│ │ ├── alb/ # Application Load Balancer
│ │ │ ├── main.tf
│ │ │ ├── variables.tf
│ │ │ ├── outputs.tf
│ │ │ └── README.md
│ │ │
│ │ ├── secrets/ # Secrets Manager
│ │ │ ├── main.tf
│ │ │ ├── variables.tf
│ │ │ ├── outputs.tf
│ │ │ └── README.md
│ │ │
│ │ ├── waf/ # WAF rules
│ │ │ ├── main.tf
│ │ │ ├── variables.tf
│ │ │ ├── outputs.tf
│ │ │ └── README.md
│ │ │
│ │ ├── airflow/ # Airflow EC2 instance
│ │ │ ├── main.tf
│ │ │ ├── variables.tf
│ │ │ ├── outputs.tf
│ │ │ ├── user_data.sh # Bootstrap script
│ │ │ └── README.md
│ │ │
│ │ └── monitoring/ # CloudWatch dashboards/alarms
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ └── README.md
│ │
│ └── global/ # Shared resources (S3 backend, IAM)
│ ├── backend/
│ │ └── main.tf # S3 bucket + DynamoDB for state
│ └── iam/
│ └── main.tf # Service roles
│
├── docker/
│ ├── api/
│ │ ├── Dockerfile # FastAPI production image
│ │ └── .dockerignore
│ │
│ └── airflow/
│ ├── Dockerfile # Airflow image
│ └── requirements.txt
│
└── scripts/
├── deploy.sh # Deployment helper
├── init-backend.sh # Initialize Terraform backend
└── rotate-secrets.sh # Secret rotation helper
State Management
# terraform/environments/production/backend.tf
# Terraform state stored in S3 with DynamoDB locking
# State bucket created via terraform/global/backend/
terraform {
backend "s3" {
# bucket: S3 bucket for state storage
bucket = "tax-practice-terraform-state"
# key: Path within bucket for this environment's state
key = "production/terraform.tfstate"
# region: AWS region for state bucket
region = "us-east-1"
# encrypt: Enable server-side encryption
encrypt = true
# dynamodb_table: Table for state locking (prevents concurrent modifications)
dynamodb_table = "tax-practice-terraform-locks"
}
}
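The state bucket and lock table referenced above are bootstrapped from `terraform/global/backend/`. A sketch of what that bootstrap might contain; the resource names match the backend config above, the rest is an assumption:

```hcl
# terraform/global/backend/main.tf (sketch)
# State bucket with versioning so a bad apply can be rolled back
resource "aws_s3_bucket" "terraform_state" {
  bucket = "tax-practice-terraform-state"
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Lock table: the S3 backend requires a string hash key named LockID
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "tax-practice-terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```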
7. Phase 1: Foundation Infrastructure
7.1 VPC Module
# terraform/modules/vpc/main.tf
# VPC for Tax Practice AI
# Creates isolated network with public/private subnets across 3 AZs
resource "aws_vpc" "main" {
# cidr_block: IP address range for the VPC
# 10.0.0.0/16 provides 65,536 IP addresses
cidr_block = var.vpc_cidr
# enable_dns_hostnames: Required for RDS and other AWS services
enable_dns_hostnames = true
# enable_dns_support: Required for VPC DNS resolution
enable_dns_support = true
tags = {
Name = "${var.project}-${var.environment}-vpc"
Environment = var.environment
Project = var.project
ManagedBy = "terraform"
}
}
# Public subnets for ALB, NAT Gateway
# One per availability zone for high availability
resource "aws_subnet" "public" {
count = length(var.availability_zones)
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 4, count.index)
# availability_zone: Distribute across AZs for fault tolerance
availability_zone = var.availability_zones[count.index]
# map_public_ip_on_launch: Public subnets get public IPs
map_public_ip_on_launch = true
tags = {
Name = "${var.project}-${var.environment}-public-${count.index + 1}"
Environment = var.environment
Type = "public"
}
}
# Private subnets for ECS, RDS, Lambda
# No direct internet access - uses NAT Gateway
resource "aws_subnet" "private" {
count = length(var.availability_zones)
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 4, count.index + length(var.availability_zones))
availability_zone = var.availability_zones[count.index]
# map_public_ip_on_launch: Private subnets do NOT get public IPs
map_public_ip_on_launch = false
tags = {
Name = "${var.project}-${var.environment}-private-${count.index + 1}"
Environment = var.environment
Type = "private"
}
}
# Internet Gateway for public subnet internet access
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.project}-${var.environment}-igw"
Environment = var.environment
}
}
# Elastic IP for NAT Gateway (static IP for outbound traffic)
resource "aws_eip" "nat" {
domain = "vpc"
tags = {
Name = "${var.project}-${var.environment}-nat-eip"
Environment = var.environment
}
}
# NAT Gateway for private subnet outbound internet access
# Placed in public subnet, routes private subnet traffic to internet
resource "aws_nat_gateway" "main" {
allocation_id = aws_eip.nat.id
subnet_id = aws_subnet.public[0].id
tags = {
Name = "${var.project}-${var.environment}-nat"
Environment = var.environment
}
depends_on = [aws_internet_gateway.main]
}
# Route table for public subnets
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
# route: Direct internet access via Internet Gateway
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = {
Name = "${var.project}-${var.environment}-public-rt"
Environment = var.environment
}
}
# Route table for private subnets
resource "aws_route_table" "private" {
vpc_id = aws_vpc.main.id
# route: Internet access via NAT Gateway (outbound only)
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main.id
}
tags = {
Name = "${var.project}-${var.environment}-private-rt"
Environment = var.environment
}
}
# Associate public subnets with public route table
resource "aws_route_table_association" "public" {
count = length(aws_subnet.public)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
# Associate private subnets with private route table
resource "aws_route_table_association" "private" {
count = length(aws_subnet.private)
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private.id
}
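With `vpc_cidr = 10.0.0.0/16`, three availability zones, and `newbits = 4`, the `cidrsubnet` calls above carve the VPC into /20 blocks: indices 0-2 for public subnets, 3-5 for private. The same arithmetic in stdlib Python, as a sanity check:

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
# cidrsubnet(var.vpc_cidr, 4, index) == the index-th /20 inside the /16
blocks = list(vpc.subnets(prefixlen_diff=4))

public  = [str(blocks[i]) for i in range(3)]      # indices 0..2
private = [str(blocks[i + 3]) for i in range(3)]  # indices 3..5 (offset by AZ count)

print(public)   # ['10.0.0.0/20', '10.0.16.0/20', '10.0.32.0/20']
print(private)  # ['10.0.48.0/20', '10.0.64.0/20', '10.0.80.0/20']
```

Each /20 holds 4,096 addresses, leaving ten spare /20 blocks in the VPC for future subnet tiers.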
7.2 Security Groups
# terraform/modules/vpc/security_groups.tf
# ALB Security Group - accepts HTTPS from internet
resource "aws_security_group" "alb" {
name = "${var.project}-${var.environment}-alb-sg"
description = "Security group for Application Load Balancer"
vpc_id = aws_vpc.main.id
# ingress: Allow HTTPS from anywhere
ingress {
description = "HTTPS from internet"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# ingress: Allow HTTP for redirect to HTTPS
ingress {
description = "HTTP for redirect"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# egress: Allow all outbound (to ECS tasks)
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project}-${var.environment}-alb-sg"
Environment = var.environment
}
}
# ECS Security Group - accepts traffic from ALB only
resource "aws_security_group" "ecs" {
name = "${var.project}-${var.environment}-ecs-sg"
description = "Security group for ECS Fargate tasks"
vpc_id = aws_vpc.main.id
# ingress: Only allow traffic from ALB
ingress {
description = "HTTP from ALB"
from_port = 8000
to_port = 8000
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
# egress: Allow all outbound (to RDS, S3, external APIs)
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project}-${var.environment}-ecs-sg"
Environment = var.environment
}
}
# RDS Security Group - accepts traffic from ECS and Airflow only
resource "aws_security_group" "rds" {
name = "${var.project}-${var.environment}-rds-sg"
description = "Security group for Aurora PostgreSQL"
vpc_id = aws_vpc.main.id
# ingress: PostgreSQL from ECS
ingress {
description = "PostgreSQL from ECS"
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [aws_security_group.ecs.id]
}
# ingress: PostgreSQL from Airflow
ingress {
description = "PostgreSQL from Airflow"
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [aws_security_group.airflow.id]
}
# egress: allow all outbound (security groups are stateful, so replies to
# inbound connections are permitted regardless; rule kept permissive by convention)
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project}-${var.environment}-rds-sg"
Environment = var.environment
}
}
# Airflow EC2 Security Group
resource "aws_security_group" "airflow" {
name = "${var.project}-${var.environment}-airflow-sg"
description = "Security group for Airflow EC2 instance"
vpc_id = aws_vpc.main.id
# ingress: Airflow UI from VPN/office IPs only
ingress {
description = "Airflow UI"
from_port = 8080
to_port = 8080
protocol = "tcp"
cidr_blocks = var.admin_cidr_blocks
}
# ingress: SSH from bastion/VPN only
ingress {
description = "SSH"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = var.admin_cidr_blocks
}
# egress: Allow all outbound
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project}-${var.environment}-airflow-sg"
Environment = var.environment
}
}
8. Phase 2: Database and Storage
8.1 Aurora PostgreSQL Module
# terraform/modules/aurora/main.tf
# Aurora PostgreSQL Serverless v2 cluster
# Auto-scales based on load, pay-per-use
resource "aws_rds_cluster" "main" {
# cluster_identifier: Unique name for the cluster
cluster_identifier = "${var.project}-${var.environment}"
# engine: Aurora PostgreSQL compatible
engine = "aurora-postgresql"
engine_mode = "provisioned"
engine_version = "15.4"
# database_name: Default database created on launch
database_name = var.database_name
# master_username: Admin user (stored in Secrets Manager)
master_username = var.master_username
# master_password: Retrieved from Secrets Manager
master_password = var.master_password
# db_subnet_group_name: Place in private subnets
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [var.security_group_id]
# storage_encrypted: Encrypt data at rest with KMS
storage_encrypted = true
kms_key_id = var.kms_key_arn
# backup_retention_period: Keep 7 days of automated backups
backup_retention_period = 7
preferred_backup_window = "03:00-04:00"
# deletion_protection: Prevent accidental deletion
deletion_protection = var.environment == "production"
# skip_final_snapshot: skip the snapshot on deletion for non-production only;
# production always takes a final snapshot
skip_final_snapshot = var.environment != "production"
final_snapshot_identifier = var.environment == "production" ? "${var.project}-${var.environment}-final" : null
# enabled_cloudwatch_logs_exports: Export PostgreSQL logs
enabled_cloudwatch_logs_exports = ["postgresql"]
# serverlessv2_scaling_configuration: Auto-scaling capacity
serverlessv2_scaling_configuration {
# min_capacity: Minimum ACUs (0.5 ACU = ~1GB RAM)
min_capacity = var.min_capacity
# max_capacity: Maximum ACUs (scales up during peak)
max_capacity = var.max_capacity
}
tags = {
Name = "${var.project}-${var.environment}-aurora"
Environment = var.environment
Project = var.project
ManagedBy = "terraform"
}
}
# Aurora cluster instance (Serverless v2)
resource "aws_rds_cluster_instance" "main" {
count = var.instance_count
identifier = "${var.project}-${var.environment}-${count.index + 1}"
cluster_identifier = aws_rds_cluster.main.id
# instance_class: Serverless v2 instance type
instance_class = "db.serverless"
engine = aws_rds_cluster.main.engine
engine_version = aws_rds_cluster.main.engine_version
# publicly_accessible: Never expose RDS to internet
publicly_accessible = false
# performance_insights_enabled: Enable for query analysis
performance_insights_enabled = true
tags = {
Name = "${var.project}-${var.environment}-instance-${count.index + 1}"
Environment = var.environment
}
}
# DB subnet group for multi-AZ deployment
resource "aws_db_subnet_group" "main" {
name = "${var.project}-${var.environment}-db-subnet"
description = "Subnet group for Aurora PostgreSQL"
subnet_ids = var.private_subnet_ids
tags = {
Name = "${var.project}-${var.environment}-db-subnet"
Environment = var.environment
}
}
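The capacity variables are set per environment. A hypothetical `terraform.tfvars` for production, to show how the module is driven (all values are illustrative, not recommendations):

```hcl
# terraform/environments/production/terraform.tfvars (sketch)
database_name   = "taxpractice"   # hypothetical
master_username = "taxadmin"      # hypothetical; password comes from Secrets Manager
instance_count  = 2               # one writer, one reader for failover
min_capacity    = 0.5             # ACUs; 1 ACU ~= 2 GiB RAM
max_capacity    = 8               # ceiling during filing-season peaks
```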
8.2 S3 Module
# terraform/modules/s3/main.tf
# Document storage bucket
# Stores tax documents, generated PDFs, signed forms
resource "aws_s3_bucket" "documents" {
bucket = "${var.project}-${var.environment}-documents"
tags = {
Name = "${var.project}-${var.environment}-documents"
Environment = var.environment
Purpose = "Tax document storage"
ManagedBy = "terraform"
}
}
# Enable versioning for document recovery
resource "aws_s3_bucket_versioning" "documents" {
bucket = aws_s3_bucket.documents.id
versioning_configuration {
# status: Enable versioning for all objects
status = "Enabled"
}
}
# Server-side encryption with KMS
resource "aws_s3_bucket_server_side_encryption_configuration" "documents" {
bucket = aws_s3_bucket.documents.id
rule {
apply_server_side_encryption_by_default {
# sse_algorithm: Use AWS KMS for encryption
sse_algorithm = "aws:kms"
kms_master_key_id = var.kms_key_arn
}
# bucket_key_enabled: Reduce KMS request costs
bucket_key_enabled = true
}
}
# Block all public access
resource "aws_s3_bucket_public_access_block" "documents" {
bucket = aws_s3_bucket.documents.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
# Lifecycle rules for cost optimization
resource "aws_s3_bucket_lifecycle_configuration" "documents" {
bucket = aws_s3_bucket.documents.id
# Rule 1: Move old documents to Glacier after 3 years
rule {
id = "archive-old-documents"
status = "Enabled"
filter {
prefix = "clients/"
}
transition {
days = 1095 # 3 years
storage_class = "GLACIER"
}
}
# Rule 2: Delete old versions after 90 days
rule {
id = "delete-old-versions"
status = "Enabled"
noncurrent_version_expiration {
noncurrent_days = 90
}
}
}
# CORS configuration for presigned URL uploads
resource "aws_s3_bucket_cors_configuration" "documents" {
bucket = aws_s3_bucket.documents.id
cors_rule {
allowed_headers = ["*"]
allowed_methods = ["GET", "PUT", "POST"]
allowed_origins = var.allowed_origins
expose_headers = ["ETag"]
max_age_seconds = 3600
}
}
# Frontend static hosting bucket (Client Portal)
resource "aws_s3_bucket" "frontend_portal" {
bucket = "${var.project}-${var.environment}-portal"
tags = {
Name = "${var.project}-${var.environment}-portal"
Environment = var.environment
Purpose = "Client Portal static assets"
}
}
# Frontend static hosting bucket (Staff App)
resource "aws_s3_bucket" "frontend_staff" {
bucket = "${var.project}-${var.environment}-staff"
tags = {
Name = "${var.project}-${var.environment}-staff"
Environment = var.environment
Purpose = "Staff App static assets"
}
}
# Block public access for frontend buckets (CloudFront only)
resource "aws_s3_bucket_public_access_block" "frontend_portal" {
bucket = aws_s3_bucket.frontend_portal.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
resource "aws_s3_bucket_public_access_block" "frontend_staff" {
bucket = aws_s3_bucket.frontend_staff.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
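Blocking public access means CloudFront needs an explicit bucket policy to read these buckets. A sketch using Origin Access Control; the `distribution_arn` variable is assumed to be exported by the cloudfront module:

```hcl
# Allow only the CloudFront distribution to read portal assets (sketch)
resource "aws_s3_bucket_policy" "frontend_portal" {
  bucket = aws_s3_bucket.frontend_portal.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "AllowCloudFrontOAC"
      Effect    = "Allow"
      Principal = { Service = "cloudfront.amazonaws.com" }
      Action    = "s3:GetObject"
      Resource  = "${aws_s3_bucket.frontend_portal.arn}/*"
      # Condition: only this distribution, not any CloudFront tenant
      Condition = {
        StringEquals = { "AWS:SourceArn" = var.distribution_arn }
      }
    }]
  })
}
```

A matching policy would be needed for the staff-app bucket.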
9. Phase 3: Compute and Networking
9.1 ECS Fargate Module
# terraform/modules/ecs/main.tf
# ECS Cluster for FastAPI backend
resource "aws_ecs_cluster" "main" {
name = "${var.project}-${var.environment}"
# setting: Enable Container Insights for monitoring
setting {
name = "containerInsights"
value = "enabled"
}
tags = {
Name = "${var.project}-${var.environment}-cluster"
Environment = var.environment
}
}
# ECS Task Definition
resource "aws_ecs_task_definition" "api" {
family = "${var.project}-${var.environment}-api"
# requires_compatibilities: Fargate (serverless containers)
requires_compatibilities = ["FARGATE"]
# network_mode: awsvpc required for Fargate
network_mode = "awsvpc"
# cpu: 512 = 0.5 vCPU (scale up for production)
cpu = var.cpu
# memory: 1024 = 1GB RAM (scale up for production)
memory = var.memory
# execution_role_arn: Role for ECS to pull images, write logs
execution_role_arn = aws_iam_role.ecs_execution.arn
# task_role_arn: Role for the container to access AWS services
task_role_arn = aws_iam_role.ecs_task.arn
container_definitions = jsonencode([
{
name = "api"
image = var.container_image
# portMappings: Expose FastAPI port
portMappings = [
{
containerPort = 8000
hostPort = 8000
protocol = "tcp"
}
]
# environment: Non-sensitive configuration
environment = [
{ name = "ENVIRONMENT", value = var.environment },
{ name = "DB_HOST", value = var.db_host },
{ name = "DB_PORT", value = "5432" },
{ name = "DB_NAME", value = var.db_name },
{ name = "S3_BUCKET_DOCUMENTS", value = var.s3_bucket_documents },
{ name = "AWS_REGION", value = var.aws_region },
{ name = "LOG_LEVEL", value = var.environment == "production" ? "INFO" : "DEBUG" }
]
# secrets: Sensitive values from Secrets Manager
secrets = [
{
name = "DB_USER"
valueFrom = "${var.db_secret_arn}:username::"
},
{
name = "DB_PASSWORD"
valueFrom = "${var.db_secret_arn}:password::"
},
{
name = "JWT_SECRET"
valueFrom = var.jwt_secret_arn
}
]
# logConfiguration: Send logs to CloudWatch
logConfiguration = {
logDriver = "awslogs"
options = {
"awslogs-group" = aws_cloudwatch_log_group.api.name
"awslogs-region" = var.aws_region
"awslogs-stream-prefix" = "api"
}
}
# healthCheck: Container health check
healthCheck = {
command = ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"]
interval = 30
timeout = 5
retries = 3
startPeriod = 60
}
}
])
tags = {
Name = "${var.project}-${var.environment}-api-task"
Environment = var.environment
}
}
# ECS Service
resource "aws_ecs_service" "api" {
name = "${var.project}-${var.environment}-api"
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.api.arn
# desired_count: Number of running tasks
desired_count = var.desired_count
# launch_type: Fargate for serverless
launch_type = "FARGATE"
# platform_version: Use latest Fargate platform
platform_version = "LATEST"
# Rolling update bounds (aws_ecs_service takes these as top-level arguments)
deployment_maximum_percent = 200
deployment_minimum_healthy_percent = 100
# network_configuration: VPC networking
network_configuration {
subnets = var.private_subnet_ids
security_groups = [var.ecs_security_group_id]
assign_public_ip = false
}
# load_balancer: Connect to ALB target group
load_balancer {
target_group_arn = var.target_group_arn
container_name = "api"
container_port = 8000
}
# enable_execute_command: Enable ECS Exec for debugging
enable_execute_command = var.environment != "production"
tags = {
Name = "${var.project}-${var.environment}-api-service"
Environment = var.environment
}
}
# Auto-scaling for ECS service
resource "aws_appautoscaling_target" "api" {
max_capacity = var.max_capacity
min_capacity = var.min_capacity
resource_id = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.api.name}"
scalable_dimension = "ecs:service:DesiredCount"
service_namespace = "ecs"
}
# Scale based on CPU utilization
resource "aws_appautoscaling_policy" "api_cpu" {
name = "${var.project}-${var.environment}-api-cpu"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.api.resource_id
scalable_dimension = aws_appautoscaling_target.api.scalable_dimension
service_namespace = aws_appautoscaling_target.api.service_namespace
target_tracking_scaling_policy_configuration {
predefined_metric_specification {
predefined_metric_type = "ECSServiceAverageCPUUtilization"
}
target_value = 70.0
scale_in_cooldown = 300
scale_out_cooldown = 60
}
}
# CloudWatch Log Group
resource "aws_cloudwatch_log_group" "api" {
name = "/ecs/${var.project}-${var.environment}-api"
retention_in_days = var.environment == "production" ? 90 : 30
tags = {
Name = "${var.project}-${var.environment}-api-logs"
Environment = var.environment
}
}
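The container health check above shells out to `curl -f`; the same probe in stdlib Python is handy as a local smoke test before pushing an image. The HTTP server here is a stand-in for the real API, which the plan assumes serves `/health`:

```python
import http.server
import threading
import urllib.request

class Health(http.server.BaseHTTPRequestHandler):
    """Stand-in for the real API's /health endpoint."""
    def do_GET(self):
        status = 200 if self.path == "/health" else 404
        self.send_response(status)
        self.end_headers()
        if status == 200:
            self.wfile.write(b"ok")
    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Health)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

def healthy(url: str, timeout: float = 5.0) -> bool:
    """Equivalent of the task definition's: curl -f URL || exit 1"""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError and HTTPError (non-2xx)
        return False

ok  = healthy(f"http://127.0.0.1:{port}/health")   # True
bad = healthy(f"http://127.0.0.1:{port}/missing")  # False
server.shutdown()
print(ok, bad)
```

ECS marks the task unhealthy after 3 consecutive failures (the `retries` setting), then replaces it.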
9.2 Application Load Balancer Module
# terraform/modules/alb/main.tf
# Application Load Balancer for API traffic
resource "aws_lb" "main" {
name = "${var.project}-${var.environment}-alb"
internal = false
load_balancer_type = "application"
security_groups = [var.alb_security_group_id]
subnets = var.public_subnet_ids
# enable_deletion_protection: Prevent accidental deletion
enable_deletion_protection = var.environment == "production"
# access_logs: Store access logs in S3
access_logs {
bucket = var.log_bucket
prefix = "alb"
enabled = true
}
tags = {
Name = "${var.project}-${var.environment}-alb"
Environment = var.environment
}
}
# HTTPS Listener (port 443)
resource "aws_lb_listener" "https" {
load_balancer_arn = aws_lb.main.arn
port = 443
protocol = "HTTPS"
# ssl_policy: Use secure TLS policy
ssl_policy = "ELBSecurityPolicy-TLS13-1-2-2021-06"
# certificate_arn: ACM certificate for HTTPS
certificate_arn = var.certificate_arn
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.api.arn
}
}
# HTTP Listener (redirect to HTTPS)
resource "aws_lb_listener" "http" {
load_balancer_arn = aws_lb.main.arn
port = 80
protocol = "HTTP"
default_action {
type = "redirect"
redirect {
port = "443"
protocol = "HTTPS"
status_code = "HTTP_301"
}
}
}
# Target Group for ECS tasks
resource "aws_lb_target_group" "api" {
name = "${var.project}-${var.environment}-api-tg"
port = 8000
protocol = "HTTP"
vpc_id = var.vpc_id
target_type = "ip"
# health_check: Verify ECS tasks are healthy
health_check {
enabled = true
healthy_threshold = 2
unhealthy_threshold = 3
timeout = 5
interval = 30
path = "/health"
matcher = "200"
}
tags = {
Name = "${var.project}-${var.environment}-api-tg"
Environment = var.environment
}
}
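The module above references several input variables that need matching declarations. A hedged sketch of the module's `variables.tf`, with names and types inferred from how they are used above:

```hcl
# terraform/modules/alb/variables.tf (sketch; types inferred from usage)
variable "project" {
  type        = string
  description = "Project slug used in resource names"
}

variable "environment" {
  type        = string
  description = "Deployment environment (staging or production)"
}

variable "alb_security_group_id" {
  type        = string
  description = "Security group allowing 80/443 from the internet"
}

variable "public_subnet_ids" {
  type        = list(string)
  description = "Public subnets across at least two AZs"
}

variable "log_bucket" {
  type        = string
  description = "S3 bucket that receives ALB access logs"
}

variable "certificate_arn" {
  type        = string
  description = "ACM certificate ARN for the HTTPS listener"
}

variable "vpc_id" {
  type        = string
  description = "VPC containing the target group"
}
```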
8.3 Backend Dockerfile
# infrastructure/docker/api/Dockerfile
# =============================================================================
# Tax Practice AI - FastAPI Production Image
# =============================================================================
# Multi-stage build for minimal production image
# -----------------------------------------------------------------------------
# Stage 1: Builder
# -----------------------------------------------------------------------------
FROM python:3.12-slim AS builder
# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Create virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade pip \
&& pip install --no-cache-dir -r requirements.txt
# -----------------------------------------------------------------------------
# Stage 2: Production Image
# -----------------------------------------------------------------------------
FROM python:3.12-slim AS production
# Security: Run as non-root user
RUN groupadd -r appgroup && useradd -r -g appgroup appuser
# Install runtime dependencies only
RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# Copy virtual environment from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Set working directory
WORKDIR /app
# Copy application code
COPY --chown=appuser:appgroup src/ ./src/
COPY --chown=appuser:appgroup config.yaml ./
# Environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PYTHONPATH=/app
# Switch to non-root user
USER appuser
# Expose API port
EXPOSE 8000
# Health check
HEALTHCHECK --interval=30s --timeout=5s --start-period=60s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
# Run FastAPI with uvicorn
CMD ["uvicorn", "src.api.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
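The image this Dockerfile produces needs a registry that ECS can pull from. A hedged ECR repository sketch (the resource names and lifecycle settings are assumptions, not decisions from this plan):

```hcl
# terraform/modules/ecr/main.tf (sketch)
resource "aws_ecr_repository" "api" {
  name = "${var.project}-api"

  # Scan images for known CVEs on every push
  image_scanning_configuration {
    scan_on_push = true
  }

  # Immutable tags prevent a deployed tag from being silently overwritten
  image_tag_mutability = "IMMUTABLE"

  encryption_configuration {
    encryption_type = "KMS"
  }
}

# Expire stale images to bound storage costs
resource "aws_ecr_lifecycle_policy" "api" {
  repository = aws_ecr_repository.api.name
  policy = jsonencode({
    rules = [{
      rulePriority = 1
      description  = "Expire untagged images after 14 days"
      selection = {
        tagStatus   = "untagged"
        countType   = "sinceImagePushed"
        countUnit   = "days"
        countNumber = 14
      }
      action = { type = "expire" }
    }]
  })
}
```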
9. Phase 4: Frontend Deployment
9.1 CloudFront Module
# terraform/modules/cloudfront/main.tf
# CloudFront distribution for Client Portal
resource "aws_cloudfront_distribution" "portal" {
enabled = true
is_ipv6_enabled = true
default_root_object = "index.html"
price_class = "PriceClass_100" # US, Canada, Europe
# aliases: Custom domain names
aliases = var.portal_domains
# origin: S3 bucket for static assets
origin {
domain_name = var.portal_bucket_regional_domain
origin_access_control_id = aws_cloudfront_origin_access_control.portal.id
origin_id = "S3-Portal"
}
# default_cache_behavior: Serve static assets
default_cache_behavior {
allowed_methods = ["GET", "HEAD", "OPTIONS"]
cached_methods = ["GET", "HEAD"]
target_origin_id = "S3-Portal"
# forwarded_values: do not vary the cache on query strings or cookies
# (legacy settings; newer AWS provider versions prefer cache policies)
forwarded_values {
query_string = false
cookies {
forward = "none"
}
}
# viewer_protocol_policy: Redirect HTTP to HTTPS
viewer_protocol_policy = "redirect-to-https"
# TTL settings
min_ttl = 0
default_ttl = 86400 # 1 day
max_ttl = 31536000 # 1 year
compress = true
}
# custom_error_response: SPA routing (return index.html for 404s)
custom_error_response {
error_code = 404
response_code = 200
response_page_path = "/index.html"
}
custom_error_response {
error_code = 403
response_code = 200
response_page_path = "/index.html"
}
# restrictions: No geo restrictions
restrictions {
geo_restriction {
restriction_type = "none"
}
}
# viewer_certificate: Use ACM certificate
viewer_certificate {
acm_certificate_arn = var.certificate_arn
ssl_support_method = "sni-only"
minimum_protocol_version = "TLSv1.2_2021"
}
# web_acl_id: Attach WAF
web_acl_id = var.waf_acl_arn
tags = {
Name = "${var.project}-${var.environment}-portal-cdn"
Environment = var.environment
}
}
# Origin Access Control for S3
resource "aws_cloudfront_origin_access_control" "portal" {
name = "${var.project}-${var.environment}-portal-oac"
description = "OAC for Client Portal S3 bucket"
origin_access_control_origin_type = "s3"
signing_behavior = "always"
signing_protocol = "sigv4"
}
# Similar distribution for Staff App
resource "aws_cloudfront_distribution" "staff" {
enabled = true
is_ipv6_enabled = true
default_root_object = "index.html"
price_class = "PriceClass_100"
aliases = var.staff_domains
origin {
domain_name = var.staff_bucket_regional_domain
origin_access_control_id = aws_cloudfront_origin_access_control.staff.id
origin_id = "S3-Staff"
}
default_cache_behavior {
allowed_methods = ["GET", "HEAD", "OPTIONS"]
cached_methods = ["GET", "HEAD"]
target_origin_id = "S3-Staff"
forwarded_values {
query_string = false
cookies {
forward = "none"
}
}
viewer_protocol_policy = "redirect-to-https"
min_ttl = 0
default_ttl = 86400
max_ttl = 31536000
compress = true
}
custom_error_response {
error_code = 404
response_code = 200
response_page_path = "/index.html"
}
custom_error_response {
error_code = 403
response_code = 200
response_page_path = "/index.html"
}
restrictions {
geo_restriction {
restriction_type = "none"
}
}
viewer_certificate {
acm_certificate_arn = var.certificate_arn
ssl_support_method = "sni-only"
minimum_protocol_version = "TLSv1.2_2021"
}
web_acl_id = var.waf_acl_arn
tags = {
Name = "${var.project}-${var.environment}-staff-cdn"
Environment = var.environment
}
}
resource "aws_cloudfront_origin_access_control" "staff" {
name = "${var.project}-${var.environment}-staff-oac"
description = "OAC for Staff App S3 bucket"
origin_access_control_origin_type = "s3"
signing_behavior = "always"
signing_protocol = "sigv4"
}
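An Origin Access Control on its own does not grant CloudFront access to the buckets; each bucket also needs a policy that allows the CloudFront service principal, scoped to the specific distribution. A hedged sketch for the portal bucket (the `portal_bucket_id`/`portal_bucket_name` inputs are assumptions; the staff bucket gets a mirror-image policy):

```hcl
# Allow only this CloudFront distribution to read the portal bucket
resource "aws_s3_bucket_policy" "portal" {
  bucket = var.portal_bucket_id # assumed input: the portal bucket's ID

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "AllowCloudFrontOAC"
      Effect    = "Allow"
      Principal = { Service = "cloudfront.amazonaws.com" }
      Action    = "s3:GetObject"
      Resource  = "arn:aws:s3:::${var.portal_bucket_name}/*"
      Condition = {
        StringEquals = {
          "AWS:SourceArn" = aws_cloudfront_distribution.portal.arn
        }
      }
    }]
  })
}
```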
9.2 Frontend Build and Deploy (GitHub Actions)
# .github/workflows/deploy-frontend.yml
# Frontend deployment workflow
# Builds React apps and deploys to S3/CloudFront
name: Deploy Frontend
on:
push:
branches:
- main
paths:
- 'frontend/**'
workflow_dispatch:
inputs:
environment:
description: 'Deployment environment'
required: true
default: 'staging'
type: choice
options:
- staging
- production
env:
# AWS_REGION: Region for all AWS operations
AWS_REGION: us-east-1
# NODE_VERSION: Node.js version for builds
NODE_VERSION: '20'
jobs:
# ===========================================================================
# Build Frontend Applications
# ===========================================================================
build:
name: Build Frontend
runs-on: ubuntu-latest
strategy:
matrix:
app: [client-portal, staff-app]
steps:
# Checkout repository
- name: Checkout code
uses: actions/checkout@v4
# Install pnpm (must be on PATH before setup-node, or its pnpm cache lookup fails)
- name: Install pnpm
run: npm install -g pnpm
# Setup Node.js with the pnpm store cached between runs
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: ${{ env.NODE_VERSION }}
cache: 'pnpm'
cache-dependency-path: frontend/pnpm-lock.yaml
# Install dependencies
- name: Install dependencies
working-directory: frontend
run: pnpm install --frozen-lockfile
# Build application
- name: Build ${{ matrix.app }}
working-directory: frontend
env:
VITE_API_URL: ${{ vars.API_URL }}
VITE_APP_NAME: ${{ matrix.app == 'client-portal' && 'Tax Practice Portal' || 'Tax Practice Staff' }}
run: pnpm --filter ${{ matrix.app }} build
# Upload build artifact
- name: Upload build artifact
uses: actions/upload-artifact@v4
with:
name: ${{ matrix.app }}-build
path: frontend/apps/${{ matrix.app }}/dist
retention-days: 1
# ===========================================================================
# Deploy to S3 and Invalidate CloudFront
# ===========================================================================
deploy:
name: Deploy to AWS
runs-on: ubuntu-latest
needs: build
environment: ${{ github.event.inputs.environment || 'staging' }}
strategy:
matrix:
app: [client-portal, staff-app]
steps:
# Download build artifact
- name: Download build artifact
uses: actions/download-artifact@v4
with:
name: ${{ matrix.app }}-build
path: dist
# Configure AWS credentials
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ env.AWS_REGION }}
# Sync to S3
- name: Deploy to S3
run: |
aws s3 sync dist/ s3://${{ vars.S3_BUCKET_PREFIX }}-${{ matrix.app }}/ \
--delete \
--cache-control "public, max-age=31536000, immutable" \
--exclude "index.html" \
--exclude "*.json"
# Upload index.html with no-cache for SPA routing
aws s3 cp dist/index.html s3://${{ vars.S3_BUCKET_PREFIX }}-${{ matrix.app }}/index.html \
--cache-control "no-cache, no-store, must-revalidate"
# Invalidate CloudFront cache
- name: Invalidate CloudFront
run: |
aws cloudfront create-invalidation \
--distribution-id ${{ matrix.app == 'client-portal' && vars.CLOUDFRONT_PORTAL_ID || vars.CLOUDFRONT_STAFF_ID }} \
--paths "/*"
10. Phase 5: Orchestration and Background Jobs
10.1 Airflow EC2 Module
# terraform/modules/airflow/main.tf
# EC2 instance for self-hosted Apache Airflow
# Handles workflow orchestration, scheduled tasks
resource "aws_instance" "airflow" {
# ami: Amazon Linux 2023 AMI
ami = var.ami_id
# instance_type: t3.medium provides 2 vCPU, 4GB RAM
instance_type = var.instance_type
# subnet_id: Deploy in private subnet
subnet_id = var.private_subnet_id
# vpc_security_group_ids: Airflow security group
vpc_security_group_ids = [var.security_group_id]
# iam_instance_profile: Role for AWS service access
iam_instance_profile = aws_iam_instance_profile.airflow.name
# key_name: SSH key for access (optional, use Session Manager instead)
key_name = var.key_name
# root_block_device: 50GB gp3 storage
root_block_device {
volume_size = 50
volume_type = "gp3"
encrypted = true
delete_on_termination = true
}
# user_data: Bootstrap script
user_data = templatefile("${path.module}/user_data.sh", {
environment = var.environment
db_host = var.db_host
db_name = var.airflow_db_name
db_secret_arn = var.db_secret_arn
aws_region = var.aws_region
s3_bucket_dags = var.s3_bucket_dags
})
tags = {
Name = "${var.project}-${var.environment}-airflow"
Environment = var.environment
Service = "airflow"
}
}
# IAM role for Airflow EC2 instance
resource "aws_iam_role" "airflow" {
name = "${var.project}-${var.environment}-airflow-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}
]
})
tags = {
Name = "${var.project}-${var.environment}-airflow-role"
Environment = var.environment
}
}
# IAM policy for Airflow to access AWS services
resource "aws_iam_role_policy" "airflow" {
name = "${var.project}-${var.environment}-airflow-policy"
role = aws_iam_role.airflow.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
# S3 access for DAGs and document processing
Effect = "Allow"
Action = [
"s3:GetObject",
"s3:PutObject",
"s3:ListBucket"
]
Resource = [
"arn:aws:s3:::${var.s3_bucket_dags}",
"arn:aws:s3:::${var.s3_bucket_dags}/*",
"arn:aws:s3:::${var.s3_bucket_documents}",
"arn:aws:s3:::${var.s3_bucket_documents}/*"
]
},
{
# Secrets Manager for credentials
Effect = "Allow"
Action = [
"secretsmanager:GetSecretValue"
]
Resource = var.secret_arns
},
{
# Lambda invocation for task execution
Effect = "Allow"
Action = [
"lambda:InvokeFunction"
]
Resource = "arn:aws:lambda:${var.aws_region}:${var.account_id}:function:${var.project}-${var.environment}-*"
},
{
# CloudWatch Logs
Effect = "Allow"
Action = [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
]
Resource = "*"
},
{
# SSM Session Manager (for debugging)
Effect = "Allow"
Action = [
"ssmmessages:CreateControlChannel",
"ssmmessages:CreateDataChannel",
"ssmmessages:OpenControlChannel",
"ssmmessages:OpenDataChannel"
]
Resource = "*"
}
]
})
}
resource "aws_iam_instance_profile" "airflow" {
name = "${var.project}-${var.environment}-airflow-profile"
role = aws_iam_role.airflow.name
}
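For reference, a hedged sketch of how a root configuration might wire this module to outputs from the network, database, and storage modules. The module names and output attributes here are assumptions, not definitions from this plan:

```hcl
# terraform/environments/production/main.tf (sketch)
module "airflow" {
  source = "../../modules/airflow"

  project           = var.project
  environment       = "production"
  ami_id            = data.aws_ami.al2023.id # assumed AL2023 AMI lookup
  instance_type     = "t3.medium"
  private_subnet_id = module.vpc.private_subnet_ids[0]
  security_group_id = module.security_groups.airflow_id
  db_host           = module.aurora.cluster_endpoint
  airflow_db_name   = "airflow"
  db_secret_arn     = module.secrets.db_credentials_arn
  aws_region        = var.aws_region
  s3_bucket_dags    = module.storage.dags_bucket_name
  key_name          = null # prefer SSM Session Manager over SSH
}
```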
10.2 Airflow Bootstrap Script
#!/bin/bash
# terraform/modules/airflow/user_data.sh
# Bootstrap script for Airflow EC2 instance
set -e
# =============================================================================
# Environment variables (injected by Terraform)
# =============================================================================
ENVIRONMENT="${environment}"
DB_HOST="${db_host}"
DB_NAME="${db_name}"
DB_SECRET_ARN="${db_secret_arn}"
AWS_REGION="${aws_region}"
S3_BUCKET_DAGS="${s3_bucket_dags}"
# =============================================================================
# System Updates
# =============================================================================
echo "Updating system packages..."
yum update -y
yum install -y python3-pip postgresql15 git jq
# =============================================================================
# Install Airflow
# =============================================================================
echo "Installing Apache Airflow..."
# Quote the extras so the shell does not glob-expand the brackets; pin with
# Airflow's official constraints file in production for reproducible installs
pip3 install 'apache-airflow[postgres,amazon]==2.8.0'
# =============================================================================
# Get Database Credentials from Secrets Manager
# =============================================================================
echo "Retrieving database credentials..."
DB_CREDS=$(aws secretsmanager get-secret-value \
--secret-id $DB_SECRET_ARN \
--region $AWS_REGION \
--query SecretString \
--output text)
DB_USER=$(echo $DB_CREDS | jq -r '.username')
DB_PASSWORD=$(echo $DB_CREDS | jq -r '.password')
# =============================================================================
# Configure Airflow
# =============================================================================
echo "Configuring Airflow..."
export AIRFLOW_HOME=/opt/airflow
mkdir -p $AIRFLOW_HOME/dags
# Write the configuration BEFORE initializing the metadata database;
# running the init first would bootstrap a throwaway SQLite database
cat > $AIRFLOW_HOME/airflow.cfg << EOF
[core]
executor = LocalExecutor
dags_folder = $AIRFLOW_HOME/dags
parallelism = 8
load_examples = False
[database]
sql_alchemy_conn = postgresql+psycopg2://$DB_USER:$DB_PASSWORD@$DB_HOST:5432/$DB_NAME
[webserver]
web_server_port = 8080
[scheduler]
dag_dir_list_interval = 300
[logging]
remote_logging = True
remote_log_conn_id = aws_default
remote_base_log_folder = s3://$S3_BUCKET_DAGS/logs
encrypt_s3_logs = True
EOF
# Initialize the metadata database ("db migrate" supersedes the deprecated "db init")
airflow db migrate
# Airflow 2.x removed the 1.x "authenticate"/"auth_backend" webserver options;
# the default FAB auth manager requires an explicit admin user instead
ADMIN_PASSWORD=$(openssl rand -base64 18)
airflow users create \
--role Admin \
--username admin \
--firstname Admin \
--lastname User \
--email admin@example.com \
--password "$ADMIN_PASSWORD"
# Stash the generated password on-instance, root-readable only; rotate after first login
echo "$ADMIN_PASSWORD" > $AIRFLOW_HOME/.admin_password
chmod 600 $AIRFLOW_HOME/.admin_password
# =============================================================================
# Sync DAGs from S3
# =============================================================================
echo "Syncing DAGs from S3..."
aws s3 sync s3://$S3_BUCKET_DAGS/dags/ $AIRFLOW_HOME/dags/
# =============================================================================
# Create systemd services
# =============================================================================
echo "Creating systemd services..."
# Airflow Webserver
cat > /etc/systemd/system/airflow-webserver.service << EOF
[Unit]
Description=Airflow Webserver
After=network.target
[Service]
Environment=AIRFLOW_HOME=$AIRFLOW_HOME
User=ec2-user
Group=ec2-user
Type=simple
ExecStart=/usr/local/bin/airflow webserver
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
# Airflow Scheduler
cat > /etc/systemd/system/airflow-scheduler.service << EOF
[Unit]
Description=Airflow Scheduler
After=network.target
[Service]
Environment=AIRFLOW_HOME=$AIRFLOW_HOME
User=ec2-user
Group=ec2-user
Type=simple
ExecStart=/usr/local/bin/airflow scheduler
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
# =============================================================================
# Start Services
# =============================================================================
echo "Starting Airflow services..."
chown -R ec2-user:ec2-user $AIRFLOW_HOME
systemctl daemon-reload
systemctl enable airflow-webserver airflow-scheduler
systemctl start airflow-webserver airflow-scheduler
echo "Airflow installation complete!"
11. Phase 6: External Integrations
Secrets Manager Configuration
# terraform/modules/secrets/main.tf
# =============================================================================
# Database Credentials
# =============================================================================
resource "aws_secretsmanager_secret" "db_credentials" {
name = "${var.project}/${var.environment}/db-credentials"
description = "Aurora PostgreSQL master credentials"
tags = {
Name = "${var.project}-${var.environment}-db-credentials"
Environment = var.environment
}
}
# =============================================================================
# JWT Secret
# =============================================================================
resource "aws_secretsmanager_secret" "jwt_secret" {
name = "${var.project}/${var.environment}/jwt-secret"
description = "JWT signing secret for authentication"
tags = {
Name = "${var.project}-${var.environment}-jwt-secret"
Environment = var.environment
}
}
# =============================================================================
# External Service Credentials
# =============================================================================
# Stripe API Keys
resource "aws_secretsmanager_secret" "stripe" {
name = "${var.project}/${var.environment}/stripe"
description = "Stripe API credentials"
tags = {
Name = "${var.project}-${var.environment}-stripe"
Environment = var.environment
}
}
# Persona API Key
resource "aws_secretsmanager_secret" "persona" {
name = "${var.project}/${var.environment}/persona"
description = "Persona identity verification API credentials"
tags = {
Name = "${var.project}-${var.environment}-persona"
Environment = var.environment
}
}
# SmartVault OAuth Credentials
resource "aws_secretsmanager_secret" "smartvault" {
name = "${var.project}/${var.environment}/smartvault"
description = "SmartVault OAuth credentials"
tags = {
Name = "${var.project}-${var.environment}-smartvault"
Environment = var.environment
}
}
# SurePrep API Credentials
resource "aws_secretsmanager_secret" "sureprep" {
name = "${var.project}/${var.environment}/sureprep"
description = "SurePrep API credentials"
tags = {
Name = "${var.project}-${var.environment}-sureprep"
Environment = var.environment
}
}
# Google OAuth Credentials
resource "aws_secretsmanager_secret" "google" {
name = "${var.project}/${var.environment}/google"
description = "Google Workspace OAuth credentials"
tags = {
Name = "${var.project}-${var.environment}-google"
Environment = var.environment
}
}
# Twilio Credentials
resource "aws_secretsmanager_secret" "twilio" {
name = "${var.project}/${var.environment}/twilio"
description = "Twilio SMS/Voice credentials"
tags = {
Name = "${var.project}-${var.environment}-twilio"
Environment = var.environment
}
}
# SendGrid API Key
resource "aws_secretsmanager_secret" "sendgrid" {
name = "${var.project}/${var.environment}/sendgrid"
description = "SendGrid email API key"
tags = {
Name = "${var.project}-${var.environment}-sendgrid"
Environment = var.environment
}
}
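The resources above create the secret containers only; values are populated out-of-band so they never land in version control. Where a value must be seeded at provision time, an `aws_secretsmanager_secret_version` can be used, with one caveat worth stating plainly:

```hcl
# Seed the JWT secret with a value generated at apply time.
# Caveat: random_password and the secret version are both recorded in
# Terraform state, so the state backend must itself be encrypted and
# tightly access-controlled.
resource "random_password" "jwt" {
  length  = 64
  special = false
}

resource "aws_secretsmanager_secret_version" "jwt_secret" {
  secret_id     = aws_secretsmanager_secret.jwt_secret.id
  secret_string = random_password.jwt.result
}
```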
12. Testing Requirements
12.1 New Tests for Cloud Deployment
| Category | Test | Priority | Status |
| --- | --- | --- | --- |
| Infrastructure | Terraform plan validates | P0 | Not Started |
| | VPC connectivity test | P0 | Not Started |
| | Security group rules verify | P0 | Not Started |
| | RDS connectivity from ECS | P0 | Not Started |
| | S3 access from ECS | P0 | Not Started |
| Container | Dockerfile builds successfully | P0 | Not Started |
| | Container health check passes | P0 | Not Started |
| | Container starts in < 60s | P0 | Not Started |
| | Container handles graceful shutdown | P1 | Not Started |
| API | Health endpoint returns 200 | P0 | Not Started |
| | API responds under load (100 RPS) | P1 | Not Started |
| | Database migrations complete | P0 | Not Started |
| | Secrets retrieval works | P0 | Not Started |
| Frontend | S3 deployment succeeds | P0 | Not Started |
| | CloudFront serves index.html | P0 | Not Started |
| | SPA routing works (404 → index.html) | P0 | Not Started |
| | API calls work through ALB | P0 | Not Started |
| Integration | End-to-end user flow | P0 | Not Started |
| | Document upload to S3 | P0 | Not Started |
| | AI analysis via Bedrock | P1 | Not Started |
| | Webhook delivery | P1 | Not Started |
| Security | WAF blocks SQL injection | P0 | Not Started |
| | WAF blocks XSS | P0 | Not Started |
| | Rate limiting works | P1 | Not Started |
| | SSL certificate valid | P0 | Not Started |
12.2 Load Testing Plan
# tests/load/config.yaml
# Load test configuration for Tax Practice AI
# Uses k6 or Artillery for load testing
scenarios:
# Scenario 1: Normal tax season load
- name: "Normal Load"
description: "Typical tax season traffic pattern"
duration: "10m"
vus: 50 # Virtual users
targets:
- endpoint: "/health"
method: "GET"
weight: 10
- endpoint: "/v1/clients"
method: "GET"
weight: 20
- endpoint: "/v1/documents"
method: "GET"
weight: 30
- endpoint: "/v1/documents/upload-url"
method: "POST"
weight: 20
- endpoint: "/v1/returns/{id}/ai/ask"
method: "POST"
weight: 20
thresholds:
# p95 latency under 500ms
http_req_duration: ["p(95)<500"]
# Error rate under 1%
http_req_failed: ["rate<0.01"]
# Scenario 2: Peak load (deadline day)
- name: "Peak Load"
description: "April 15th deadline traffic"
duration: "5m"
vus: 200
ramp_up: "1m"
thresholds:
# p95 latency under 1s during peak
http_req_duration: ["p(95)<1000"]
# Error rate under 5%
http_req_failed: ["rate<0.05"]
# Scenario 3: Spike test
- name: "Spike Test"
description: "Sudden traffic spike"
stages:
- duration: "1m"
target: 50
- duration: "30s"
target: 500 # Sudden spike
- duration: "1m"
target: 500
- duration: "30s"
target: 50 # Return to normal
thresholds:
# System should recover within 30s
http_req_duration: ["p(95)<2000"]
13. Best Practices Checklist
13.1 Infrastructure Best Practices
13.2 Security Best Practices
13.3 Operational Best Practices
13.4 Application Best Practices
13.5 CI/CD Best Practices
14. Security Considerations
14.1 Data Protection (Tax Compliance)
| Requirement | Implementation |
| --- | --- |
| SSN/EIN encryption | Field-level encryption in Aurora, KMS keys |
| Data at rest | RDS encryption, S3 SSE-KMS, EBS encryption |
| Data in transit | TLS 1.2+ everywhere, no HTTP |
| Access logging | CloudTrail, VPC Flow Logs, application audit logs |
| 7-year retention | S3 lifecycle policies, Aurora backups |
| PII masking | Application-level masking in logs |
14.2 Network Security
┌─────────────────────────────────────────────────────────────────┐
│ NETWORK SECURITY LAYERS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Layer 1: Edge Protection │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ AWS WAF + Shield │ │
│ │ • SQL injection protection │ │
│ │ • XSS protection │ │
│ │ • Rate limiting (1000 req/5min per IP) │ │
│ │ • Geo-blocking (US-only for now) │ │
│ │ • Bot detection │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Layer 2: Load Balancer │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Application Load Balancer │ │
│ │ • TLS termination (ACM certificate) │ │
│ │ • HTTP → HTTPS redirect │ │
│ │ • Health checks │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Layer 3: Application │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ ECS Fargate (Private Subnet) │ │
│ │ • Security group: ALB only │ │
│ │ • JWT authentication │ │
│ │ • RBAC authorization │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Layer 4: Data │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Aurora PostgreSQL (Private Subnet) │ │
│ │ • Security group: ECS + Airflow only │ │
│ │ • No public access │ │
│ │ • Encrypted connections (SSL required) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
14.3 WAF Rules
# terraform/modules/waf/main.tf
resource "aws_wafv2_web_acl" "main" {
name = "${var.project}-${var.environment}-waf"
description = "WAF rules for Tax Practice AI"
scope = "REGIONAL"
default_action {
allow {}
}
# Rule 1: Rate limiting
rule {
name = "RateLimitRule"
priority = 1
action {
block {}
}
statement {
rate_based_statement {
limit = 1000
aggregate_key_type = "IP"
}
}
visibility_config {
cloudwatch_metrics_enabled = true
metric_name = "RateLimitRule"
sampled_requests_enabled = true
}
}
# Rule 2: AWS Managed Rules - Common
rule {
name = "AWSManagedRulesCommonRuleSet"
priority = 2
override_action {
none {}
}
statement {
managed_rule_group_statement {
name = "AWSManagedRulesCommonRuleSet"
vendor_name = "AWS"
}
}
visibility_config {
cloudwatch_metrics_enabled = true
metric_name = "AWSManagedRulesCommonRuleSet"
sampled_requests_enabled = true
}
}
# Rule 3: AWS Managed Rules - SQL Injection
rule {
name = "AWSManagedRulesSQLiRuleSet"
priority = 3
override_action {
none {}
}
statement {
managed_rule_group_statement {
name = "AWSManagedRulesSQLiRuleSet"
vendor_name = "AWS"
}
}
visibility_config {
cloudwatch_metrics_enabled = true
metric_name = "AWSManagedRulesSQLiRuleSet"
sampled_requests_enabled = true
}
}
# Rule 4: AWS Managed Rules - Known Bad Inputs
rule {
name = "AWSManagedRulesKnownBadInputsRuleSet"
priority = 4
override_action {
none {}
}
statement {
managed_rule_group_statement {
name = "AWSManagedRulesKnownBadInputsRuleSet"
vendor_name = "AWS"
}
}
visibility_config {
cloudwatch_metrics_enabled = true
metric_name = "AWSManagedRulesKnownBadInputsRuleSet"
sampled_requests_enabled = true
}
}
visibility_config {
cloudwatch_metrics_enabled = true
metric_name = "${var.project}-${var.environment}-waf"
sampled_requests_enabled = true
}
tags = {
Name = "${var.project}-${var.environment}-waf"
Environment = var.environment
}
}
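A REGIONAL-scope web ACL protects nothing until it is associated with the load balancer (the CloudFront distributions instead take an ACL via `web_acl_id`, as shown in Phase 4, and that ACL must be created with `scope = "CLOUDFRONT"` in us-east-1). A sketch, assuming the ALB ARN is passed in as a module input:

```hcl
# Attach the regional WAF web ACL to the Application Load Balancer
resource "aws_wafv2_web_acl_association" "alb" {
  resource_arn = var.alb_arn # assumed input from the ALB module
  web_acl_arn  = aws_wafv2_web_acl.main.arn
}
```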
15. Cloud Service Security Configurations
This section provides specific security configurations for each AWS service used in Tax Practice AI. These are recommended settings for a tax/financial application handling sensitive data.
15.1 VPC Security Configuration
# =============================================================================
# VPC SECURITY SETTINGS
# =============================================================================
# CIDR Block: 10.0.0.0/16
# - Provides 65,536 IP addresses
# - Private enough to not conflict with common networks
# - Large enough for future growth
vpc_cidr = "10.0.0.0/16"
# Subnet Layout:
# Public Subnets (for ALB, NAT Gateway):
# - 10.0.0.0/20 (AZ-a) - 4,096 IPs
# - 10.0.16.0/20 (AZ-b) - 4,096 IPs
# - 10.0.32.0/20 (AZ-c) - 4,096 IPs
#
# Private Subnets (for ECS, RDS, Lambda):
# - 10.0.48.0/20 (AZ-a) - 4,096 IPs
# - 10.0.64.0/20 (AZ-b) - 4,096 IPs
# - 10.0.80.0/20 (AZ-c) - 4,096 IPs
# VPC Flow Logs: ENABLED
# - Captures all traffic (ACCEPT and REJECT)
# - Retention: 90 days in CloudWatch Logs
# - Used for security analysis and troubleshooting
flow_logs_enabled = true
flow_logs_retention_days = 90
# DNS Settings
enable_dns_hostnames = true # Required for RDS, ECS
enable_dns_support = true # Required for VPC DNS resolution
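The flow-log settings above map onto a small resource pair; a hedged sketch, where the `aws_vpc.main` reference and the delivery-role input are assumptions:

```hcl
# Deliver VPC Flow Logs (ACCEPT and REJECT) to CloudWatch Logs
resource "aws_cloudwatch_log_group" "vpc_flow" {
  name              = "/vpc/${var.project}-${var.environment}-flow-logs"
  retention_in_days = 90
}

resource "aws_flow_log" "main" {
  vpc_id          = aws_vpc.main.id        # assumed VPC resource name
  traffic_type    = "ALL"                  # both ACCEPT and REJECT
  log_destination = aws_cloudwatch_log_group.vpc_flow.arn
  iam_role_arn    = var.flow_logs_role_arn # assumed delivery-role input
}
```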
15.2 Security Group Rules (Explicit)
# =============================================================================
# SECURITY GROUP: ALB (Application Load Balancer)
# =============================================================================
# Purpose: Accept traffic from internet, forward to ECS
alb_security_group_rules = {
ingress = [
{
description = "HTTPS from internet"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"] # Allow from anywhere (WAF filters first)
},
{
description = "HTTP for redirect only"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"] # Redirects to HTTPS
}
]
egress = [
{
description = "To ECS tasks only"
from_port = 8000
to_port = 8000
protocol = "tcp"
security_groups = ["ecs_security_group"] # Reference, not CIDR
}
]
}
# =============================================================================
# SECURITY GROUP: ECS (API Containers)
# =============================================================================
# Purpose: Run FastAPI containers, access RDS and S3
ecs_security_group_rules = {
ingress = [
{
description = "From ALB only"
from_port = 8000
to_port = 8000
protocol = "tcp"
security_groups = ["alb_security_group"] # Only ALB can reach ECS
}
]
egress = [
{
description = "HTTPS to AWS services (S3, Secrets Manager, etc.)"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"] # AWS services via NAT Gateway
},
{
description = "PostgreSQL to RDS"
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = ["rds_security_group"]
}
]
}
# =============================================================================
# SECURITY GROUP: RDS (Aurora PostgreSQL)
# =============================================================================
# Purpose: Database - most restrictive
rds_security_group_rules = {
ingress = [
{
description = "PostgreSQL from ECS"
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = ["ecs_security_group"]
},
{
description = "PostgreSQL from Airflow"
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = ["airflow_security_group"]
},
{
description = "PostgreSQL from Lambda"
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = ["lambda_security_group"]
}
]
egress = [] # RDS does not need outbound access
}
# =============================================================================
# SECURITY GROUP: Airflow (EC2)
# =============================================================================
# Purpose: Workflow orchestration - restricted access
airflow_security_group_rules = {
ingress = [
{
description = "Airflow UI - Admin IPs only"
from_port = 8080
to_port = 8080
protocol = "tcp"
cidr_blocks = ["YOUR_OFFICE_IP/32", "YOUR_VPN_IP/32"] # REPLACE with actual IPs
},
{
description = "SSH - Admin IPs only (or use Session Manager instead)"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["YOUR_OFFICE_IP/32"] # REPLACE or remove if using SSM
}
]
egress = [
{
description = "HTTPS to AWS services"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
},
{
description = "PostgreSQL to RDS"
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = ["rds_security_group"]
}
]
}
15.3 Aurora PostgreSQL Security Configuration
# =============================================================================
# AURORA POSTGRESQL - SECURITY SETTINGS
# =============================================================================
aurora_security_config = {
# Encryption at rest: REQUIRED
# Uses AWS KMS for encryption
storage_encrypted = true
kms_key_id = "alias/tax-practice-rds" # Customer managed key
# Encryption in transit: REQUIRED
# Force SSL connections
# Set via parameter group: rds.force_ssl = 1
# Network isolation
publicly_accessible = false # NEVER expose to internet
db_subnet_group = "private-subnets-only"
# Authentication
iam_database_authentication_enabled = true # Allow IAM auth for ECS
# Backup settings: automated backups cover point-in-time recovery only;
# the IRS 7-year retention target is met with manual snapshots and S3
# lifecycle policies, not this window
backup_retention_period = 7 # Days for automated backups
preferred_backup_window = "03:00-04:00" # UTC, during low traffic
# Deletion protection: ENABLED for production
deletion_protection = true # Prevent accidental deletion
# Enhanced monitoring
monitoring_interval = 60 # Seconds (0 to disable)
monitoring_role_arn = "arn:aws:iam::ACCOUNT:role/rds-monitoring-role"
# Performance Insights: ENABLED
performance_insights_enabled = true
performance_insights_retention_period = 7 # Days
# CloudWatch Logs export
enabled_cloudwatch_logs_exports = ["postgresql"]
# Auto minor version upgrade
auto_minor_version_upgrade = true
# Maintenance window
preferred_maintenance_window = "sun:04:00-sun:05:00" # UTC
}
# Parameter Group Settings
aurora_parameter_group = {
family = "aurora-postgresql15"
parameters = [
{
name = "rds.force_ssl"
value = "1" # REQUIRE SSL connections
},
{
name = "log_statement"
value = "ddl" # Log DDL statements for audit
},
{
name = "log_connections"
value = "1" # Log connection attempts
},
{
name = "log_disconnections"
value = "1" # Log disconnections
},
{
name = "password_encryption"
value = "scram-sha-256" # Strong password hashing
}
]
}
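Because `rds.force_ssl = 1` rejects plaintext connections, the application side must also request TLS. A minimal sketch of building such a connection string follows; the hostname, database name, credentials, and CA bundle path are placeholders, not real values:

```python
# Build a PostgreSQL DSN that always requests verified TLS, mirroring
# the rds.force_ssl = 1 parameter above. All connection details below
# are placeholders for illustration.
from urllib.parse import quote

def build_dsn(user: str, password: str, host: str, db: str) -> str:
    """Return a libpq-style DSN with full SSL verification enabled."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:5432/{db}"
        # sslrootcert points at the downloaded RDS CA bundle (placeholder path)
        "?sslmode=verify-full&sslrootcert=rds-ca-bundle.pem"
    )

dsn = build_dsn("app_user", "s3cret/", "cluster.example.us-east-1.rds.amazonaws.com", "taxdb")
```

URL-encoding the credentials (`safe=''`) matters because generated passwords can contain `/` or `@`, which would otherwise break the DSN.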
15.4 S3 Security Configuration
# =============================================================================
# S3 BUCKET - DOCUMENT STORAGE SECURITY
# =============================================================================
s3_security_config = {
# Block ALL public access: REQUIRED
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
# Versioning: ENABLED
# Allows recovery of accidentally deleted/modified documents
versioning_enabled = true
# Server-side encryption: AWS KMS
sse_algorithm = "aws:kms"
kms_master_key_id = "alias/tax-practice-s3" # Customer managed key
bucket_key_enabled = true # Reduce KMS costs
# Object lock: OPTIONAL (for compliance hold)
# Enable if you need WORM (Write Once Read Many)
object_lock_enabled = false
# Access logging: ENABLED
logging_enabled = true
logging_bucket = "tax-practice-access-logs"
logging_prefix = "s3-documents/"
# Lifecycle rules
lifecycle_rules = [
{
id = "archive-after-3-years"
enabled = true
prefix = "clients/"
transitions = [
{
days = 1095 # 3 years
storage_class = "GLACIER"
}
]
},
{
id = "delete-old-versions"
enabled = true
noncurrent_version_expiration = {
days = 90
}
},
{
id = "abort-incomplete-uploads"
enabled = true
abort_incomplete_multipart_upload = {
days_after_initiation = 7
}
}
]
}
# Bucket Policy: Restrict access to specific roles
s3_bucket_policy = {
Statement = [
{
Sid = "DenyUnencryptedUploads"
Effect = "Deny"
Principal = "*"
Action = "s3:PutObject"
Resource = "arn:aws:s3:::tax-practice-documents/*"
Condition = {
StringNotEquals = {
"s3:x-amz-server-side-encryption" = "aws:kms"
}
}
},
{
Sid = "DenyInsecureConnections"
Effect = "Deny"
Principal = "*"
Action = "s3:*"
Resource = [
"arn:aws:s3:::tax-practice-documents",
"arn:aws:s3:::tax-practice-documents/*"
]
Condition = {
Bool = {
"aws:SecureTransport" = "false"
}
}
},
{
Sid = "AllowECSTaskRole"
Effect = "Allow"
Principal = {
AWS = "arn:aws:iam::ACCOUNT:role/tax-practice-ecs-task-role"
}
Action = [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject"
]
Resource = "arn:aws:s3:::tax-practice-documents/*"
}
]
}
# CORS: Restrict to known origins
s3_cors_rules = [
{
allowed_headers = ["*"]
allowed_methods = ["GET", "PUT", "POST"]
allowed_origins = [
"https://portal.taxpractice.ai",
"https://app.taxpractice.ai",
"https://staging.taxpractice.ai"
]
expose_headers = ["ETag"]
max_age_seconds = 3600
}
]
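The `DenyUnencryptedUploads` statement can be sanity-checked locally by evaluating its condition against request headers. This is a sketch of the condition's semantics, not AWS's actual IAM policy evaluator:

```python
# Simulate the DenyUnencryptedUploads condition from the bucket policy above:
# a PutObject request is rejected unless it declares SSE-KMS encryption.
def upload_denied(headers: dict) -> bool:
    """Mirror of StringNotEquals on s3:x-amz-server-side-encryption."""
    return headers.get("x-amz-server-side-encryption") != "aws:kms"

assert upload_denied({})                                          # no SSE header -> denied
assert upload_denied({"x-amz-server-side-encryption": "AES256"})  # SSE-S3 -> denied
assert not upload_denied({"x-amz-server-side-encryption": "aws:kms"})
```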
15.5 ECS Fargate Security Configuration
# =============================================================================
# ECS FARGATE - CONTAINER SECURITY
# =============================================================================
ecs_security_config = {
# Network mode: awsvpc (required for Fargate)
# Gives each task its own ENI and security group
network_mode = "awsvpc"
# Task networking
assign_public_ip = false # NEVER assign public IP to tasks
# Container configuration
container_config = {
# Run as non-root user
user = "1000:1000" # appuser:appgroup
# Read-only root filesystem
readonly_root_filesystem = true
# No privileged mode
privileged = false
# Resource limits (prevent runaway containers)
cpu = 512 # 0.5 vCPU
memory = 1024 # 1 GB
# Health check
health_check = {
command = ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"]
interval = 30
timeout = 5
retries = 3
start_period = 60
}
# Logging
log_configuration = {
log_driver = "awslogs"
options = {
awslogs_group = "/ecs/tax-practice-api"
awslogs_region = "us-east-1"
awslogs_stream_prefix = "api"
}
}
}
# Task execution role (for ECS to pull images, write logs)
execution_role_permissions = [
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"logs:CreateLogStream",
"logs:PutLogEvents",
"secretsmanager:GetSecretValue" # For injecting secrets
]
# Task role (for the container to access AWS services)
task_role_permissions = [
{
effect = "Allow"
actions = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
resources = ["arn:aws:s3:::tax-practice-documents/*"]
},
{
effect = "Allow"
actions = ["bedrock:InvokeModel"]
resources = ["arn:aws:bedrock:us-east-1::foundation-model/anthropic.*"]
},
{
effect = "Allow"
actions = ["secretsmanager:GetSecretValue"]
resources = ["arn:aws:secretsmanager:us-east-1:ACCOUNT:secret:tax-practice/*"]
}
]
# ECS Exec for debugging (STAGING ONLY)
enable_execute_command = false # Set to true for staging
}
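The hardening settings above (non-root user, read-only root filesystem, no privileged mode) can be linted in CI before a task definition is registered. A sketch follows; the dict shape loosely follows an ECS `containerDefinitions` entry and the field names are illustrative:

```python
# Quick local lint for the container hardening settings above. The dict shape
# loosely follows an ECS containerDefinitions entry; names are illustrative.
def check_hardening(container: dict) -> list:
    """Return a list of hardening violations (empty means compliant)."""
    problems = []
    if container.get("privileged"):
        problems.append("privileged mode enabled")
    if not container.get("readonlyRootFilesystem"):
        problems.append("root filesystem is writable")
    user = container.get("user", "")
    if user.split(":")[0] in ("", "0", "root"):
        problems.append("container runs as root")
    return problems

good = {"privileged": False, "readonlyRootFilesystem": True, "user": "1000:1000"}
bad = {"privileged": True, "user": "root"}
```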
15.6 ALB Security Configuration
# =============================================================================
# APPLICATION LOAD BALANCER - SECURITY
# =============================================================================
alb_security_config = {
# Internal: NO (internet-facing)
internal = false
# Idle timeout
idle_timeout = 60 # Seconds
# Deletion protection: ENABLED for production
enable_deletion_protection = true
# HTTP/2: ENABLED
enable_http2 = true
# Access logs: ENABLED
access_logs = {
enabled = true
bucket = "tax-practice-access-logs"
prefix = "alb/"
}
# TLS Policy: Use most secure available
ssl_policy = "ELBSecurityPolicy-TLS13-1-2-2021-06"
# Listener rules
listeners = {
https = {
port = 443
protocol = "HTTPS"
certificate_arn = "arn:aws:acm:us-east-1:ACCOUNT:certificate/xxx"
default_action = {
type = "forward"
target_group_arn = "api-target-group"
}
}
http = {
port = 80
protocol = "HTTP"
# ALWAYS redirect HTTP to HTTPS
default_action = {
type = "redirect"
redirect = {
port = "443"
protocol = "HTTPS"
status_code = "HTTP_301"
}
}
}
}
# Target group health check
health_check = {
enabled = true
healthy_threshold = 2
unhealthy_threshold = 3
timeout = 5
interval = 30
path = "/health"
matcher = "200"
}
}
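The `/health` path must answer 200 within the 5-second health-check timeout. A stdlib-only sketch of the handler's contract follows; the production API would expose this through its own web framework:

```python
# Minimal WSGI app answering the ALB/ECS health check on /health.
# A sketch of the contract only; the real API uses its own framework.
def app(environ, start_response):
    if environ.get("PATH_INFO") == "/health":
        body = b'{"status":"ok"}'
        start_response("200 OK", [("Content-Type", "application/json")])
    else:
        body = b"not found"
        start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [body]

# Exercise locally without a server
statuses = []
body = app({"PATH_INFO": "/health"}, lambda s, h: statuses.append(s))
```

The health endpoint should stay dependency-free (no database round trip) so a slow query cannot cause the ALB to cycle otherwise healthy tasks.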
15.7 CloudFront Security Configuration
# =============================================================================
# CLOUDFRONT - CDN SECURITY
# =============================================================================
cloudfront_security_config = {
# Price class: US, Canada, Europe (reduce attack surface)
price_class = "PriceClass_100"
# HTTP versions
http_version = "http2and3"
# TLS: Require TLS 1.2+
viewer_certificate = {
acm_certificate_arn = "arn:aws:acm:us-east-1:ACCOUNT:certificate/xxx"
ssl_support_method = "sni-only"
minimum_protocol_version = "TLSv1.2_2021"
}
# Viewer protocol policy: HTTPS only
viewer_protocol_policy = "redirect-to-https"
# Origin protocol policy: HTTPS to S3 (OAC)
origin_access_control = {
origin_access_control_origin_type = "s3"
signing_behavior = "always"
signing_protocol = "sigv4"
}
# Response headers policy (security headers)
response_headers_policy = {
security_headers_config = {
# Strict-Transport-Security
strict_transport_security = {
access_control_max_age_sec = 31536000 # 1 year
include_subdomains = true
preload = true
override = true
}
# Content-Type-Options
content_type_options = {
override = true
}
# Frame-Options
frame_options = {
frame_option = "DENY"
override = true
}
# XSS-Protection (legacy header; modern browsers rely on CSP instead)
xss_protection = {
mode_block = true
protection = true
override = true
}
# Content-Security-Policy
content_security_policy = {
content_security_policy = "default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; img-src 'self' data: https:; font-src 'self'; connect-src 'self' https://api.taxpractice.ai"
override = true
}
# Referrer-Policy
referrer_policy = {
referrer_policy = "strict-origin-when-cross-origin"
override = true
}
}
}
# Geo restriction: US only (adjust as needed)
geo_restriction = {
restriction_type = "whitelist"
locations = ["US"] # Add more if needed
}
# WAF: ATTACHED
# NOTE: CloudFront can only use a web ACL created in us-east-1 with scope
# CLOUDFRONT (global), not the REGIONAL ACL attached to the ALB
web_acl_id = "arn:aws:wafv2:us-east-1:ACCOUNT:global/webacl/tax-practice-waf"
}
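The response headers policy above translates into concrete headers that can be verified when smoke-testing staging. A sketch follows; the expected values are copied from the config, and the checker assumes it is handed a plain header dict (e.g., from `requests` or `curl -I` output):

```python
# Expected security headers produced by the CloudFront response headers
# policy above. Useful as a checklist when smoke-testing staging.
EXPECTED_HEADERS = {
    "strict-transport-security": "max-age=31536000; includeSubDomains; preload",
    "x-content-type-options": "nosniff",
    "x-frame-options": "DENY",
    "referrer-policy": "strict-origin-when-cross-origin",
}

def missing_headers(response_headers: dict) -> list:
    """Return expected headers that are absent or wrong (case-insensitive)."""
    got = {k.lower(): v for k, v in response_headers.items()}
    return [k for k, v in EXPECTED_HEADERS.items() if got.get(k) != v]
```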
15.8 WAF Security Configuration
# =============================================================================
# AWS WAF - WEB APPLICATION FIREWALL
# =============================================================================
waf_security_config = {
# Scope: REGIONAL (for the ALB); CloudFront needs a separate web ACL with scope CLOUDFRONT
scope = "REGIONAL"
# Default action: ALLOW (rules block specific threats)
default_action = "allow"
# Rate limiting
rate_limit_rule = {
name = "RateLimitPerIP"
priority = 1
limit = 1000 # Requests per 5-minute window per IP
action = "block"
}
# AWS Managed Rules (recommended set)
managed_rules = [
{
name = "AWSManagedRulesCommonRuleSet"
vendor_name = "AWS"
priority = 10
override_action = "none" # Use rule actions as-is
},
{
name = "AWSManagedRulesSQLiRuleSet"
vendor_name = "AWS"
priority = 20
override_action = "none"
},
{
name = "AWSManagedRulesKnownBadInputsRuleSet"
vendor_name = "AWS"
priority = 30
override_action = "none"
},
{
name = "AWSManagedRulesLinuxRuleSet"
vendor_name = "AWS"
priority = 40
override_action = "none"
},
{
name = "AWSManagedRulesAmazonIpReputationList"
vendor_name = "AWS"
priority = 50
override_action = "none"
}
]
# Custom rules
custom_rules = [
{
name = "BlockBadBots"
priority = 60
action = "block"
statement = {
byte_match_statement = {
field_to_match = {
single_header = {
name = "user-agent"
}
}
positional_constraint = "CONTAINS"
search_string = "bad-bot" # Example, add real bad bot signatures
text_transformations = [
{
priority = 0
type = "LOWERCASE"
}
]
}
}
}
]
# Logging
logging_configuration = {
log_destination_configs = ["arn:aws:logs:us-east-1:ACCOUNT:log-group:aws-waf-logs"]
redacted_fields = [
{
single_header = {
name = "authorization" # Don't log auth tokens
}
}
]
}
}
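The `RateLimitPerIP` rule blocks any IP exceeding 1000 requests within a rolling 5-minute window. Its semantics can be sketched as follows; this mirrors the behavior for reasoning purposes, it is not AWS's implementation:

```python
# Sketch of the WAF RateLimitPerIP semantics: block an IP once it exceeds
# `limit` requests inside a rolling `window` seconds.
from collections import defaultdict, deque

class RateLimiter:
    def __init__(self, limit: int = 1000, window: int = 300):
        self.limit, self.window = limit, window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip: str, now: float) -> bool:
        q = self.hits[ip]
        while q and now - q[0] >= self.window:  # drop requests outside window
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the per-IP limit -> block
        q.append(now)
        return True
```

Note that WAF counts in 5-minute aggregation windows and unblocks automatically once the rate drops, so a blocked IP recovers without operator action.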
15.9 Secrets Manager Security Configuration
# =============================================================================
# AWS SECRETS MANAGER - CREDENTIALS MANAGEMENT
# =============================================================================
secrets_manager_config = {
# KMS encryption: Customer managed key
kms_key_id = "alias/tax-practice-secrets"
# Secret rotation: ENABLED for database credentials
rotation_rules = {
automatically_after_days = 30
}
# Recovery window: 7 days (can recover deleted secrets)
recovery_window_in_days = 7
# Secrets to create
secrets = {
"tax-practice/production/db-credentials" = {
description = "Aurora PostgreSQL master credentials"
secret_string = {
username = "app_user"
password = "GENERATED_BY_TERRAFORM" # Use random_password
}
}
"tax-practice/production/jwt-secret" = {
description = "JWT signing key"
# Generate with: openssl rand -base64 64
}
"tax-practice/production/stripe" = {
description = "Stripe API keys"
secret_string = {
secret_key = "sk_live_xxx"
publishable_key = "pk_live_xxx"
webhook_secret = "whsec_xxx"
}
}
"tax-practice/production/persona" = {
description = "Persona identity verification"
secret_string = {
api_key = "xxx"
template_id = "xxx"
webhook_secret = "xxx"
}
}
# Add other secrets as needed...
}
# Resource policy: Restrict access to specific roles
resource_policy = {
Statement = [
{
Sid = "AllowECSTaskRole"
Effect = "Allow"
Principal = {
AWS = "arn:aws:iam::ACCOUNT:role/tax-practice-ecs-task-role"
}
Action = "secretsmanager:GetSecretValue"
Resource = "*"
Condition = {
StringEquals = {
"secretsmanager:ResourceTag/Environment" = "production"
}
}
}
]
}
}
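The JWT signing key (and any value fed to Terraform's `random_password`) should come from a cryptographically secure source. A stdlib sketch, equivalent in spirit to the `openssl rand -base64 64` command noted above:

```python
# Generate high-entropy secrets locally before seeding Secrets Manager.
# Equivalent in spirit to `openssl rand -base64 64` mentioned above.
import base64
import secrets

def jwt_signing_key(n_bytes: int = 64) -> str:
    """Return a base64-encoded random key carrying n_bytes of entropy."""
    return base64.b64encode(secrets.token_bytes(n_bytes)).decode()

key = jwt_signing_key()
```

Generate secrets out-of-band and paste them into Secrets Manager (or pipe via the CLI) rather than committing them to Terraform variables, so they never land in state files or git history in plaintext.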
15.10 KMS Key Configuration
# =============================================================================
# AWS KMS - ENCRYPTION KEYS
# =============================================================================
kms_keys = {
# Key for RDS encryption
"tax-practice-rds" = {
description = "KMS key for Aurora PostgreSQL encryption"
deletion_window_in_days = 30
enable_key_rotation = true # Automatic annual rotation
policy = {
Statement = [
{
Sid = "AllowRDSAccess"
Effect = "Allow"
Principal = {
Service = "rds.amazonaws.com"
}
Action = [
"kms:Encrypt",
"kms:Decrypt",
"kms:GenerateDataKey*"
]
Resource = "*"
}
]
}
}
# Key for S3 encryption
"tax-practice-s3" = {
description = "KMS key for S3 document encryption"
deletion_window_in_days = 30
enable_key_rotation = true
policy = {
Statement = [
{
Sid = "AllowS3Access"
Effect = "Allow"
Principal = {
Service = "s3.amazonaws.com"
}
Action = [
"kms:Encrypt",
"kms:Decrypt",
"kms:GenerateDataKey*"
]
Resource = "*"
}
]
}
}
# Key for Secrets Manager
"tax-practice-secrets" = {
description = "KMS key for Secrets Manager"
deletion_window_in_days = 30
enable_key_rotation = true
}
}
15.11 IAM Roles and Policies
# =============================================================================
# IAM ROLES - LEAST PRIVILEGE
# =============================================================================
iam_roles = {
# ECS Task Execution Role (for ECS to pull images, write logs)
"tax-practice-ecs-execution-role" = {
assume_role_policy = {
Service = "ecs-tasks.amazonaws.com"
}
managed_policies = [
"arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
]
inline_policies = [
{
name = "SecretsAccess"
policy = {
Statement = [
{
Effect = "Allow"
Action = "secretsmanager:GetSecretValue"
Resource = "arn:aws:secretsmanager:us-east-1:ACCOUNT:secret:tax-practice/*"
}
]
}
}
]
}
# ECS Task Role (for container to access AWS services)
"tax-practice-ecs-task-role" = {
assume_role_policy = {
Service = "ecs-tasks.amazonaws.com"
}
inline_policies = [
{
name = "S3Access"
policy = {
Statement = [
{
Effect = "Allow"
Action = [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject"
]
Resource = "arn:aws:s3:::tax-practice-documents/*"
},
{
Effect = "Allow"
Action = "s3:ListBucket"
Resource = "arn:aws:s3:::tax-practice-documents"
}
]
}
},
{
name = "BedrockAccess"
policy = {
Statement = [
{
Effect = "Allow"
Action = "bedrock:InvokeModel"
Resource = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.*"
}
]
}
},
{
name = "SecretsAccess"
policy = {
Statement = [
{
Effect = "Allow"
Action = "secretsmanager:GetSecretValue"
Resource = "arn:aws:secretsmanager:us-east-1:ACCOUNT:secret:tax-practice/*"
}
]
}
}
]
}
# Airflow EC2 Role
"tax-practice-airflow-role" = {
assume_role_policy = {
Service = "ec2.amazonaws.com"
}
managed_policies = [
"arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore" # For Session Manager
]
inline_policies = [
{
name = "AirflowPermissions"
policy = {
Statement = [
{
Effect = "Allow"
Action = ["s3:GetObject", "s3:PutObject", "s3:ListBucket"]
Resource = [
"arn:aws:s3:::tax-practice-dags",
"arn:aws:s3:::tax-practice-dags/*"
]
},
{
Effect = "Allow"
Action = "secretsmanager:GetSecretValue"
Resource = "arn:aws:secretsmanager:us-east-1:ACCOUNT:secret:tax-practice/*"
},
{
Effect = "Allow"
Action = "lambda:InvokeFunction"
Resource = "arn:aws:lambda:us-east-1:ACCOUNT:function:tax-practice-*"
}
]
}
}
]
}
}
15.12 CloudTrail Configuration
# =============================================================================
# AWS CLOUDTRAIL - AUDIT LOGGING
# =============================================================================
cloudtrail_config = {
name = "tax-practice-audit-trail"
s3_bucket_name = "tax-practice-cloudtrail-logs"
include_global_service_events = true
is_multi_region_trail = true
enable_logging = true
# Log file validation (detect tampering)
enable_log_file_validation = true
# KMS encryption for logs
kms_key_id = "alias/tax-practice-cloudtrail"
# CloudWatch Logs integration
cloud_watch_logs_group_arn = "arn:aws:logs:us-east-1:ACCOUNT:log-group:cloudtrail"
cloud_watch_logs_role_arn = "arn:aws:iam::ACCOUNT:role/cloudtrail-to-cloudwatch"
# Event selectors (what to log)
event_selectors = [
{
read_write_type = "All"
include_management_events = true
data_resources = [
{
type = "AWS::S3::Object"
values = ["arn:aws:s3:::tax-practice-documents/"]
}
]
}
]
# Insights (anomaly detection)
insight_selectors = [
{
insight_type = "ApiCallRateInsight"
},
{
insight_type = "ApiErrorRateInsight"
}
]
}
15.13 Security Summary Table
| Service | Encryption at Rest | Encryption in Transit | Public Access | Logging |
|---|---|---|---|---|
| Aurora PostgreSQL | KMS (customer key) | TLS 1.2+ required | NO | CloudWatch Logs |
| S3 Documents | SSE-KMS (customer key) | HTTPS required | NO (blocked) | Access Logs |
| ECS Fargate | N/A (stateless) | TLS to ALB | NO (private subnet) | CloudWatch Logs |
| Secrets Manager | KMS (customer key) | TLS | N/A | CloudTrail |
| ALB | N/A | TLS 1.2+ (ACM cert) | YES (internet-facing) | Access Logs |
| CloudFront | N/A | TLS 1.2+ | YES (CDN) | Access Logs |
| Airflow EC2 | EBS encryption | TLS for AWS APIs | NO (private subnet) | CloudWatch Logs |
16. Monitoring and Observability
16.1 CloudWatch Dashboard
# terraform/modules/monitoring/dashboard.tf
resource "aws_cloudwatch_dashboard" "main" {
dashboard_name = "${var.project}-${var.environment}"
dashboard_body = jsonencode({
widgets = [
# Row 1: ECS Metrics
{
type = "metric"
x = 0
y = 0
width = 8
height = 6
properties = {
title = "ECS CPU Utilization"
region = var.aws_region
metrics = [
["AWS/ECS", "CPUUtilization", "ServiceName", "${var.project}-${var.environment}-api", "ClusterName", "${var.project}-${var.environment}"]
]
period = 300
stat = "Average"
}
},
{
type = "metric"
x = 8
y = 0
width = 8
height = 6
properties = {
title = "ECS Memory Utilization"
region = var.aws_region
metrics = [
["AWS/ECS", "MemoryUtilization", "ServiceName", "${var.project}-${var.environment}-api", "ClusterName", "${var.project}-${var.environment}"]
]
period = 300
stat = "Average"
}
},
{
type = "metric"
x = 16
y = 0
width = 8
height = 6
properties = {
title = "Running Task Count"
region = var.aws_region
metrics = [
["ECS/ContainerInsights", "RunningTaskCount", "ServiceName", "${var.project}-${var.environment}-api", "ClusterName", "${var.project}-${var.environment}"]
]
period = 60
stat = "Average"
}
},
# Row 2: ALB Metrics
{
type = "metric"
x = 0
y = 6
width = 8
height = 6
properties = {
title = "Request Count"
region = var.aws_region
metrics = [
["AWS/ApplicationELB", "RequestCount", "LoadBalancer", var.alb_arn_suffix]
]
period = 60
stat = "Sum"
}
},
{
type = "metric"
x = 8
y = 6
width = 8
height = 6
properties = {
title = "Response Time (p95)"
region = var.aws_region
metrics = [
["AWS/ApplicationELB", "TargetResponseTime", "LoadBalancer", var.alb_arn_suffix]
]
period = 60
stat = "p95"
}
},
{
type = "metric"
x = 16
y = 6
width = 8
height = 6
properties = {
title = "HTTP 5xx Errors"
region = var.aws_region
metrics = [
["AWS/ApplicationELB", "HTTPCode_Target_5XX_Count", "LoadBalancer", var.alb_arn_suffix]
]
period = 60
stat = "Sum"
}
},
# Row 3: Database Metrics
{
type = "metric"
x = 0
y = 12
width = 8
height = 6
properties = {
title = "Aurora CPU Utilization"
region = var.aws_region
metrics = [
["AWS/RDS", "CPUUtilization", "DBClusterIdentifier", "${var.project}-${var.environment}"]
]
period = 300
stat = "Average"
}
},
{
type = "metric"
x = 8
y = 12
width = 8
height = 6
properties = {
title = "Aurora Connections"
region = var.aws_region
metrics = [
["AWS/RDS", "DatabaseConnections", "DBClusterIdentifier", "${var.project}-${var.environment}"]
]
period = 60
stat = "Average"
}
},
{
type = "metric"
x = 16
y = 12
width = 8
height = 6
properties = {
title = "Aurora Serverless ACU"
region = var.aws_region
metrics = [
["AWS/RDS", "ServerlessDatabaseCapacity", "DBClusterIdentifier", "${var.project}-${var.environment}"]
]
period = 60
stat = "Average"
}
}
]
})
}
16.2 Alarms
# terraform/modules/monitoring/alarms.tf
# High CPU Alarm
resource "aws_cloudwatch_metric_alarm" "ecs_cpu_high" {
alarm_name = "${var.project}-${var.environment}-ecs-cpu-high"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "CPUUtilization"
namespace = "AWS/ECS"
period = 300
statistic = "Average"
threshold = 80
alarm_description = "ECS CPU utilization above 80%"
dimensions = {
ClusterName = "${var.project}-${var.environment}"
ServiceName = "${var.project}-${var.environment}-api"
}
alarm_actions = [var.sns_topic_arn]
ok_actions = [var.sns_topic_arn]
tags = {
Name = "${var.project}-${var.environment}-ecs-cpu-high"
Environment = var.environment
}
}
# High Error Rate Alarm
resource "aws_cloudwatch_metric_alarm" "alb_5xx_high" {
alarm_name = "${var.project}-${var.environment}-alb-5xx-high"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "HTTPCode_Target_5XX_Count"
namespace = "AWS/ApplicationELB"
period = 300
statistic = "Sum"
threshold = 10
alarm_description = "High 5xx error rate from API"
dimensions = {
LoadBalancer = var.alb_arn_suffix
}
alarm_actions = [var.sns_topic_arn]
ok_actions = [var.sns_topic_arn]
tags = {
Name = "${var.project}-${var.environment}-alb-5xx-high"
Environment = var.environment
}
}
# Database Connection Alarm
resource "aws_cloudwatch_metric_alarm" "rds_connections_high" {
alarm_name = "${var.project}-${var.environment}-rds-connections-high"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "DatabaseConnections"
namespace = "AWS/RDS"
period = 300
statistic = "Average"
threshold = 200 # Tune to ~80% of max_connections for the configured ACU range
alarm_description = "High database connection count"
dimensions = {
DBClusterIdentifier = "${var.project}-${var.environment}"
}
alarm_actions = [var.sns_topic_arn]
ok_actions = [var.sns_topic_arn]
tags = {
Name = "${var.project}-${var.environment}-rds-connections-high"
Environment = var.environment
}
}
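Each alarm above fires only after the threshold is breached for two consecutive evaluation periods, which filters out single-datapoint spikes. A sketch of that evaluation semantics:

```python
# Sketch of CloudWatch's GreaterThanThreshold evaluation: the alarm enters
# ALARM only when `evaluation_periods` consecutive datapoints breach the
# threshold; a single spike leaves it in OK.
def alarm_state(datapoints: list, threshold: float, evaluation_periods: int = 2) -> str:
    recent = datapoints[-evaluation_periods:]
    if len(recent) == evaluation_periods and all(d > threshold for d in recent):
        return "ALARM"
    return "OK"
```

With `period = 300` and `evaluation_periods = 2`, sustained CPU above 80% takes roughly 10 minutes to page, which is the intended trade-off between noise and latency.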
17. Disaster Recovery
17.1 Backup Strategy
| Component | Backup Method | Frequency | Retention | RTO | RPO |
|---|---|---|---|---|---|
| Aurora | Automated snapshots | Daily | 7 days | 1 hour | 5 min |
| Aurora | Point-in-time recovery | Continuous | 7 days | 15 min | 5 min |
| S3 Documents | Versioning + Cross-region | Real-time | 7 years | 1 hour | 0 |
| Terraform State | S3 versioning | Real-time | 90 days | 5 min | 0 |
| Application Logs | CloudWatch Logs | Real-time | 90 days | N/A | N/A |
17.2 Recovery Procedures
| Scenario | Recovery Steps | Expected Time |
|---|---|---|
| ECS Task Failure | Auto-replaced by service | < 2 min |
| Availability Zone Failure | Traffic shifts to healthy AZ | < 5 min |
| Database Corruption | Point-in-time recovery | 15-30 min |
| Complete Region Failure | Manual failover to DR region | 1-4 hours |
| Accidental Data Deletion | S3 versioning / DB restore | 15-60 min |
18. Cost Estimates
18.1 Monthly Cost Breakdown
| Service | Configuration | Est. Monthly Cost |
|---|---|---|
| ECS Fargate | 2 tasks x 0.5 vCPU x 1 GB | $30-50 |
| Aurora Serverless v2 | 0.5-4 ACU, 50 GB storage | $80-150 |
| Application Load Balancer | 1 ALB + data transfer | $20-30 |
| CloudFront | 2 distributions, ~100 GB/mo | $10-20 |
| S3 | 100 GB documents | $5-10 |
| NAT Gateway | 1 gateway + data transfer | $35-50 |
| Route 53 | 2 hosted zones + queries | $2-5 |
| Secrets Manager | 10 secrets | $5 |
| CloudWatch | Logs, metrics, alarms | $10-20 |
| WAF | Web ACL + rules | $10-15 |
| EC2 (Airflow) | t3.medium reserved | $23 |
| ACM Certificates | Free | $0 |
| KMS | Keys + requests | $5-10 |
| Total (Staging) | | ~$150-200 |
| Total (Production) | Higher capacity | ~$250-400 |
18.2 Cost Optimization Strategies
- Reserved Instances: 1-year commitment for EC2 (Airflow) saves 30%
- Aurora Serverless: Auto-scales down during off-hours
- S3 Intelligent-Tiering: Auto-moves cold data to cheaper storage
- CloudFront caching: Reduces origin requests
- Right-sizing: Monitor and adjust ECS task sizes
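The Fargate line item above can be sanity-checked with simple per-hour arithmetic. The rates below are approximate us-east-1 on-demand prices at time of writing and must be verified against current AWS pricing:

```python
# Back-of-envelope check of the Fargate cost line item. Rates are ASSUMED
# approximate us-east-1 on-demand prices; verify against current pricing.
VCPU_PER_HOUR = 0.04048   # USD per vCPU-hour (assumed)
GB_PER_HOUR = 0.004445    # USD per GB-hour (assumed)
HOURS_PER_MONTH = 730

def fargate_monthly_cost(tasks: int, vcpu: float, memory_gb: float) -> float:
    """Monthly cost for always-on Fargate tasks at the assumed rates."""
    per_task_hour = vcpu * VCPU_PER_HOUR + memory_gb * GB_PER_HOUR
    return round(tasks * per_task_hour * HOURS_PER_MONTH, 2)

# 2 tasks x 0.5 vCPU x 1 GB, matching the table above
cost = fargate_monthly_cost(tasks=2, vcpu=0.5, memory_gb=1.0)
```

At the assumed rates this lands near the low end of the $30-50 estimate, leaving headroom for auto-scaling to extra tasks during tax season.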
19. Rollback Strategy
19.1 Application Rollback
# Rollback procedure for ECS deployment
# Option 1: ECS Console/CLI rollback to previous task definition
aws ecs update-service \
--cluster tax-practice-production \
--service tax-practice-production-api \
--task-definition tax-practice-production-api:PREVIOUS_VERSION \
--force-new-deployment
# Option 2: Revert code and redeploy
git revert HEAD
git push origin main # Triggers CI/CD
19.2 Infrastructure Rollback
# Rollback Terraform changes
# Option 1: Revert to previous state
terraform workspace select production
cd infrastructure/terraform/environments/production
# Show what will change
terraform plan -target=module.affected_module
# Apply previous configuration from git
git checkout HEAD~1 -- .
terraform apply
# Option 2: Import and fix manually
terraform import aws_resource.name resource_id
19.3 Database Rollback
# Option 1: Revert migration
# Each migration has a corresponding down migration
# Option 2: Point-in-time recovery
# Use the AWS Console or CLI to restore to a specific time
aws rds restore-db-cluster-to-point-in-time \
--source-db-cluster-identifier tax-practice-production \
--db-cluster-identifier tax-practice-production-restored \
--restore-to-time 2024-12-27T10:00:00Z
Next Steps
- Review this plan with stakeholders
- Create Terraform backend (S3 bucket + DynamoDB table)
- Implement Phase 1 (VPC, security groups)
- Set up staging environment first
- Production deployment after staging validation
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2024-12-27 | Don McCarty | Initial cloud deployment plan |