Terraform Fundamentals

Project: Provision a Web Stack with Terraform

35 min Lesson 10 of 30

This capstone lesson combines the concepts covered so far in the tutorial (HCL syntax, providers, variables, state, remote backends, data sources, meta-arguments, and modules) into a single end-to-end project. You will provision a multi-AZ web stack on AWS: a custom VPC with public and private subnets across multiple availability zones, an Auto Scaling Group of EC2 instances behind an Application Load Balancer, and remote state stored in S3 with DynamoDB locking. This layout is a common starting point for platform-engineering teams building foundational cloud workloads.

Project Directory Structure

Organize the project as a root module that calls two reusable child modules: modules/network for the VPC and subnets, and modules/compute for the load balancer, Auto Scaling Group, and security groups. Remote state is bootstrapped separately: the main stack must never manage the S3 bucket and DynamoDB table that hold its own state and lock.

web-stack/
├── backend-bootstrap/      # One-time: creates the S3 bucket + DynamoDB table
│   └── main.tf
├── modules/
│   ├── network/
│   │   ├── main.tf         # VPC, subnets, IGW, NAT GW, route tables
│   │   ├── variables.tf
│   │   └── outputs.tf
│   └── compute/
│       ├── main.tf         # ALB, ASG, launch template, security groups
│       ├── variables.tf
│       └── outputs.tf
├── main.tf                 # Root: calls network + compute modules
├── variables.tf
├── outputs.tf
├── locals.tf
├── versions.tf             # Required providers + version constraints
├── backend.tf              # S3 remote backend config
└── terraform.tfvars        # Non-sensitive variable values

Bootstrap vs. managed state: The backend-bootstrap/ directory is a small, separate Terraform configuration that uses local state and is applied once per environment. It creates the S3 bucket (with versioning and server-side encryption) and the DynamoDB lock table. Because those resources hold the main stack's state and lock, the main stack must never manage them: a terraform destroy of the main stack would delete its own state file and leave Terraform with no record of the resources it created.

Step 1 — Bootstrap Remote State

Before the main stack can use a remote backend, the backend infrastructure must exist. This is a one-time operation per environment. In CI pipelines at large organizations this step is gated behind a separate "bootstrap" pipeline that requires SRE approval to run.

# backend-bootstrap/main.tf
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }
}

provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "tf_state" {
  bucket = "acme-terraform-state-prod"

  lifecycle {
    prevent_destroy = true # Guard against accidental wipe
  }

  tags = { ManagedBy = "terraform-bootstrap", Purpose = "terraform-state" }
}

resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_dynamodb_table" "tf_lock" {
  name         = "acme-terraform-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  tags = { ManagedBy = "terraform-bootstrap", Purpose = "terraform-lock" }
}

# Run once:
#   cd backend-bootstrap && terraform init && terraform apply
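The bucket and table names created by the bootstrap are needed again when the main stack's backend is configured. One way to surface them, sketched here as an assumption rather than part of the original project, is a small outputs file in backend-bootstrap/ (the file name and output names are hypothetical):

```hcl
# backend-bootstrap/outputs.tf (hypothetical file, not in the original tree)

# Name of the S3 bucket that will hold the main stack's state.
output "state_bucket_name" {
  description = "S3 bucket for Terraform state."
  value       = aws_s3_bucket.tf_state.bucket
}

# Name of the DynamoDB table used for state locking.
output "lock_table_name" {
  description = "DynamoDB table for Terraform state locking."
  value       = aws_dynamodb_table.tf_lock.name
}
```

After the one-time apply, terraform output in backend-bootstrap/ would print both values, which can then be copied into backend.tf.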

Step 2 — Root Module: Versions, Backend, and Locals

The versions.tf file constrains each provider with the pessimistic constraint operator (~>): a constraint such as "~> 5.50" accepts 5.50 and any later 5.x release, but never 6.0. Unconstrained providers are one of the most common sources of surprise plan changes in shared repositories: a terraform init -upgrade on a colleague's machine can pull a provider release with a breaking change, and if the resulting plan is auto-applied it can damage real infrastructure.
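To make the constraint operator concrete, the sketch below shows how the same provider could be constrained more or less tightly. The tighter three-part form is shown only for comparison; it is not what this lesson's versions.tf uses.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"

      # "~> 5.50"   allows >= 5.50.0 and < 6.0.0   (any later 5.x release)
      # "~> 5.50.0" allows >= 5.50.0 and < 5.51.0  (patch releases only)
      version = "~> 5.50.0"
    }
  }
}
```

Only the rightmost version component given in the constraint is allowed to increase, so adding a patch digit narrows the accepted range from "any 5.x at or above 5.50" to "5.50.x only".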

# versions.tf
terraform {
  required_version = "~> 1.8"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.50"
    }
  }
}

# backend.tf
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state-prod"
    key            = "web-stack/production/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "acme-terraform-lock"
    encrypt        = true
    kms_key_id     = "arn:aws:kms:us-east-1:123456789012:key/mrk-abc123"
  }
}

# locals.tf
locals {
  name_prefix = "${var.environment}-${var.project_name}"

  common_tags = merge(
    {
      Environment = var.environment
      Project     = var.project_name
      ManagedBy   = "terraform"
      Owner       = "platform-team"
    },
    var.extra_tags
  )

  # Compute AZ count: production gets 3, others get 2
  az_count = var.environment == "production" ? 3 : 2
}
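The locals above, and the root module later in this lesson, read several input variables whose declarations are not shown. A minimal sketch of what the root variables.tf could contain follows. The variable names are taken from the lesson's code, but the types, defaults, allowed environment names, and validation rules are assumptions for illustration.

```hcl
# variables.tf (root module): a sketch, not the author's file

variable "environment" {
  description = "Deployment environment; drives naming and AZ count."
  type        = string

  validation {
    # "production" appears in the lesson; "dev" and "staging" are assumed.
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "environment must be one of: dev, staging, production."
  }
}

variable "project_name" {
  description = "Short project name used in resource name prefixes."
  type        = string
}

variable "extra_tags" {
  description = "Additional tags merged into local.common_tags."
  type        = map(string)
  default     = {}
}

variable "aws_region" {
  description = "AWS region for the stack."
  type        = string
  default     = "us-east-1"
}

variable "vpc_cidr" {
  description = "CIDR block for the VPC."
  type        = string
  default     = "10.0.0.0/16"

  validation {
    # cidrhost() fails on malformed CIDR strings, so can() acts as a syntax check.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "vpc_cidr must be a valid IPv4 CIDR block."
  }
}

variable "instance_type" {
  description = "EC2 instance type for the Auto Scaling Group."
  type        = string
  default     = "t3.micro" # assumed default
}

variable "acm_certificate_arn" {
  description = "ARN of the ACM certificate attached to the HTTPS listener."
  type        = string
}

variable "instance_profile_name" {
  description = "Name of the IAM instance profile for the EC2 instances."
  type        = string
}

variable "access_log_bucket" {
  description = "S3 bucket that receives ALB access logs."
  type        = string
}

variable "asg_min" {
  description = "Minimum number of instances in the ASG."
  type        = number
  default     = 2
}

variable "asg_max" {
  description = "Maximum number of instances in the ASG."
  type        = number
  default     = 6
}

variable "asg_desired" {
  description = "Initial desired number of instances in the ASG."
  type        = number
  default     = 2
}
```

Variables without a default (such as acm_certificate_arn) must be supplied through a .tfvars file or the environment, which keeps environment-specific values out of the module code.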

Step 3 — The Network Module (VPC + Subnets)

The network module builds a two-tier VPC: public subnets host the ALB and NAT Gateways, private subnets host the EC2 instances. Each AZ gets one public and one private subnet. Building the for_each maps from a slice of the AZ list makes the module AZ-count-agnostic: it works for a dev stack with two AZs and a production stack with three, driven entirely by the az_count variable.

# modules/network/main.tf
data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  azs           = slice(data.aws_availability_zones.available.names, 0, var.az_count)
  public_cidrs  = [for i, az in local.azs : cidrsubnet(var.vpc_cidr, 8, i)]
  private_cidrs = [for i, az in local.azs : cidrsubnet(var.vpc_cidr, 8, i + 10)]
}

resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags                 = merge(var.tags, { Name = "${var.name_prefix}-vpc" })
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
  tags   = merge(var.tags, { Name = "${var.name_prefix}-igw" })
}

resource "aws_subnet" "public" {
  for_each = { for i, az in local.azs : az => { cidr = local.public_cidrs[i], idx = i } }

  vpc_id                  = aws_vpc.main.id
  cidr_block              = each.value.cidr
  availability_zone       = each.key
  map_public_ip_on_launch = true

  tags = merge(var.tags, {
    Name = "${var.name_prefix}-public-${each.value.idx + 1}",
    Tier = "public"
  })
}

resource "aws_subnet" "private" {
  for_each = { for i, az in local.azs : az => { cidr = local.private_cidrs[i], idx = i } }

  vpc_id            = aws_vpc.main.id
  cidr_block        = each.value.cidr
  availability_zone = each.key

  tags = merge(var.tags, {
    Name = "${var.name_prefix}-private-${each.value.idx + 1}",
    Tier = "private"
  })
}

resource "aws_eip" "nat" {
  for_each = aws_subnet.public

  domain = "vpc"
  tags   = merge(var.tags, { Name = "${var.name_prefix}-nat-eip-${each.value.availability_zone}" })
}

resource "aws_nat_gateway" "main" {
  for_each = aws_subnet.public

  allocation_id = aws_eip.nat[each.key].id
  subnet_id     = each.value.id
  tags          = merge(var.tags, { Name = "${var.name_prefix}-nat-${each.value.availability_zone}" })

  depends_on = [aws_internet_gateway.main]
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = merge(var.tags, { Name = "${var.name_prefix}-rt-public" })
}

resource "aws_route_table_association" "public" {
  for_each = aws_subnet.public

  subnet_id      = each.value.id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table" "private" {
  for_each = aws_subnet.private

  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[each.key].id
  }

  tags = merge(var.tags, { Name = "${var.name_prefix}-rt-private-${each.value.availability_zone}" })
}

resource "aws_route_table_association" "private" {
  for_each = aws_subnet.private

  subnet_id      = each.value.id
  route_table_id = aws_route_table.private[each.key].id
}
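The module's variables.tf and outputs.tf are listed in the directory tree but not shown. A plausible sketch follows, inferred from how main.tf reads var.* and from how the compute module calls values() on the subnet IDs, which suggests the outputs are maps keyed by AZ. The exact types and descriptions are assumptions.

```hcl
# modules/network/variables.tf (sketch)
variable "name_prefix" {
  description = "Prefix applied to every resource Name tag."
  type        = string
}

variable "vpc_cidr" {
  description = "CIDR block for the VPC, for example 10.0.0.0/16."
  type        = string
}

variable "az_count" {
  description = "Number of availability zones to span."
  type        = number
}

variable "tags" {
  description = "Tags merged into every resource."
  type        = map(string)
  default     = {}
}

# modules/network/outputs.tf (sketch)
output "vpc_id" {
  description = "ID of the VPC."
  value       = aws_vpc.main.id
}

# Maps of AZ name => subnet ID, so callers can use values() or look up by AZ.
output "public_subnet_ids" {
  description = "Public subnet IDs keyed by availability zone."
  value       = { for az, subnet in aws_subnet.public : az => subnet.id }
}

output "private_subnet_ids" {
  description = "Private subnet IDs keyed by availability zone."
  value       = { for az, subnet in aws_subnet.private : az => subnet.id }
}
```

Returning maps rather than lists keeps the AZ association visible to callers and avoids index shifts if the set of AZs changes.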
[Figure: VPC Architecture: Public and Private Subnets Across AZs. A VPC (10.0.0.0/16) with an Internet Gateway spans three AZs. us-east-1a: public subnet 10.0.0.0/24, private subnet 10.0.10.0/24. us-east-1b: public subnet 10.0.1.0/24, private subnet 10.0.11.0/24. us-east-1c: public subnet 10.0.2.0/24, private subnet 10.0.12.0/24. Each public subnet holds a NAT Gateway and an ALB node; each private subnet holds EC2 instances (ASG) whose security group allows 443 from the ALB only.]
Three-AZ VPC layout: ALB nodes and NAT Gateways live in public subnets; EC2 instances (ASG) live in private subnets and reach the internet only via NAT.
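The subnet addresses in this layout follow directly from the cidrsubnet() calls in the network module. The standalone sketch below reproduces the arithmetic with hard-coded inputs (the AZ names and VPC CIDR are taken from the figure); it needs no provider and can be checked with terraform console or terraform apply in an empty directory.

```hcl
locals {
  vpc_cidr = "10.0.0.0/16"
  azs      = ["us-east-1a", "us-east-1b", "us-east-1c"]

  # Adding 8 bits to a /16 yields /24 subnets; the third argument selects
  # which /24 (the "network number") inside the VPC range.
  public_cidrs  = [for i, az in local.azs : cidrsubnet(local.vpc_cidr, 8, i)]
  private_cidrs = [for i, az in local.azs : cidrsubnet(local.vpc_cidr, 8, i + 10)]
}

# Evaluates to ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24"]
output "public_cidrs" {
  value = local.public_cidrs
}

# Evaluates to ["10.0.10.0/24", "10.0.11.0/24", "10.0.12.0/24"]
output "private_cidrs" {
  value = local.private_cidrs
}
```

The offset of 10 for private subnets leaves network numbers 0 through 9 for public subnets, so the two ranges cannot collide as long as az_count stays at or below 10.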

Step 4 — The Compute Module (ALB + ASG)

The compute module wires together the Application Load Balancer in the public subnets, a launch template referencing the latest Amazon Linux 2023 AMI via a data source, and an Auto Scaling Group that spans the private subnets. The security group model is explicit and minimal: the ALB accepts port 443 from the internet, and EC2 instances accept port 443 only from the ALB's security group — never from 0.0.0.0/0.

# modules/compute/main.tf (abridged: key resources shown)
data "aws_ami" "al2023" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"]
  }
}

# --- Security Groups ---
resource "aws_security_group" "alb" {
  name        = "${var.name_prefix}-alb-sg"
  description = "Allow HTTPS inbound from internet, all outbound to instances."
  vpc_id      = var.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
    description = "HTTPS from internet"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = merge(var.tags, { Name = "${var.name_prefix}-alb-sg" })
}

resource "aws_security_group" "ec2" {
  name        = "${var.name_prefix}-ec2-sg"
  description = "Allow HTTPS only from the ALB security group."
  vpc_id      = var.vpc_id

  ingress {
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
    description     = "HTTPS from ALB only"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "Allow outbound (NAT GW egress)"
  }

  tags = merge(var.tags, { Name = "${var.name_prefix}-ec2-sg" })
}

# --- ALB ---
resource "aws_lb" "main" {
  name                       = "${var.name_prefix}-alb"
  internal                   = false
  load_balancer_type         = "application"
  security_groups            = [aws_security_group.alb.id]
  subnets                    = values(var.public_subnet_ids)
  idle_timeout               = 60
  drop_invalid_header_fields = true # Production security requirement

  access_logs {
    bucket  = var.access_log_bucket
    prefix  = "${var.name_prefix}-alb"
    enabled = true
  }

  tags = merge(var.tags, { Name = "${var.name_prefix}-alb" })
}

resource "aws_lb_target_group" "app" {
  name        = "${var.name_prefix}-tg"
  port        = 443
  protocol    = "HTTPS"
  vpc_id      = var.vpc_id
  target_type = "instance"

  health_check {
    path                = "/health"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
    interval            = 30
    matcher             = "200"
  }

  tags = merge(var.tags, { Name = "${var.name_prefix}-tg" })
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.main.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = var.acm_certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}

# --- Launch Template + ASG ---
resource "aws_launch_template" "app" {
  name_prefix   = "${var.name_prefix}-lt-"
  image_id      = data.aws_ami.al2023.id
  instance_type = var.instance_type

  network_interfaces {
    associate_public_ip_address = false
    security_groups             = [aws_security_group.ec2.id]
  }

  iam_instance_profile {
    name = var.instance_profile_name
  }

  # NOTE: a default nginx install listens on port 80 without TLS. For the
  # HTTPS:443 target group and its /health check to pass, the instance also
  # needs a TLS listener on 443 and a /health route (not shown in this
  # abridged example).
  user_data = base64encode(<<-EOF
    #!/bin/bash
    dnf update -y
    dnf install -y nginx
    systemctl enable --now nginx
    EOF
  )

  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "required" # IMDSv2 only (security baseline)
    http_put_response_hop_limit = 1
  }

  lifecycle {
    create_before_destroy = true
  }

  tags = merge(var.tags, { Name = "${var.name_prefix}-lt" })
}

resource "aws_autoscaling_group" "app" {
  name                      = "${var.name_prefix}-asg"
  min_size                  = var.asg_min
  max_size                  = var.asg_max
  desired_capacity          = var.asg_desired
  vpc_zone_identifier       = values(var.private_subnet_ids)
  health_check_type         = "ELB"
  health_check_grace_period = 120
  target_group_arns         = [aws_lb_target_group.app.arn]

  launch_template {
    id = aws_launch_template.app.id
    # Reference latest_version rather than "$Latest": with "$Latest" the ASG
    # configuration never changes, so a new template version does not start
    # an instance refresh.
    version = aws_launch_template.app.latest_version
  }

  instance_refresh {
    strategy = "Rolling"

    preferences {
      min_healthy_percentage = 80
      instance_warmup        = 60
    }
  }

  dynamic "tag" {
    for_each = merge(var.tags, { Name = "${var.name_prefix}-app" })

    content {
      key                 = tag.key
      value               = tag.value
      propagate_at_launch = true
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}
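Because the compute module is abridged, its inputs and outputs are not shown. The sketch below fills them in from the names the module body and the root module use (var.vpc_id, module.compute.alb_dns_name, and so on). The types are inferred, and the alb_zone_id output is a hypothetical addition that a DNS alias record would need.

```hcl
# modules/compute/variables.tf (sketch)
variable "name_prefix" { type = string }
variable "vpc_id" { type = string }

# Maps of AZ name => subnet ID; main.tf flattens them with values().
variable "public_subnet_ids" { type = map(string) }
variable "private_subnet_ids" { type = map(string) }

variable "instance_type" { type = string }
variable "acm_certificate_arn" { type = string }
variable "instance_profile_name" { type = string }
variable "access_log_bucket" { type = string }

variable "asg_min" { type = number }
variable "asg_max" { type = number }
variable "asg_desired" { type = number }

variable "tags" {
  type    = map(string)
  default = {}
}

# modules/compute/outputs.tf (sketch)
output "alb_dns_name" {
  description = "DNS name of the Application Load Balancer."
  value       = aws_lb.main.dns_name
}

# Hypothetical extra output: the ALB's hosted zone ID, needed for a
# Route 53 alias record that points at the load balancer.
output "alb_zone_id" {
  description = "Canonical hosted zone ID of the ALB."
  value       = aws_lb.main.zone_id
}

output "asg_name" {
  description = "Name of the Auto Scaling Group."
  value       = aws_autoscaling_group.app.name
}
```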
Production pitfall (IMDSv1 on EC2): Omitting http_tokens = "required" in the launch template can leave IMDSv1 enabled. IMDSv1 answers plain, unauthenticated GET requests from any process on the instance, so a server-side request forgery (SSRF) vulnerability in application code is enough to read the instance's IAM credentials. The Capital One breach (2019) exploited IMDS in this way. Always enforce IMDSv2 in every launch template. AWS lets you set IMDSv2 as an account-level default for new launches, and Amazon Linux 2023 AMIs request IMDSv2 by default, but older AMIs and accounts without that setting may still allow IMDSv1.
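In addition to the per-template setting, the default can be enforced for a whole region of an account. The sketch below assumes the installed AWS provider release includes the aws_ec2_instance_metadata_defaults resource (check the provider changelog for your pinned version), and it would most naturally live in an account-baseline configuration rather than in this stack.

```hcl
# Regional default for all new instance launches in this account.
# Launch templates that set metadata_options explicitly still take precedence.
resource "aws_ec2_instance_metadata_defaults" "imdsv2_only" {
  http_tokens                 = "required" # reject IMDSv1 requests
  http_put_response_hop_limit = 1          # keep tokens from leaving the instance
}
```

This acts as a safety net for launch paths that bypass the launch template; it does not replace the explicit metadata_options block shown in the compute module.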

Step 5 — Root Module and Deployment Workflow

The root module wires the two child modules together, passing the network outputs into the compute module's inputs. It also emits the outputs consumed by subsequent CI pipeline steps: the ALB DNS name (used for the smoke test and for DNS record creation), the VPC ID, and the ASG name.

# main.tf (root module)
provider "aws" {
  region = var.aws_region

  default_tags {
    tags = local.common_tags
  }
}

module "network" {
  source = "./modules/network"

  name_prefix = local.name_prefix
  vpc_cidr    = var.vpc_cidr
  az_count    = local.az_count
  tags        = local.common_tags
}

module "compute" {
  source = "./modules/compute"

  name_prefix           = local.name_prefix
  vpc_id                = module.network.vpc_id
  public_subnet_ids     = module.network.public_subnet_ids
  private_subnet_ids    = module.network.private_subnet_ids
  instance_type         = var.instance_type
  acm_certificate_arn   = var.acm_certificate_arn
  instance_profile_name = var.instance_profile_name
  access_log_bucket     = var.access_log_bucket
  asg_min               = var.asg_min
  asg_max               = var.asg_max
  asg_desired           = var.asg_desired
  tags                  = local.common_tags
}

# outputs.tf
output "alb_dns_name" {
  description = "DNS name of the Application Load Balancer."
  value       = module.compute.alb_dns_name
}

output "vpc_id" {
  description = "VPC ID."
  value       = module.network.vpc_id
}

output "asg_name" {
  description = "Name of the Auto Scaling Group."
  value       = module.compute.asg_name
}

# ---
# Deployment commands (run by CI; apply runs only after the plan is approved):

# Init (downloads providers, configures S3 backend):
terraform init \
  -backend-config="bucket=acme-terraform-state-prod" \
  -backend-config="key=web-stack/production/terraform.tfstate" \
  -backend-config="region=us-east-1"

# Plan (output saved as artifact for review gate):
terraform plan -var-file=envs/production.tfvars -out=tfplan

# Apply (uses the saved plan, so there are no re-plan surprises):
terraform apply tfplan

# Post-apply smoke test.
# Note: the ACM certificate will not match the raw ALB hostname, so curl's
# certificate check fails unless you test via the site's DNS name or pass
# --insecure for this check.
# Caution: the "rollback" below destroys the compute module, which takes
# the service offline.
ALB=$(terraform output -raw alb_dns_name)
curl -sf --retry 5 --retry-delay 10 "https://${ALB}/health" \
  || { echo "Smoke test failed, rolling back"; terraform destroy -auto-approve -target=module.compute; exit 1; }
Always apply a saved plan file in CI. Running a bare terraform apply, instead of terraform plan -out=tfplan followed by terraform apply tfplan, makes Terraform create a fresh plan at apply time. Between human review and apply, another pipeline or a manual change could alter the infrastructure or the state, so the apply may not match what was reviewed. Saving the plan with -out and applying that exact artifact guarantees that only the reviewed changes are applied; if the state has changed in the meantime, Terraform rejects the saved plan as stale rather than applying something different. HCP Terraform (formerly Terraform Cloud) follows the same model: the apply phase of a run executes the plan produced earlier in that run.
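The DNS record creation mentioned as a consumer of the root outputs could look like the sketch below, written as a standalone configuration that takes the ALB values as inputs. All variable names are hypothetical, and alb_zone_id assumes the compute module also exposes the load balancer's zone_id attribute, which the lesson's outputs do not include.

```hcl
variable "hosted_zone_id" {
  description = "Route 53 hosted zone that owns the record (assumed to exist)."
  type        = string
}

variable "record_name" {
  description = "Fully qualified name for the site, e.g. app.example.com."
  type        = string
}

variable "alb_dns_name" {
  description = "Value of the root module's alb_dns_name output."
  type        = string
}

variable "alb_zone_id" {
  description = "Canonical hosted zone ID of the ALB (aws_lb zone_id attribute)."
  type        = string
}

# Alias A record: Route 53 resolves the name straight to the ALB's addresses.
resource "aws_route53_record" "app" {
  zone_id = var.hosted_zone_id
  name    = var.record_name
  type    = "A"

  alias {
    name                   = var.alb_dns_name
    zone_id                = var.alb_zone_id
    evaluate_target_health = true
  }
}
```

An alias record needs the ALB's DNS name and hosted zone ID rather than its ARN, and pointing the smoke test at this name also lets the TLS certificate validate.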

Production Failure Modes to Know

After running dozens of Terraform-managed rollouts you will encounter predictable failure patterns. Knowing them in advance turns a midnight incident into a ten-minute fix:

  • State lock not released after an interrupted apply: Run terraform force-unlock <LOCK_ID> (the lock ID is shown in the error message). First confirm that the process that held the lock has actually exited; releasing a stale lock is safe. If another apply is genuinely still running, never force-unlock, because two concurrent writers can corrupt the state.
  • Desired capacity drift in ASG: If an operator manually adjusts desired_capacity in the AWS console, the next terraform plan will show a diff and the next apply will reset it. Use ignore_changes = [desired_capacity] in the ASG lifecycle block if you manage desired capacity through a separate auto-scaling policy.
  • NAT Gateway EIP limit: AWS default is 5 EIPs per region. A three-AZ stack needs 3 EIPs for NAT Gateways. Across multiple environments in one region you hit the limit quickly — request a quota increase as part of the initial infrastructure setup, before the first apply.
  • AMI deregistration: If the AMI used by the launch template is deregistered, new ASG instances fail to launch but existing instances are unaffected. The fix is to update the launch template's AMI reference and trigger an instance refresh. Always pin launch templates to AMIs managed via AWS Image Builder or Packer pipelines, not to public AMIs that can be removed.
  • Provider version mismatch across workspaces: A colleague runs terraform init -upgrade and commits an updated .terraform.lock.hcl that pins a new provider version. Your CI picks it up on the next run. The new provider may have a breaking schema change for a resource you use. Solution: review lock file diffs in PRs with the same scrutiny as application code changes.
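The desired-capacity fix from the list above can be sketched as follows. This is a reduced, standalone ASG rather than the lesson's full resource; the variable names, sizes, resource names, and the 50 percent CPU target are assumptions for illustration.

```hcl
variable "launch_template_id" {
  description = "ID of an existing launch template (hypothetical input)."
  type        = string
}

variable "private_subnet_ids" {
  description = "Subnets the ASG may launch into (hypothetical input)."
  type        = list(string)
}

resource "aws_autoscaling_group" "example" {
  name_prefix         = "example-asg-"
  min_size            = 2
  max_size            = 6
  desired_capacity    = 2 # initial value only; see ignore_changes below
  vpc_zone_identifier = var.private_subnet_ids

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }

  lifecycle {
    # Let the scaling policy (or an operator) own desired_capacity after
    # creation, so later plans do not try to reset it.
    ignore_changes = [desired_capacity]
  }
}

# Target tracking policy that adjusts desired capacity to hold average CPU
# near the target value.
resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "example-cpu-target"
  autoscaling_group_name = aws_autoscaling_group.example.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 50
  }
}
```

With ignore_changes in place, Terraform still sets desired_capacity when the group is first created but stops reporting later changes to it as drift.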
Module versioning in team environments: In solo or small-team projects it is acceptable to reference modules via relative paths (./modules/network). In large organizations, modules are published to a private Terraform registry or to a Git repository with tagged releases, and callers pin to a semantic version: source = "git::https://github.com/acme/terraform-modules.git//network?ref=v2.3.0". This ensures that a module change in one team's branch does not silently break another team's infrastructure on their next init.
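As a sketch of the pinned form, the network module call from the root module could be rewritten as below. The Git URL is the one quoted above; the input values are illustrative, and the registry address in the comment is hypothetical.

```hcl
module "network" {
  # Pinned to a tagged release; bump ref deliberately, in a reviewed PR.
  source = "git::https://github.com/acme/terraform-modules.git//network?ref=v2.3.0"

  # Alternative with a private registry (hypothetical address): registry
  # sources take a separate version argument instead of ?ref=.
  #   source  = "app.terraform.io/acme/network/aws"
  #   version = "~> 2.3"

  name_prefix = "production-web-stack"
  vpc_cidr    = "10.0.0.0/16"
  az_count    = 3
  tags        = { ManagedBy = "terraform" }
}
```

Changing the ref (or version) requires a terraform init to fetch the new module code, which makes module upgrades an explicit, reviewable step.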
