Terraform Fundamentals

Meta-Arguments: count, for_each & lifecycle

Lesson 8 of 30

Every resource block in Terraform, by default, manages exactly one infrastructure object. The moment you need ten security group rules, five S3 buckets, or a fleet of IAM users, copying the resource block is not an option: it destroys readability, and the copies drift apart as edits land in some blocks but not others. Meta-arguments are special arguments recognised by Terraform itself (not the provider) that change how a resource behaves: how many copies exist, which set of values drives each copy, and what rules govern creation, deletion, and replacement. In production module work, using count, for_each, and lifecycle correctly is a baseline skill rather than an advanced one.

count: Simple Numeric Replication

count takes a non-negative integer and tells Terraform to create that many identical (or near-identical) copies of the resource. Each instance is addressed as resource_type.resource_name[index], with count.index available inside the block to differentiate instances.

# Create var.web_instance_count web-tier EC2 instances, spread round-robin across the subnets
resource "aws_instance" "web" {
  count         = var.web_instance_count   # e.g. 3
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.medium"
  subnet_id     = var.subnet_ids[count.index % length(var.subnet_ids)]

  tags = {
    Name = "web-${count.index + 1}"
    Role = "web"
  }
}

# Reference a specific instance or all instances:
# aws_instance.web[0].id          -- first instance
# aws_instance.web[*].id          -- splat: list of all IDs
# count.index                     -- inside the block itself (a resource cannot reference its own address)

output "web_instance_ids" {
  value = aws_instance.web[*].id
}

Production pitfall (count and index stability): With count, Terraform identifies each instance by its numeric index. When count iterates over a list (count = length(var.names)) and you remove an element from the middle, every later element shifts down one index. With five elements, removing index 2 means instances 2 and 3 must change to carry the values formerly at indices 3 and 4, and instance 4 is destroyed, even though you only wanted to remove one. Depending on which attributes change, the shifted instances are updated in place or destroyed and recreated. For any resource whose identity matters (EC2, RDS, IAM user), prefer for_each with a set or map so each instance has a stable string key.
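A minimal sketch of that pitfall, assuming a hypothetical user_names list. The two resources below are alternatives shown side by side; applying both would try to create the same IAM users twice.

```hcl
# Hypothetical list, used only to illustrate index shifting.
variable "user_names" {
  type    = list(string)
  default = ["alice", "bob", "carol", "dave", "erin"]
}

# Alternative 1: count over the list. Instances are addressed by position.
resource "aws_iam_user" "by_index" {
  count = length(var.user_names)
  name  = var.user_names[count.index]
}
# aws_iam_user.by_index[0] = "alice" ... [4] = "erin".
# Remove "carol" and the later names shift down one slot: Terraform plans
# changes to [2] (carol -> dave) and [3] (dave -> erin) and destroys [4].

# Alternative 2: for_each over the same list. Instances are addressed by name.
resource "aws_iam_user" "by_name" {
  for_each = toset(var.user_names)
  name     = each.key
}
# aws_iam_user.by_name["alice"] ... ["erin"].
# Remove "carol" and the plan contains a single destroy: by_name["carol"].
```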

for_each: Key-Driven Replication

for_each accepts either a set(string) or a map(any) and creates one resource instance per element. Each instance is addressed as resource_type.resource_name["key"]. Because instances are keyed by string rather than integer, adding or removing one element only affects that specific instance — all other instances remain untouched. This is the production-safe default for all non-trivial multi-resource patterns.

# Pattern 1: for_each with a set of strings (when all instances share config)
variable "availability_zones" {
  type    = set(string)
  default = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

resource "aws_subnet" "private" {
  for_each          = var.availability_zones
  vpc_id            = aws_vpc.main.id
  availability_zone = each.key
  # tolist() sorts the set lexically, so each AZ's index (and CIDR) follows alphabetical order
  cidr_block        = cidrsubnet(var.vpc_cidr, 4, index(tolist(var.availability_zones), each.key))

  tags = { Name = "private-${each.key}" }
}
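One caveat about the subnet example: tolist() sorts the set lexically, so removing an AZ (or adding one that sorts before existing entries) shifts the index, and therefore the cidr_block, of every later subnet, which forces those subnets to be replaced. A sketch that pins each AZ to an explicit network number avoids this. The variable private_subnet_netnums is hypothetical, and the block is meant as a replacement for aws_subnet.private above, not an addition.

```hcl
# Hypothetical variable: each AZ is pinned to a fixed cidrsubnet() network number,
# so adding or removing an AZ never renumbers the others.
variable "private_subnet_netnums" {
  type = map(number)
  default = {
    "us-east-1a" = 0
    "us-east-1b" = 1
    "us-east-1c" = 2
  }
}

# Alternative to the aws_subnet.private block above (use one, not both).
resource "aws_subnet" "private" {
  for_each          = var.private_subnet_netnums
  vpc_id            = aws_vpc.main.id
  availability_zone = each.key
  cidr_block        = cidrsubnet(var.vpc_cidr, 4, each.value)

  tags = { Name = "private-${each.key}" }
}
```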

# Pattern 2: for_each with a map (when instances have different configs)
variable "iam_users" {
  type = map(object({
    path   = string
    groups = list(string)
  }))
  default = {
    "svc-deployer" = { path = "/service/", groups = ["deployers"] }
    "svc-reader"   = { path = "/service/", groups = ["readers"]   }
    "ops-admin"    = { path = "/ops/",     groups = ["admins", "deployers"] }
  }
}

resource "aws_iam_user" "this" {
  for_each = var.iam_users
  name     = each.key
  path     = each.value.path
  tags     = { ManagedBy = "terraform" }
}

resource "aws_iam_user_group_membership" "this" {
  for_each = var.iam_users
  user     = aws_iam_user.this[each.key].name
  groups   = each.value.groups
}

# Address a specific instance:
# aws_iam_user.this["svc-deployer"].arn
# aws_iam_user.this["ops-admin"].unique_id
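Because a for_each resource is a map of instances keyed by string, the [*] splat used with count does not collect its attributes. A minimal sketch of the two usual alternatives, written against the resources above: a for expression that keeps the keys, and values() for a plain list.

```hcl
# Map of user name => ARN, keyed exactly like the resource instances.
output "iam_user_arns" {
  value = { for name, user in aws_iam_user.this : name => user.arn }
}

# Plain list of subnet IDs; values() returns them in lexical order of the AZ keys.
output "private_subnet_ids" {
  value = values(aws_subnet.private)[*].id
}
```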

Converting a list to a for_each set: If a variable arrives as a list(string), convert it before passing it to for_each: for_each = toset(var.my_list). For a list of objects, build a map with a for expression in a locals block (user_map = { for u in var.users : u.name => u }) and then set for_each = local.user_map; the attribute used as the key must be unique, or Terraform reports a duplicate-key error. Never pass a list directly: Terraform rejects it with an error.
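Both conversions written out in full, as a sketch. The variables team_members and users and their default values are hypothetical.

```hcl
# A list(string) must be converted to a set before it can drive for_each.
variable "team_members" {
  type    = list(string)
  default = ["svc-ci", "svc-backup"]
}

resource "aws_iam_user" "team" {
  for_each = toset(var.team_members)
  name     = each.key
}

# A list of objects is reshaped into a map keyed by a unique, static attribute.
variable "users" {
  type = list(object({
    name = string
    path = string
  }))
  default = [
    { name = "svc-build", path = "/service/" },
    { name = "ops-oncall", path = "/ops/" },
  ]
}

locals {
  # Fails with a duplicate-key error if two entries share the same name.
  user_map = { for u in var.users : u.name => u }
}

resource "aws_iam_user" "from_list" {
  for_each = local.user_map
  name     = each.key
  path     = each.value.path
}
```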

Removing a middle element with count causes cascading replacements; for_each targets only the removed key.

lifecycle: Controlling Creation, Deletion, and Replacement

The lifecycle block sits inside any resource and overrides Terraform's default behaviour for that resource's lifecycle events. It accepts four arguments: create_before_destroy, prevent_destroy, ignore_changes, and replace_triggered_by (the last requires Terraform 1.2 or later; that release also added precondition and postcondition blocks, which this lesson does not cover). Getting these right is the difference between a zero-downtime deployment and a 3 AM incident.

create_before_destroy

By default, when a change cannot be applied in place and forces a new resource (for example, changing the ami of an aws_instance), Terraform destroys the old resource first and then creates the new one. This leaves a gap during which the resource does not exist. For load-balanced fleets, TLS certificates, and IAM roles with policy attachments, this gap is unacceptable. create_before_destroy = true reverses the order: Terraform creates the replacement first and destroys the original only after the replacement has been created successfully.

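# Launch template that can be replaced safely while an ASG references it
resource "aws_launch_template" "web" {
  name_prefix   = "web-"
  image_id      = var.ami_id
  instance_type = "m6i.large"

  lifecycle {
    create_before_destroy = true
    # Changing image_id does not replace the template: the provider publishes a new
    # template version in place. create_before_destroy applies when a change does force
    # replacement (for example, a new name_prefix): the new template is created BEFORE
    # the old one is destroyed, so the ASG always has a valid template to reference.
  }
}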

# ACM certificate replacement — cert must exist before old one is removed
resource "aws_acm_certificate" "main" {
  domain_name       = var.domain_name
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}

# IAM role — create the new role and attach policies before deleting the old
resource "aws_iam_role" "worker" {
  name_prefix        = "worker-"
  assume_role_policy = data.aws_iam_policy_document.assume.json

  lifecycle {
    create_before_destroy = true
  }
}

Why name_prefix, not name: When create_before_destroy = true, the new resource must exist simultaneously with the old one. AWS requires unique names for most resources. Use name_prefix instead of name so Terraform can generate a unique suffix for the new resource. With a fixed name, the create step fails because the name is still taken by the old resource.
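The same rule applied to a security group, a resource that the AWS provider replaces rather than updates when its name or description changes, and whose name must be unique within a VPC. A minimal sketch; var.vpc_id is an assumed input.

```hcl
resource "aws_security_group" "app" {
  # name_prefix lets the replacement coexist with the old group during create_before_destroy.
  name_prefix = "app-"
  description = "Application tier"
  vpc_id      = var.vpc_id

  lifecycle {
    create_before_destroy = true
  }
}

# With a fixed name instead, the replacement would collide with the existing group:
#   name = "app"   # create step fails: a group named "app" already exists in the VPC
```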

prevent_destroy

prevent_destroy = true makes Terraform reject any plan that would destroy the resource, halting with an error instead. This is a last-resort guard for resources that must never be accidentally deleted: production RDS clusters, S3 buckets with compliance data, KMS keys, and Elasticsearch domains. It is a code-level safeguard, not a permissions safeguard: a determined operator can remove the setting and re-run, and it offers no protection if the entire resource block is deleted from the configuration, because the setting disappears along with it. Layer it with AWS-side protections such as deletion protection, resource policies, and SCPs for defence in depth.

resource "aws_rds_cluster" "main" {
  cluster_identifier  = "prod-aurora-cluster"
  engine              = "aurora-postgresql"
  engine_version      = "15.3"
  master_username     = var.db_user
  master_password     = var.db_password
  deletion_protection = true   # AWS-level guard (separate from lifecycle)

  lifecycle {
    prevent_destroy = true     # Terraform-level guard: plan errors if destroy planned
  }
}

resource "aws_s3_bucket" "audit_logs" {
  bucket = "company-audit-logs-prod"

  lifecycle {
    prevent_destroy = true
  }
}
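One constraint applies to the lifecycle block as a whole: Terraform processes it while building the dependency graph, too early to evaluate arbitrary expressions, so an argument such as prevent_destroy only accepts a literal value and cannot be toggled per environment with a variable. A sketch using a KMS key; var.protect_resources is hypothetical and appears only in the commented-out invalid line.

```hcl
resource "aws_kms_key" "prod_data" {
  description             = "Production data encryption key"
  deletion_window_in_days = 30

  lifecycle {
    # Valid: a literal boolean.
    prevent_destroy = true

    # Not valid: Terraform rejects variables and other expressions here.
    # prevent_destroy = var.protect_resources   # hypothetical variable
  }
}
```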

ignore_changes

ignore_changes tells Terraform to ignore differences on specific attributes when planning updates: the configured value is still used when the resource is first created, but later changes made outside Terraform are left alone. The canonical use case is when an external system (an autoscaler, a human operator, a configuration management tool) legitimately changes an attribute after Terraform creates the resource. Without ignore_changes, Terraform would report the difference on every plan and revert it on the next apply, undoing the external system's changes. Common attributes: desired_capacity on an ASG, ami when images are rotated by an external process, and tags when a tag policy injects cost-allocation tags outside Terraform.

resource "aws_autoscaling_group" "web" {
  name                = "web-asg"
  min_size            = 2
  max_size            = 20
  desired_capacity    = 4      # initial value; managed by scaling policies after creation
  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }

  lifecycle {
    # Autoscaling policies and scheduled actions change desired_capacity.
    # Ignore it so Terraform does not revert the scaler's decisions.
    ignore_changes = [desired_capacity]

    # aws_autoscaling_group uses repeated tag blocks rather than a tags map, so the
    # per-key form ignore_changes = [tags["CostCenter"]] applies to resources with a tags map.
  }
}
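The per-key form of ignore_changes works on resources whose tags argument is a map, such as aws_instance. A minimal sketch, assuming an external pipeline rotates the AMI and a tag policy adds CostCenter and BusinessUnit tags. All ignored attributes go into one list, because a lifecycle block may contain only one ignore_changes argument.

```hcl
resource "aws_instance" "worker" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.medium"

  tags = {
    Name = "worker"
  }

  lifecycle {
    # One list covering every attribute Terraform should stop reconciling.
    ignore_changes = [
      ami,                # rotated by an external image pipeline (assumed)
      tags["CostCenter"], # injected by a tag policy (assumed)
      tags["BusinessUnit"],
    ]
  }
}
```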

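replace_triggered_by

replace_triggered_by (Terraform 1.2 and later) replaces the resource whenever one of the referenced items has a planned change. A reference to a whole resource instance triggers on any update or replacement of that instance; a reference to a single attribute triggers only when that attribute's value changes. Only managed resources and their attributes can be referenced, not variables or locals.

# Replace the ASG (and therefore its instances) whenever the launch template changes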
resource "aws_autoscaling_group" "web_v2" {
  name_prefix      = "web-v2-"   # unique names let old and new ASGs coexist
  min_size         = 2
  max_size         = 20
  desired_capacity = 4
  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }

  lifecycle {
    replace_triggered_by = [aws_launch_template.web]
    # Any planned update or replacement of aws_launch_template.web replaces the ASG.
    # Combine with create_before_destroy for zero-downtime rotations.
    create_before_destroy = true
  }
}
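Because replace_triggered_by only accepts references to managed resources and their attributes, an input variable cannot be listed directly. On Terraform 1.4 and later, the built-in terraform_data resource can stand in for such a value. A sketch, assuming a hypothetical app_release variable whose change should force the instance to be rebuilt:

```hcl
variable "app_release" {
  type        = string
  description = "Hypothetical release identifier; changing it forces a rebuild."
}

# terraform_data stores the value in state; a new input is a planned change to this resource.
resource "terraform_data" "release" {
  input = var.app_release
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.medium"

  lifecycle {
    replace_triggered_by = [terraform_data.release]
  }
}
```

Changing app_release produces a planned update to terraform_data.release, which in turn triggers replacement of aws_instance.app.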

Pro practice (combining lifecycle arguments): For a production fleet rotation, combine all three: create_before_destroy = true (no downtime), replace_triggered_by = [aws_launch_template.web] (automatic cascade), and ignore_changes = [desired_capacity] (respect the autoscaler). Because ignore_changes only affects updates, a replacement ASG starts at the configured desired_capacity rather than the autoscaler's current value, so keep that value at a safe baseline. Managing fleet rotations without these arguments tends to mean manual replacement cycles (terraform taint, or terraform apply -replace on newer versions) and scheduled downtime windows.
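A minimal sketch of the combined pattern, reusing aws_launch_template.web from the earlier example and var.subnet_ids from the count example; the sizes are illustrative. It uses name_prefix because create_before_destroy briefly runs the old and new groups side by side.

```hcl
resource "aws_autoscaling_group" "web_fleet" {
  name_prefix         = "web-fleet-"
  min_size            = 2
  max_size            = 20
  desired_capacity    = 4 # baseline used whenever the group is (re)created
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }

  lifecycle {
    create_before_destroy = true                      # new ASG exists before the old one is removed
    replace_triggered_by  = [aws_launch_template.web] # any template change rolls the fleet
    ignore_changes        = [desired_capacity]        # leave scaling decisions to the autoscaler
  }
}
```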

Conditional Resources with count

A common pattern is count = var.condition ? 1 : 0 to conditionally create a resource, and it is the most widely used idiom for expressing "maybe create this resource". When the count is 0, Terraform manages no instances and the resource is effectively absent. When referencing such a conditional resource from another resource, use one(resource_type.name[*].attribute) to safely extract the value: one() returns null when count is 0 rather than erroring (and errors if the list ever holds more than one element).

# Conditionally create a bastion host only in non-production environments
resource "aws_instance" "bastion" {
  count         = var.environment != "production" ? 1 : 0
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.nano"
  subnet_id     = var.public_subnet_id

  tags = { Name = "bastion-${var.environment}", Role = "bastion" }
}

# Conditionally create a CloudWatch alarm only when SNS ARN is provided
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  count               = var.alarm_sns_arn != "" ? 1 : 0
  alarm_name          = "cpu-high-${var.environment}"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 60
  statistic           = "Average"
  threshold           = 80
  alarm_actions       = [var.alarm_sns_arn]
}

# Safe reference using one():
output "bastion_ip" {
  value = one(aws_instance.bastion[*].public_ip)
  # Returns null if bastion does not exist — no error
}
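count is not the only way to make a resource optional: for_each over an empty or one-element set has the same effect, and the instance address then carries a readable key (flow_logs["this"]) instead of [0]. A sketch with a hypothetical enable_flow_logs toggle:

```hcl
# Hypothetical toggle, not one of the lesson's variables.
variable "enable_flow_logs" {
  type    = bool
  default = true
}

resource "aws_cloudwatch_log_group" "flow_logs" {
  # One instance keyed "this" when enabled, none when disabled.
  for_each          = toset(var.enable_flow_logs ? ["this"] : [])
  name              = "/vpc/flow-logs"
  retention_in_days = 30
}

output "flow_log_group_arn" {
  # values() turns the (possibly empty) instance map into a list; one() yields null when empty.
  value = one(values(aws_cloudwatch_log_group.flow_logs)[*].arn)
}
```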

for_each with unknown values: Terraform requires that the keys of a for_each map or set be known at plan time (map values may still be unknown). If a key comes from an attribute of a resource that has not been created yet (e.g., a dynamically assigned ID or ARN), planning fails with an "Invalid for_each argument" error explaining that the keys cannot be determined until apply. The solution is to use known values as keys (names, slugs, and static identifiers) and keep computed IDs in the map values. If you must iterate over computed values, fall back to count = length(...), which only needs the number of elements to be known, and accept the index-stability trade-off.
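A sketch of the difference, with hypothetical role names and reusing data.aws_iam_policy_document.assume from the IAM role example. Chaining for_each from one resource to another keeps the static keys and carries the computed attributes in the values.

```hcl
# Hypothetical role names, known at plan time.
variable "role_names" {
  type    = set(string)
  default = ["api", "batch"]
}

resource "aws_iam_role" "app" {
  for_each           = var.role_names
  name               = "app-${each.key}"
  assume_role_policy = data.aws_iam_policy_document.assume.json
}

# Fails at plan time for new roles: the keys would be ARNs, unknown until apply.
# resource "aws_iam_role_policy_attachment" "broken" {
#   for_each = toset([for r in aws_iam_role.app : r.arn])
#   # (other arguments omitted)
# }

# Works: the keys are the static names from var.role_names;
# computed attributes such as arn live only in the values.
resource "aws_iam_role_policy_attachment" "readonly" {
  for_each   = aws_iam_role.app
  role       = each.value.name
  policy_arn = "arn:aws:iam::aws:policy/ReadOnlyAccess"
}
```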