Advanced Terraform & IaC Patterns

Advanced Module Design

Terraform modules are the primary unit of reuse and abstraction in any large IaC codebase. A beginner writes a module that wraps a few resources and calls it a day. A senior engineer at a company like Stripe or Airbnb designs modules like APIs — with deliberate interfaces, stable contracts, optional feature flags, and well-understood composition boundaries. This lesson teaches that second level.

The Module as a Published API

Every module you share across teams should be treated like a versioned library. Its input variables are the function signature; its outputs are the return values; its README is the documentation. Callers depend on the interface, not on the internals. This means you can refactor the inside of a module (restructure internal resources, change a naming convention, add encryption) without breaking any consumer's code, as long as the external variable and output contracts are preserved.
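Internal refactors can still affect state even when the interface is untouched. Renaming a resource inside the module changes its address, so Terraform would plan to destroy the old resource and create a new one. A moved block (Terraform 1.1 and later) records the rename so existing state follows it. This is a minimal sketch. It assumes a hypothetical earlier version of the rds module (built later in this lesson) that called its primary instance aws_db_instance.main:

```hcl
# modules/rds/refactors.tf (hypothetical file name)

# Earlier module versions named the primary instance "main"; it is now "this".
# Without this block, every caller upgrading the module would see the database
# planned for destruction and re-creation.
moved {
  from = aws_db_instance.main
  to   = aws_db_instance.this
}
```

The block ships inside the module, so callers get the state migration on their next plan after upgrading. They don't have to change their own code.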

Discipline around this interface matters enormously at scale. In organizations that publish modules to an internal registry, those modules are subject to the same semantic versioning rules as any other library. Breaking changes require a major version bump, callers pin to a version range, and an upgrade is a deliberate migration, not an accidental side effect of a teammate's commit.
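For modules served from a module registry, the caller expresses the version range with the version argument. This is a sketch: the registry hostname and namespace are hypothetical, and the module inputs are elided:

```hcl
module "app_db" {
  # Hypothetical private-registry address: <hostname>/<namespace>/<name>/<provider>
  source = "app.terraform.io/acme/rds/aws"

  # Accept 2.4.0 and any later 2.x release, but never 3.0.0 (a breaking change)
  version = "~> 2.4"

  # ... module inputs elided ...
}
```

With ~> 2.4, a breaking major release never arrives unannounced; adopting it requires editing the constraint. The version argument only applies to registry sources. Git sources, like the one in this lesson's calling example, pin with ?ref= instead.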

Composition: Modules That Call Modules

The most powerful architectural pattern in advanced Terraform is composition — assembling small, single-purpose modules into larger ones that represent a deployable slice of infrastructure. Think of it like LEGO: a vpc module, a rds module, and an ecs-service module all stay thin and focused. A higher-level app-stack module composes them into a coherent deployment unit. A root module (your environment directory) then composes one or more stacks.

Composition layers: primitive resources are wrapped in focused modules, which are composed into an app-stack, consumed by the root module.

The key rule: each layer knows about the layer directly below it, not deeper. The root module instantiates app-stack. app-stack instantiates vpc, rds, and ecs-service. Those leaf modules manage raw AWS resources. No layer reaches across to a sibling or skips a level. This keeps blast radius small and refactoring safe.
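Here is a sketch of the middle layer. The app-stack module instantiates its three children and wires one child's outputs into another's inputs itself, so the children never reference each other. Only the rds inputs match the module built in this lesson. The following are all assumptions:

- the vpc and ecs-service input and output names
- the stack's own variable names
- an rds output named endpoint

```hcl
# modules/app-stack/main.tf
# Composition sketch: vpc/ecs-service interfaces and the rds "endpoint" output are hypothetical.

variable "name" {
  type = string
}

variable "vpc_cidr" {
  type = string
}

variable "container_image" {
  type = string
}

variable "db_instance_class" {
  type    = string
  default = "db.t4g.medium"
}

module "vpc" {
  source     = "../vpc"
  name       = var.name
  cidr_block = var.vpc_cidr
}

module "rds" {
  source     = "../rds"
  name       = "${var.name}-db"
  subnet_ids = module.vpc.private_subnet_ids # the parent passes the vpc output to rds

  db_config = {
    instance_class       = var.db_instance_class
    engine_version       = "8.0.36"
    allocated_storage_gb = 50
  }
}

module "ecs_service" {
  source          = "../ecs-service"
  name            = var.name
  vpc_id          = module.vpc.vpc_id
  subnet_ids      = module.vpc.private_subnet_ids
  container_image = var.container_image
  db_endpoint     = module.rds.endpoint # wired by the parent; ecs-service never references rds itself
}

# Re-export only what the root module needs from this stack
output "db_endpoint" {
  value = module.rds.endpoint
}
```

The root module then sees a single module "app_stack" block with four inputs. It never reaches into vpc, rds, or ecs-service directly.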

Designing the Interface: Variables

A module's variable interface is where most design mistakes happen. Follow these rules to build interfaces that age well.

Use objects for grouped config, not a flat variable per field. Instead of fifteen separate variables (var.enable_deletion_protection, var.backup_retention_days, and so on), group logically related settings into a typed object. This keeps the calling module clean. Combined with optional(), it also lets you add new fields without changing every existing call site.

Declare explicit types. A variable declared as string or as object({ ... }) is self-documenting. Terraform also rejects a mismatched value at plan time instead of silently passing garbage through. Inside object types, use optional(type, default) (available since Terraform 1.3) to mark fields that callers may omit and to supply their defaults.

# modules/rds/variables.tf

variable "db_config" {
  description = "Database configuration object"
  type = object({
    instance_class        = string
    engine_version        = string
    allocated_storage_gb  = number
    backup_retention_days = optional(number, 7)
    deletion_protection   = optional(bool, true)
    multi_az              = optional(bool, false)
  })
}

variable "name" {
  description = "Logical name — used as a prefix for all resource names"
  type        = string

  validation {
    condition     = can(regex("^[a-z][a-z0-9-]{2,30}$", var.name))
    error_message = "Name must be lowercase, 3-31 chars, starting with a letter."
  }
}

variable "subnet_ids" {
  description = "List of subnet IDs for the DB subnet group"
  type        = list(string)
}

variable "tags" {
  description = "Map of tags merged onto all resources"
  type        = map(string)
  default     = {}
}
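With those types in place, a caller that wants the defaults supplies only the required fields. This sketch shows a hypothetical dev environment; the relative source path and module.vpc are assumptions. Terraform fills the omitted optional() fields with 7, true, and false. The replica_count and alarm_config variables introduced in the next section keep their defaults of 0 and null, so no replica or alarm is created.

```hcl
# environments/dev/main.tf (hypothetical)
module "dev_db" {
  source     = "../../modules/rds"
  name       = "app-dev"
  subnet_ids = module.vpc.private_subnet_ids

  db_config = {
    instance_class       = "db.t4g.micro"
    engine_version       = "8.0.36"
    allocated_storage_gb = 20
    # backup_retention_days, deletion_protection and multi_az are omitted,
    # so the optional() defaults (7, true, false) apply.
  }
}
```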

Optional Features via Feature Flags

Real modules need to serve multiple use cases without forking into a separate module for each one. The professional pattern is to gate optional sub-resources behind boolean, numeric, or object variables. When the flag is false, zero, or null, the resource's count is zero and it is never created. When the flag is enabled, the resource is created. Terraform's count and for_each meta-arguments make this possible.

# modules/rds/main.tf

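# Subnet group for the primary instance (referenced by db_subnet_group_name below)
resource "aws_db_subnet_group" "this" {
  name       = var.name
  subnet_ids = var.subnet_ids
  tags       = var.tags
}

# Core resource: always created
resource "aws_db_instance" "this" {
  identifier                  = var.name
  engine                      = "mysql"
  engine_version              = var.db_config.engine_version
  instance_class              = var.db_config.instance_class
  allocated_storage           = var.db_config.allocated_storage_gb
  db_subnet_group_name        = aws_db_subnet_group.this.name
  username                    = "admin" # assumed master username; expose as a variable if teams need to choose
  manage_master_user_password = true    # RDS generates the password and stores it in Secrets Manager
  backup_retention_period     = var.db_config.backup_retention_days
  deletion_protection         = var.db_config.deletion_protection
  multi_az                    = var.db_config.multi_az
  tags                        = var.tags
}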

# Optional read replica — only created when replica_count > 0
variable "replica_count" {
  type    = number
  default = 0
}

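resource "aws_db_instance" "replica" {
  count               = var.replica_count
  identifier          = "${var.name}-replica-${count.index}"
  # Same-region replicas name their source by identifier. In AWS provider v5+,
  # the id attribute holds the DbiResourceId instead, so .id would not work here.
  replicate_source_db = aws_db_instance.this.identifier
  instance_class      = var.db_config.instance_class
  skip_final_snapshot = true # deleting a read replica requires skipping the final snapshot
  tags                = var.tags
}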

# Optional CloudWatch alarm — enabled via object flag (null = disabled)
variable "alarm_config" {
  description = "Set to enable a CloudWatch CPU alarm. Null disables it."
  type = object({
    threshold_percent = number
    sns_topic_arn     = string
  })
  default = null
}

resource "aws_cloudwatch_metric_alarm" "cpu" {
  count               = var.alarm_config != null ? 1 : 0
  alarm_name          = "${var.name}-high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  metric_name         = "CPUUtilization"
  namespace           = "AWS/RDS"
  period              = 60
  statistic           = "Average"
  threshold           = var.alarm_config.threshold_percent # alarm_config is a single object, not a list
  alarm_actions       = [var.alarm_config.sns_topic_arn]
  dimensions = {
    DBInstanceIdentifier = aws_db_instance.this.identifier
  }
}

Prefer null over false for optional objects. With a boolean flag such as enable_alarm = false, the related settings (topic ARN, threshold) usually live in separate variables. If those variables have no defaults, the caller must still supply them even when the alarm is off. An alarm_config variable that defaults to null asks the caller for nothing unless they opt in. This makes calling the module significantly cleaner for the common case.
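The nullable object can also carry its own validation, so the opt-in path stays safe. This minimal sketch is the alarm_config declaration from main.tf with a validation block added. It replaces that declaration rather than sitting beside it, because Terraform rejects duplicate variable names. The 0 to 100 range is an assumption about what a sensible CPU threshold is.

```hcl
variable "alarm_config" {
  description = "Set to enable a CloudWatch CPU alarm. Null disables it."
  type = object({
    threshold_percent = number
    sns_topic_arn     = string
  })
  default = null

  validation {
    # Terraform only uses the branch the condition selects, so a null
    # alarm_config passes without its attributes ever being read.
    condition = var.alarm_config == null ? true : (
      var.alarm_config.threshold_percent > 0 && var.alarm_config.threshold_percent <= 100
    )
    error_message = "The alarm_config.threshold_percent value must be greater than 0 and at most 100."
  }
}
```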

Outputs: The Contract with Callers

Outputs are not an afterthought. They are the interface through which parent modules and root modules consume your module's results. Export everything a caller could reasonably need: resource IDs, ARNs, DNS names, security group IDs. Do not expose internal implementation details (like intermediate locals or computed names that could change). Mark sensitive outputs with sensitive = true so their values are redacted from plan and apply output. The values are still stored in plain text in the state file, so protect state accordingly.
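Here is a sketch of an outputs.tf for the rds module above. The output names are illustrative choices, not part of the original lesson; identifier, arn, endpoint, and port are standard aws_db_instance attributes. The one() function turns the zero-or-one alarm instances into either the alarm's ARN or null. None of these values are secret, so none are marked sensitive.

```hcl
# modules/rds/outputs.tf

output "identifier" {
  description = "RDS identifier of the primary instance"
  value       = aws_db_instance.this.identifier
}

output "arn" {
  description = "ARN of the primary instance"
  value       = aws_db_instance.this.arn
}

output "endpoint" {
  description = "Connection endpoint of the primary, in host:port form"
  value       = aws_db_instance.this.endpoint
}

output "port" {
  description = "Port the primary listens on"
  value       = aws_db_instance.this.port
}

output "replica_endpoints" {
  description = "Endpoints of the read replicas; an empty list when replica_count is 0"
  value       = aws_db_instance.replica[*].endpoint
}

output "cpu_alarm_arn" {
  description = "ARN of the CPU alarm, or null when alarm_config is null"
  value       = one(aws_cloudwatch_metric_alarm.cpu[*].arn)
}
```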

Output stability is a breaking-change concern. If you remove or rename an output that callers reference — even without changing any resource — their plan will fail with a reference error. Treat output names with the same rigor as a public API surface. Deprecate and alias before removing.
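Deprecating and aliasing can be as simple as keeping the old output name pointing at the same value for the rest of the major version. This sketch assumes a hypothetical earlier release that exported db_endpoint before the name changed to endpoint:

```hcl
# modules/rds/outputs.tf (continued)
# Hypothetical history: 2.x originally exported "db_endpoint"; the preferred name is now "endpoint".
# Keep the old name as an alias until the next major version, then delete it.
output "db_endpoint" {
  description = "DEPRECATED: use the endpoint output instead. Removed in the next major version."
  value       = aws_db_instance.this.endpoint
}
```

Callers still referencing module.app_db.db_endpoint keep working. They only have to switch at the major release that deletes the alias, which is exactly where semantic versioning says breaking changes belong.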

Anti-Patterns to Eliminate

These patterns appear constantly in Terraform codebases at companies that grew fast without governance. Learn to recognize and fix them.

The "God module": one module that provisions everything — networking, compute, database, DNS, IAM — for an entire environment. It has 200+ input variables, takes 45 minutes to plan, and is impossible to change safely. Fix: decompose into single-responsibility leaf modules composed by a thin root.
Hardcoded values inside modules: a module that assumes us-east-1, a specific AMI ID, or a specific account ID. It works for the author and breaks for every other team. Fix: make the value a variable; the caller provides context.
Passing full provider configs into modules: a module that takes var.aws_access_key and configures its own provider. This breaks provider aliasing, makes assume-role patterns impossible, and bypasses the standard Terraform provider inheritance model. It also prevents callers from using count, for_each, or depends_on on the module. Fix: modules must never configure providers; that is the root module's job.
Treating terraform.tfvars as the only interface: large flat .tfvars files with hundreds of loose variables, with no type enforcement. Fix: typed object variables with validation blocks.
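In practice, the provider fix looks like this. The root module owns every provider block, including aliased ones that assume a role, and hands a configuration to the module through the providers meta-argument. The module only declares which provider it needs. In this sketch, the account ID, role name, region, alias name, and version constraint are placeholders, and the module inputs are elided.

```hcl
# environments/prod/providers.tf (root module): the only place providers are configured
provider "aws" {
  region = "us-east-1"
}

# Aliased configuration that assumes a role in another account (placeholder ARN)
provider "aws" {
  alias  = "shared_services"
  region = "us-east-1"

  assume_role {
    role_arn = "arn:aws:iam::111111111111:role/terraform-deployer"
  }
}

# environments/prod/main.tf: pass the aliased configuration explicitly
module "shared_db" {
  source = "../../modules/rds"

  providers = {
    aws = aws.shared_services
  }

  # ... module inputs elided ...
}

# modules/rds/versions.tf: declare the requirement, never the configuration
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0"
    }
  }
}
```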

Never use count to create one instance per element of a list whose contents may change. Suppose you use count = length(var.envs) on a module and later insert a new environment at position 0. Every existing environment shifts to a new index, so Terraform plans to change every instance from index 0 onward to match its new element. Any instance whose changed arguments force replacement, such as a database identifier, is destroyed and recreated. Use for_each with a map or a set of strings instead. Instances are keyed by the map key or set value, not by position, so adding a new key only creates the new instance.
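Here is a sketch of the for_each alternative, reusing the rds module with one instance per environment. The environment names and per-environment sizes in local.databases are hypothetical. As in this lesson's calling example, module.vpc is assumed to exist.

```hcl
# One database per environment, keyed by environment name rather than list position
locals {
  databases = {
    dev     = { instance_class = "db.t4g.medium", allocated_storage_gb = 20 }
    staging = { instance_class = "db.t4g.large", allocated_storage_gb = 50 }
  }
}

module "env_db" {
  source   = "git::https://github.com/acme/tf-modules.git//rds?ref=v2.4.0"
  for_each = local.databases

  name       = "app-${each.key}"
  subnet_ids = module.vpc.private_subnet_ids

  db_config = {
    instance_class       = each.value.instance_class
    engine_version       = "8.0.36"
    allocated_storage_gb = each.value.allocated_storage_gb
  }
}
```

Terraform addresses the instances as module.env_db["dev"] and module.env_db["staging"]. Adding a qa entry to the map plans only the new module.env_db["qa"] resources and leaves the existing two untouched.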

Putting It Together: A Calling Example

Here is how a root module instantiates the rds module designed above — clean, explicit, and easy to review in a pull request:

# environments/prod/main.tf

module "app_db" {
  source = "git::https://github.com/acme/tf-modules.git//rds?ref=v2.4.0"

  name       = "app-prod"
  subnet_ids = module.vpc.private_subnet_ids
  tags       = local.common_tags

  db_config = {
    instance_class        = "db.r7g.xlarge"
    engine_version        = "8.0.36"
    allocated_storage_gb  = 200
    backup_retention_days = 14
    deletion_protection   = true
    multi_az              = true
  }

  replica_count = 2

  alarm_config = {
    threshold_percent = 80
    sns_topic_arn     = module.alerting.pagerduty_sns_arn
  }
}

Pin to a specific Git tag (not a branch) in production module sources. A branch reference means a teammate's unreviewed commit can silently change what your next terraform apply deploys. A released tag is meant to stay fixed, and most Git hosts can protect tags from being moved or deleted. A branch moves by design.
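For protection even against a tag being moved, Git sources also accept a full commit SHA as the ref. In this sketch, the SHA is a placeholder for the commit your release tag points to:

```hcl
module "app_db" {
  # Placeholder SHA: substitute the full 40-character commit that v2.4.0 points to
  source = "git::https://github.com/acme/tf-modules.git//rds?ref=0123456789abcdef0123456789abcdef01234567"

  # ... module inputs unchanged ...
}
```

The trade-off is readability: a SHA says nothing about which release it is, so a comment naming the tag helps reviewers.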