Advanced Terraform & IaC Patterns

Workspaces & Environment Strategies

Most production Terraform codebases manage at least three environments: dev, staging, and production. The naive approach, copying your .tf files into separate folders and maintaining them in parallel, tends to collapse under its own weight as the copies drift apart. The professional approach demands a deliberate strategy from day one, because the choice you make here shapes every workflow decision that follows: how you run CI/CD, how you gate deployments, how you isolate blast radius, and how much cognitive overhead every engineer carries.

There are three primary patterns for multi-environment Terraform: workspaces, directory-per-environment, and branch-per-environment. Each has legitimate use cases and serious failure modes. Big-tech platform teams use different patterns for different layers of their infrastructure — understanding the trade-offs lets you choose deliberately rather than accidentally.

Terraform Workspaces: What They Are and What They Are Not

A workspace is an isolated state file within a single backend configuration. When you run terraform workspace new staging, Terraform creates a second state file scoped to that workspace. Your config files are shared; only the state differs. The currently active workspace is exposed as terraform.workspace, a built-in string you can use in interpolations.

    # Create and switch workspaces
    terraform workspace new dev
    terraform workspace new staging
    terraform workspace new production
    terraform workspace list
    terraform workspace select production

    # Use the workspace name in config
    resource "aws_s3_bucket" "app_data" {
      bucket = "myapp-${terraform.workspace}-data"

      tags = {
        Environment = terraform.workspace
      }
    }

    # Workspace-aware variable lookup with a locals map
    locals {
      env_config = {
        dev = {
          instance_type = "t3.micro"
          min_size      = 1
          max_size      = 2
        }
        staging = {
          instance_type = "t3.small"
          min_size      = 2
          max_size      = 4
        }
        production = {
          instance_type = "m5.xlarge"
          min_size      = 6
          max_size      = 20
        }
      }

      env = local.env_config[terraform.workspace]
    }

Workspaces are excellent for ephemeral environments — feature branches, PR review environments, short-lived load tests — where you want to spin up an identical copy of a stack, test it, and destroy it cleanly. They are also genuinely useful for simple projects where dev/staging/prod have nearly identical configurations and you are not managing separate AWS accounts per environment.
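
One wrinkle with ephemeral workspaces: a direct index into a per-environment locals map fails for any workspace name that has no entry, such as pr-123. The following is a minimal sketch of one way to handle that, assuming dev sizing is an acceptable default for unlisted workspaces. The map is a trimmed, illustrative copy of the one shown earlier, and the output name is hypothetical.

```hcl
locals {
  # Illustrative sizing map (trimmed); the values are examples only.
  env_config = {
    dev        = { instance_type = "t3.micro", min_size = 1, max_size = 2 }
    production = { instance_type = "m5.xlarge", min_size = 6, max_size = 20 }
  }

  # Workspaces without an explicit entry (for example pr-123) fall back
  # to the dev sizing instead of failing the lookup.
  env = lookup(local.env_config, terraform.workspace, local.env_config["dev"])
}

# Hypothetical output, included only to show how the selected value is read.
output "instance_type" {
  value = local.env.instance_type
}
```

Whether falling back silently is desirable depends on the project; some teams may prefer the hard failure so that an unknown workspace name is caught immediately.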

Workspaces are not environment isolation at the infrastructure level. All workspaces share the same backend, the same provider credentials, and, unless you write explicit guards, the same everything else. Note that terraform apply has no flag for choosing a workspace: the target is whatever terraform workspace select (or the TF_WORKSPACE environment variable) last set. An apply against the production workspace with the wrong credentials or a bad plan can destroy production just as easily as under any other approach. Worse, because the configs look identical, there is no visual cue that you are operating on prod. Spotify, Stripe, and other large teams have moved away from workspaces for production/staging isolation precisely because of this: the blast radius of a mistaken workspace selection is too high.
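
As an illustration of what an explicit guard can look like, here is a minimal sketch, assuming Terraform 1.4 or later (for the built-in terraform_data resource) and a hypothetical environment variable that the operator or pipeline must pass on every run. If the value does not match the selected workspace, the plan fails before anything is changed.

```hcl
# Hypothetical variable: the caller states which environment they intend
# to change, for example -var="environment=production".
variable "environment" {
  type        = string
  description = "Environment the caller intends to change; must equal the selected workspace name."
}

# A resource with no infrastructure behind it; it exists only to carry the check.
resource "terraform_data" "workspace_guard" {
  lifecycle {
    precondition {
      condition     = var.environment == terraform.workspace
      error_message = "The environment variable does not match the selected workspace. Refusing to continue."
    }
  }
}
```

This only protects against a mismatched workspace selection. It does nothing about shared credentials or a shared backend, which is why it complements account-level isolation rather than replacing it.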

Directory-per-Environment: Explicit Isolation

The pattern recommended by HashiCorp, Gruntwork, and most platform engineering teams at scale is to give each environment its own directory (or its own root module), each with its own backend configuration and its own state file. Shared logic lives in modules; the environment directories are thin consumers of those modules with environment-specific variable values.

    # Canonical directory layout
    infrastructure/
      modules/
        vpc/                    # reusable module: no provider, no backend
        eks/
        rds/
      environments/
        dev/
          main.tf               # calls modules with dev-sized values
          variables.tf
          terraform.tfvars
          backend.tf            # points to dev state in S3
        staging/
          main.tf
          terraform.tfvars
          backend.tf            # points to staging state in S3
        production/
          main.tf
          terraform.tfvars
          backend.tf            # points to prod state in S3 with stricter lock policy

    # Example: environments/production/backend.tf
    terraform {
      backend "s3" {
        bucket         = "myorg-terraform-state-prod"
        key            = "eks/terraform.tfstate"
        region         = "us-east-1"
        dynamodb_table = "terraform-state-lock-prod"
        encrypt        = true
      }
    }
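
To make the "thin consumer" idea concrete, here is a rough sketch of what environments/production/main.tf might contain. The module input names (name, cidr_block, cluster_name, and so on) and the vpc_id output are hypothetical; the real ones depend on how the modules in modules/ are written.

```hcl
# environments/production/main.tf (illustrative sketch)

module "vpc" {
  source = "../../modules/vpc"

  # Hypothetical inputs; the real module defines its own variables.
  name       = "myapp-production"
  cidr_block = "10.20.0.0/16"
}

module "eks" {
  source = "../../modules/eks"

  # Hypothetical inputs, using production-sized values.
  cluster_name       = "myapp-production"
  vpc_id             = module.vpc.vpc_id # assumes the vpc module exports vpc_id
  node_instance_type = "m5.xlarge"
  node_min_size      = 6
  node_max_size      = 20
}
```

The dev and staging directories would call the same modules with smaller values, so the only differences between environments are visible in a short, reviewable file.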

The key discipline here is that the production directory requires separate AWS credentials — typically an IAM role in a dedicated production AWS account that CI/CD assumes only after a manual approval gate. Dev runs freely; production is protected at the credential layer, not just at the policy layer in your .tf files. This is the AWS Organizations multi-account model: dev, staging, and production are separate AWS accounts, so a credential leak in dev never touches prod.
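
One way to express this at the provider level is sketched below, under a few assumptions: the account ID is a placeholder, the role name ci-prod and the file name providers.tf are illustrative, and the pipeline's base identity is already allowed to assume the role.

```hcl
# environments/production/providers.tf (hypothetical file name)

provider "aws" {
  region = "us-east-1"

  # Placeholder account ID. The provider refuses to run if the active
  # credentials resolve to any other account.
  allowed_account_ids = ["333333333333"]

  # Every API call is made as the production CI role, not as the caller.
  assume_role {
    role_arn     = "arn:aws:iam::333333333333:role/ci-prod"
    session_name = "terraform-production"
  }
}
```

With this in place, running from the production directory with dev credentials fails at the role assumption or at the account check, before any resource is touched.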

[Figure: Directory-per-Environment with Multi-Account Isolation. A Git repo (environments/dev/, environments/staging/, environments/production/) feeds a CI/CD pipeline (Plan → Approve → Apply) that targets three accounts: AWS dev account (auto-apply on merge, IAM role ci-dev, S3 state: dev); AWS staging account (auto-apply on merge, IAM role ci-staging, S3 state: staging); AWS prod account (manual approval gate, IAM role ci-prod, S3 state: prod).]
Directory-per-environment pattern: each environment targets a dedicated AWS account with its own state bucket and IAM role — blast radius is contained at the account boundary.

Branch-per-Environment: A Pattern to Avoid

Some teams try to mirror their environment separation in Git by maintaining long-lived branches: a dev branch, a staging branch, and a main branch for production. CI/CD deploys whichever branch is pushed. This pattern surfaces regularly among teams transitioning from application deployment workflows where branch-per-env was common.

In practice, branch-per-environment for Terraform creates severe problems. HCL files diverge between branches as hot-fixes land directly in main and never get back-ported; three-way merge conflicts in .tf files are notoriously hard to resolve correctly; and the "production branch" gives a false sense of isolation while still sharing state backends (unless you also maintain separate backends per branch, at which point you have duplicated the directory-per-env pattern with extra complexity). The Git history becomes the canonical truth about your infrastructure, but a Git branch is not an environment guard.

The industry consensus in 2025: use directory-per-environment (or Terragrunt's DRY abstraction over it) for long-lived environments like dev/staging/prod; use workspaces for ephemeral, short-lived environments like PR previews, load-test clones, and developer sandboxes. Branch-per-environment is an anti-pattern — avoid it.

Making the Right Choice for Your Stack

Use this decision matrix when choosing your strategy:

  • Single AWS account, simple config, mostly identical envs: workspaces are fine. During experiments, keep a guard such as count = terraform.workspace == "production" ? 0 : 1 on risky resources so they are never created in the production workspace.
  • Multiple AWS accounts, enterprise compliance, or any chance of accidental cross-env apply: directory-per-environment with separate backends and separate credentials. This is the default for any team above ~5 engineers.
  • Lots of duplication between environment directories: adopt Terragrunt (Tutorial #7) to DRY this up — you keep the isolation of directories while eliminating the copy-paste of backend configs and module calls.
  • Ephemeral PR/feature environments: workspaces, with automated creation and destruction tied to PR open/close events in your CI system. Name them deterministically: pr-${PR_NUMBER}.

    # Example: CI workflow creating and destroying a workspace-based ephemeral env
    # .github/workflows/pr-env.yml (relevant steps)

    # On PR open or update: create the workspace if needed, then apply
    - name: Create PR environment
      if: github.event.action != 'closed'   # skip when the PR is being closed
      run: |
        terraform workspace select pr-${{ github.event.pull_request.number }} \
          || terraform workspace new pr-${{ github.event.pull_request.number }}
        terraform apply -auto-approve \
          -var="environment=pr-${{ github.event.pull_request.number }}"

    # On PR close: destroy and clean up the workspace
    - name: Destroy PR environment
      if: github.event.action == 'closed'
      run: |
        terraform workspace select pr-${{ github.event.pull_request.number }}
        # Pass the same variable as the apply so destroy does not stop to prompt for it
        terraform destroy -auto-approve \
          -var="environment=pr-${{ github.event.pull_request.number }}"
        # A workspace cannot be deleted while it is selected
        terraform workspace select default
        terraform workspace delete pr-${{ github.event.pull_request.number }}

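
The count guard from the first item in the decision matrix can be sketched as follows. The resource type, the experiment_ami_id variable, and the output name are illustrative choices, not part of any particular stack.

```hcl
# Hypothetical input for the experimental instance.
variable "experiment_ami_id" {
  type        = string
  description = "AMI used by the experimental instance."
}

resource "aws_instance" "experiment" {
  # Zero instances in the production workspace, one everywhere else.
  count = terraform.workspace == "production" ? 0 : 1

  ami           = var.experiment_ami_id
  instance_type = "t3.micro"

  tags = {
    Environment = terraform.workspace
  }
}

# one() returns null when the list is empty, so this also evaluates
# cleanly in the production workspace.
output "experiment_instance_id" {
  value = one(aws_instance.experiment[*].id)
}
```

Because the resource becomes a list once count is set, every reference to it needs an index or a splat expression, which is part of why this guard is best kept to short-lived experiments.
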
Always lock down who can apply to production at the credential layer, not just at the policy layer in your code. Your production environment directory should be deployable only by a dedicated CI/CD service account that assumes a role with a short-lived session token, protected by OIDC federation (GitHub Actions OIDC → AWS IAM role). No engineer should have standing credentials to apply directly to prod — every change goes through the pipeline, every change is auditable in Git and in CloudTrail.
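
A minimal sketch of the trust relationship behind that setup is shown below. It assumes the GitHub OIDC identity provider has already been registered in the production account, and it uses a hypothetical repository (my-org/infrastructure) and a hypothetical GitHub environment named production; permissions policies for the role are omitted.

```hcl
# Assumes the GitHub OIDC provider already exists in this AWS account.
data "aws_iam_openid_connect_provider" "github" {
  url = "https://token.actions.githubusercontent.com"
}

data "aws_iam_policy_document" "ci_prod_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [data.aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Hypothetical repository and GitHub environment names. Only workflow
    # jobs running in that environment can assume the role.
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:my-org/infrastructure:environment:production"]
    }
  }
}

resource "aws_iam_role" "ci_prod" {
  name                 = "ci-prod"
  assume_role_policy   = data.aws_iam_policy_document.ci_prod_trust.json
  max_session_duration = 3600 # one hour, the minimum AWS allows
}
```

Tying the subject claim to a GitHub environment lets the manual approval gate live in the environment's protection rules, so the role cannot be assumed by a job that skipped approval.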

Environment strategy is one of the highest-leverage decisions in IaC. Getting it right early means your Terraform codebase scales gracefully from three environments to thirty. Getting it wrong means re-architecting under pressure when a dev workspace apply takes out production at 2 AM — a story every platform team that skipped this lesson has lived through.