Drift, Imports & Brownfield IaC
One of the most uncomfortable realities in infrastructure engineering is that the cloud does not wait for your Terraform. Engineers click through the console to fix an outage at 2 AM, a security team patches a security group rule directly via the AWS CLI, and an auto-scaling event creates resources Terraform never knew about. Over time, the gap between what Terraform thinks exists and what actually exists, called configuration drift, grows silently until it causes a production incident. This lesson covers how to detect drift proactively, how to import existing (brownfield) infrastructure into Terraform control, and how to run a full brownfield adoption campaign without breaking production.
What Is Drift?
Drift is any difference between what Terraform has recorded (its configuration and state) and the actual state in the cloud provider. It falls into three categories:
- Attribute drift: a resource exists in both state and the cloud but a property was changed out-of-band, for example someone bumped an EC2 instance type from t3.medium to t3.large in the console.
- Missing resource drift: Terraform state says a resource exists but it was deleted outside Terraform (accidental console delete, or the resource was replaced by another process).
- Unmanaged resource drift: a resource exists in the cloud that Terraform has never seen, such as a manually created S3 bucket, a legacy RDS instance, or a security group from the pre-IaC era.
Detecting Drift: The Tools
Terraform provides two native mechanisms for drift detection. The first is terraform plan -refresh-only, introduced in Terraform 0.15.4. It performs a full provider API refresh against every resource in state and generates a plan showing only what changed in the real world — without proposing any configuration changes. The output is a pure diff of reality vs. state.
The second mechanism is the -detailed-exitcode flag on terraform plan, which can be used together with -refresh-only. It exits with code 2 if there are changes (either config changes or drift), code 0 if there are no changes, and code 1 on error. This exit code is the hook for automation.
Automating Drift Detection in CI
The production pattern is a scheduled CI job, not a human-triggered one. The job runs every 6 hours, covers every root module, and pages the on-call team when drift is found, for example as a scheduled GitHub Actions workflow.
Run the drift detection job with read-only credentials: ReadOnlyAccess and S3 backend read. This limits the blast radius if the scheduled job is ever compromised, and prevents a runaway job from making unintended changes.
Importing Existing Resources
When you discover an unmanaged resource — one that exists in the cloud but has no Terraform state — you have two options: delete and recreate via Terraform (disruptive for production), or import it into state (preferred). terraform import reads the real resource from the provider API and writes it into the state file. It does not generate HCL for you — you must write the matching configuration first, then import, then iterate until the plan diff is zero.
Terraform 1.5 introduced the import block, which makes imports declarative, code-reviewable, and reproducible — the modern preferred approach over the CLI command.
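A minimal sketch of an import block, assuming a hypothetical pre-existing S3 bucket (the bucket name and the resource label legacy_logs are illustrative, not taken from any real account). The import block and the matching resource block are committed together so the import goes through normal code review:

```hcl
# Requires Terraform 1.5 or later.
# Hypothetical example: adopt a manually created bucket into state.
import {
  to = aws_s3_bucket.legacy_logs     # address the resource will have in state
  id = "example-legacy-logs-bucket"  # for aws_s3_bucket, the import ID is the bucket name
}

# The configuration must describe the bucket as it exists today,
# otherwise the plan will propose changes after the import.
resource "aws_s3_bucket" "legacy_logs" {
  bucket = "example-legacy-logs-bucket"
}
```

With this in place, terraform plan reports the resource as "will be imported" alongside any attribute differences, and the import block can be deleted in a later change once the import has been applied.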
After every import, run terraform plan and check for a zero-diff result. If the plan shows changes, Terraform will apply those changes on the next run, potentially overwriting production configuration. Fix every attribute mismatch before you let automation apply.
The Brownfield Adoption Playbook
Adopting Terraform for an existing production environment (brownfield IaC) is one of the highest-stakes operations a platform team performs. At large companies this is a months-long campaign. The safe pattern is strangler-fig adoption: import resources one layer at a time, run in plan-only mode for weeks before enabling apply, and maintain a rollback path at every step.
- Inventory and triage. Use AWS Config, the AWS CLI with describe-* commands, or a tool like terraformer to enumerate every resource in the account. Categorize: managed vs. unmanaged. Prioritize by risk: start with stateless compute and import networking last (highest blast radius).
- Write HCL before importing. Write the resource configuration, commit it, get it reviewed. Once imported, the state file tracks the resource, so a mistake in HCL that is then applied can destroy or replace the real resource.
- Import in a branch, plan in CI. Create a PR with the import block and HCL. The CI plan run will show the diff. Merge only when the plan is zero-diff (or all diffs are acceptable, non-destructive attribute normalization like tag casing).
- Enable apply gradually. Start with plan-only CI runs for 2 weeks on the newly imported resources. Verify drift detection finds no unexpected changes. Only then open the apply gate.
- Document every import. Add a comment in HCL or a git commit message explaining why this resource was brownfield-imported: the original ticket, who created the resource manually, and when it was imported. This institutional memory prevents future teams from thinking the resource is safe to delete.
Set lifecycle { prevent_destroy = true } on every brownfield-imported resource for the first 90 days. This guard prevents an accidental terraform destroy or a misconfigured for_each from deleting a production database that predates IaC. Remove it only after the team has full confidence in the HCL configuration.
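As a sketch of that guard, assuming a hypothetical imported RDS instance (the identifier, engine, and sizing values below are placeholders and would need to match the real instance):

```hcl
# Hypothetical brownfield-imported database; all values are illustrative.
resource "aws_db_instance" "legacy_orders" {
  identifier        = "legacy-orders"
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 100
  # Remaining arguments omitted; they must mirror the live instance.

  lifecycle {
    # Any plan that would destroy or replace this resource fails with an error
    # instead of proceeding. Remove only after the team trusts this configuration.
    prevent_destroy = true
  }
}
```

Because the guard is just configuration, removing it later is an ordinary reviewed change, which gives the team an explicit checkpoint before the resource becomes destroyable.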
The lifecycle.ignore_changes Escape Hatch
Some attributes are legitimately managed outside Terraform: the desired count of an ECS service (managed by autoscaling), the AMI of an EC2 instance (managed by an AMI-baking pipeline), the password of an RDS instance (rotated by AWS Secrets Manager). For these, use ignore_changes to prevent Terraform from reverting the external change:
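The ECS desired-count case might look like the following sketch. The service name and the two input variables are illustrative assumptions, not part of the original lesson:

```hcl
# Hypothetical inputs, declared so the fragment is self-contained.
variable "ecs_cluster_arn" {
  type = string
}

variable "task_definition_arn" {
  type = string
}

resource "aws_ecs_service" "api" {
  name            = "api"
  cluster         = var.ecs_cluster_arn
  task_definition = var.task_definition_arn
  desired_count   = 2 # initial value only; autoscaling adjusts it afterwards

  lifecycle {
    # desired_count is owned by autoscaling, so Terraform must not
    # reset it to 2 on every apply.
    ignore_changes = [desired_count]
  }
}
```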
ignore_changes should be used sparingly and always documented with a comment explaining why the attribute is managed externally. Overuse leads to silent configuration creep where Terraform stops being the source of truth for important attributes.
Terraform Moved Blocks for Safe Refactoring
When you refactor HCL — renaming a resource, moving it into a module, changing a count to for_each — Terraform by default sees the old resource as destroyed and the new one as created. For production resources this means a delete/recreate cycle. The moved block, introduced in Terraform 1.1, tells Terraform that the same real resource is now referenced by a different address:
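A minimal sketch of a rename, assuming a hypothetical instance whose address changes from aws_instance.web to aws_instance.web_frontend (the resource names and the AMI variable are illustrative):

```hcl
# Requires Terraform 1.1 or later.
variable "web_ami_id" {
  type = string # hypothetical input
}

# Tells Terraform the existing instance now lives at a new address,
# so the plan shows a move instead of a destroy and create.
moved {
  from = aws_instance.web
  to   = aws_instance.web_frontend
}

resource "aws_instance" "web_frontend" {
  ami           = var.web_ami_id
  instance_type = "t3.medium"
}
```

The plan for this change should report that aws_instance.web has moved to aws_instance.web_frontend with no resources added or destroyed.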
After applying, the moved block can be kept permanently as documentation of the refactor history, or removed once the team has confirmed the change is stable. A common practice at large organizations is to keep them for one release cycle (for example, one sprint) and then remove them in a follow-up PR.
Production Failure Modes
The most common brownfield disasters follow predictable patterns. Understanding them protects you:
- Import then apply without plan review. The imported HCL has a wrong attribute (wrong vpc_id, wrong engine_version). The next terraform apply replaces the resource. Always get a zero-diff plan before enabling apply.
- Drifted security groups silently reverted. A security team added a hotfix rule to a security group after a DDoS incident. Terraform reverts it on the next apply. The rule was the only thing blocking the attack vector. Now the attack resumes. Run drift detection before every apply.
- Mass resource destruction from for_each key change. A brownfield resource was imported with count. Someone refactors to for_each with string keys. Terraform sees all count-indexed resources as destroyed and creates new for_each ones. Use moved blocks to map old addresses to new ones.
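The count-to-for_each failure mode above can be avoided with one moved block per instance. The sketch below assumes two hypothetical subnets that were previously indexed as [0] and [1]; the availability zones, CIDR ranges, and vpc_id variable are illustrative:

```hcl
variable "vpc_id" {
  type = string # hypothetical input
}

# Map each old count index to its new for_each key.
moved {
  from = aws_subnet.private[0]
  to   = aws_subnet.private["us-east-1a"]
}

moved {
  from = aws_subnet.private[1]
  to   = aws_subnet.private["us-east-1b"]
}

resource "aws_subnet" "private" {
  # Keys are availability zones, values are CIDR blocks.
  for_each = {
    "us-east-1a" = "10.0.1.0/24"
    "us-east-1b" = "10.0.2.0/24"
  }

  vpc_id            = var.vpc_id
  availability_zone = each.key
  cidr_block        = each.value
}
```

Each key must map to the subnet that actually held the corresponding index, so it is worth checking terraform state list before writing the moved blocks.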
Drift detection and brownfield import are the two skills that separate teams that use Terraform for compliance theater from teams that use it as the genuine operational source of truth. Master both, automate drift detection from day one, and your infrastructure state will remain trustworthy even in the messiest production environments.