State Surgery
Terraform state is the single source of truth that maps your HCL configuration to real-world infrastructure. In an ideal world you would never touch it directly — every resource would be born from terraform apply and die from terraform destroy. Production is not ideal. Infrastructure teams face brownfield resources created before Terraform existed, refactors that rename resources, accidental state corruption, and the need to reorganize large monolithic state files. Knowing how to perform state surgery — safely manipulating state without destroying real infrastructure — is a non-negotiable skill at any organization running Terraform at scale.
A targeted terraform destroy hits one resource. A botched state operation can desync dozens of resources at once, causing Terraform to attempt to recreate all of them on the next apply. Always back up state before any surgical operation, and always run terraform plan afterwards to verify that Terraform's intent matches yours.
Backing Up State Before Surgery
Whether your backend is S3, GCS, or Terraform Cloud, capture a local snapshot before touching anything:
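A minimal sketch of that snapshot step. The function name, the state-backups directory, and the timestamped file name are illustrative choices rather than Terraform conventions; terraform state pull itself is the standard command and behaves the same against any backend.

```shell
# Pull the current state from the configured backend into a timestamped
# local file. "state-backups" is an arbitrary directory name (assumption).
backup_state() {
  backup_dir="${1:-state-backups}"
  mkdir -p "$backup_dir" || return 1
  backup_file="$backup_dir/terraform-$(date +%Y%m%dT%H%M%S).tfstate"

  # state pull prints the raw state JSON to stdout.
  if ! terraform state pull > "$backup_file"; then
    echo "state pull failed; no usable backup was written" >&2
    rm -f "$backup_file"
    return 1
  fi

  # Refuse to continue with an empty backup file.
  if [ ! -s "$backup_file" ]; then
    echo "backup is empty: $backup_file" >&2
    return 1
  fi

  # Print the path so the caller can record it.
  echo "$backup_file"
}

# Usage, from an initialized working directory:
#   backup_state
```

Printing the backup path makes it easy to paste into the change ticket or incident notes.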
On S3 backends with versioning enabled, the previous state version is automatically preserved — but the explicit pull gives you a local copy you can inspect and restore from without cloud access.
The terraform import Command
Use import when a resource already exists in the cloud but has no corresponding Terraform state entry. Classic scenarios: a DBA created an RDS instance manually to unblock a release, an ops engineer added a security group rule in the console, or you are adopting an old AWS account that was never managed by IaC.
The workflow is: write the HCL config first, then import the real resource into state. The terraform import CLI only writes state, never config. Since Terraform 1.5, terraform plan -generate-config-out can scaffold config as a starting point, but only for resources declared in import blocks.
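The CLI workflow can be sketched as follows, assuming a hypothetical RDS instance whose identifier is legacy-db. The file name, the resource name legacy, and the argument values are all placeholders that must be edited to match the real instance.

```shell
# Step 1: describe the existing resource in HCL. Every value below is a
# placeholder (assumption); copy the real settings from the live instance.
cat > rds_legacy.tf <<'EOF'
resource "aws_db_instance" "legacy" {
  identifier     = "legacy-db"
  engine         = "postgres"
  instance_class = "db.t3.medium"
  # remaining arguments copied from the real instance
}
EOF

# Steps 2 and 3: bind the real object to the address, then plan right away
# so any mismatch between config and reality shows up before an apply.
import_and_plan() {
  address="$1"
  import_id="$2"
  terraform import "$address" "$import_id" && terraform plan
}

# Usage (for aws_db_instance the import ID is the DB instance identifier):
#   import_and_plan aws_db_instance.legacy legacy-db
```

A clean result is a plan with no changes; any diff means the HCL still needs to be adjusted to match the imported resource.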
In Terraform 1.5+, you can also declare imports inside HCL with an import block, which is idempotent and pipeline-safe:
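A minimal sketch of such a block, written to a file named imports.tf. The file name, the address aws_db_instance.legacy, and the ID legacy-db are hypothetical and stand in for your own resource.

```shell
# Declare the import in HCL so it goes through the normal plan/apply cycle.
# Address and ID are placeholders (assumption).
cat > imports.tf <<'EOF'
import {
  to = aws_db_instance.legacy
  id = "legacy-db"
}
EOF

# Then, from the same working directory:
#   terraform plan     # shows the resource as "will be imported"
#   terraform apply    # records it in state
#
# If no resource block exists yet, config can be scaffolded into a new file:
#   terraform plan -generate-config-out=generated.tf
```

Treat generated config as a draft to review and trim, not as finished HCL.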
The terraform import CLI is imperative and transient: it runs, modifies state, and leaves no record in Git. Import blocks are declarative, version-controlled, and can be planned and applied through a normal CI/CD pipeline. Once the import has been applied, remove the block from the codebase; the resource is now under management, and a leftover import block is a no-op that only adds noise.
The moved Block — Renaming Without Destroying
When you refactor HCL — rename a resource, move it into a module, or change a for_each key — Terraform sees the old address disappear and the new one appear. Without guidance, it plans to destroy the old resource and create a new one. In production this means downtime. The moved block (introduced in Terraform 1.1) tells Terraform that the old and new addresses refer to the same real-world object, so no destroy/create cycle occurs.
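Two illustrative moved blocks, one for a plain rename and one for a move into a module. All addresses (aws_instance.web, aws_instance.app_server, module.logging) are hypothetical names chosen for the example.

```shell
# Record the refactor next to the renamed resources. All addresses here are
# placeholders (assumption).
cat > moved.tf <<'EOF'
# Rename: aws_instance.web is now called aws_instance.app_server.
moved {
  from = aws_instance.web
  to   = aws_instance.app_server
}

# Relocation: the bucket now lives inside the "logging" module.
moved {
  from = aws_s3_bucket.logs
  to   = module.logging.aws_s3_bucket.logs
}
EOF

# Then: terraform plan
# Each resource should be reported as moved, with nothing to add or destroy.
```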
Moved blocks are historical records of a refactor. Keep them in the codebase until every environment and pipeline has applied them, then remove them after a stabilization period (typically one sprint). If you remove a moved block too early, any state that has not yet applied the move will plan a destroy/create on its next apply.
The moved block also handles for_each key renames, which are a common refactor pain point:
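A sketch of a key rename, assuming a hypothetical aws_subnet.private resource whose for_each keys change from short letters to availability zone names. The resource name and the keys are illustrative only.

```shell
# One moved block per instance key that changes. Resource name and keys are
# placeholders (assumption).
cat > moved_keys.tf <<'EOF'
moved {
  from = aws_subnet.private["a"]
  to   = aws_subnet.private["us-east-1a"]
}

moved {
  from = aws_subnet.private["b"]
  to   = aws_subnet.private["us-east-1b"]
}
EOF

# Then: terraform plan
# Each subnet should be reported as moved to its new key.
```

The quoted heredoc delimiter ('EOF') keeps the shell from touching the double quotes and brackets inside the instance keys.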
State CLI Commands: mv, rm, list
The terraform state subcommands are surgical tools for cases where HCL blocks are not sufficient — typically cross-workspace or cross-backend moves.
terraform state list — lists all resource addresses currently tracked in state. Essential before any surgery to understand what you are working with:
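One way to use the listing is to record a sorted inventory before and after surgery and diff the two. The helper name snapshot_addresses and the file names are hypothetical; terraform state list is the only Terraform command involved.

```shell
# Write a sorted list of every tracked resource address to the given file.
snapshot_addresses() {
  out="$1"
  # Capture first so a failed listing is not hidden behind sort's exit code.
  terraform state list > "$out.unsorted" || return 1
  sort "$out.unsorted" > "$out" && rm -f "$out.unsorted"
}

# Usage:
#   snapshot_addresses before.txt
#   ...perform the state operation...
#   snapshot_addresses after.txt
#   diff before.txt after.txt
#
# To narrow the listing to one module, pass an address:
#   terraform state list module.network
```

The diff should show only the addresses you meant to change; anything else is a signal to stop and restore from backup.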
terraform state mv — moves a resource from one address to another within the same state file, or between two state files. Use this when moved blocks are not an option (e.g., moving resources across workspaces):
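A cautious sketch of a move within a single state: preview with -dry-run, then ask for confirmation before changing anything. The wrapper name rename_in_state and the confirmation prompt are assumptions for illustration; the addresses in the usage line are placeholders.

```shell
# Preview a state move, then apply it only after an explicit "y".
rename_in_state() {
  from="$1"
  to="$2"

  # -dry-run reports what would be moved without modifying state.
  terraform state mv -dry-run "$from" "$to" || return 1

  printf 'Apply this move? [y/N] '
  read -r answer
  if [ "$answer" != "y" ]; then
    echo "aborted; state unchanged"
    return 1
  fi

  terraform state mv "$from" "$to"
}

# Usage (single quotes protect instance keys such as ["a"] from the shell):
#   rename_in_state aws_instance.web aws_instance.app_server
```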
terraform state rm — removes a resource from state without destroying the real infrastructure. Use when you want Terraform to stop managing a resource (hand it back to manual management, or migrate to a different tool) while leaving the real cloud resource intact:
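A sketch that defaults to a preview and only removes the entry when asked explicitly. The helper name forget_resource and its apply argument are hypothetical; -dry-run is a real option of terraform state rm.

```shell
# Preview by default; pass "apply" as the second argument to really remove
# the address from state. The cloud resource itself is never touched.
forget_resource() {
  address="$1"
  mode="${2:-dry-run}"

  if [ "$mode" != "apply" ]; then
    # Lists what would be removed without modifying state.
    terraform state rm -dry-run "$address"
    return
  fi

  terraform state rm "$address"
}

# Usage:
#   forget_resource aws_db_instance.legacy          # preview only
#   forget_resource aws_db_instance.legacy apply    # remove from state
```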
After terraform state rm, the next terraform plan will show the resource as "to be created" if its resource block is still in your HCL, because Terraform no longer knows the real resource exists. Either remove the resource block from HCL entirely or re-import the resource. Always have a clear intent before removing anything from state.
Refactoring Without Destroying: The Safe Pattern
The safest way to restructure a large Terraform codebase follows this sequence:
- Back up state: terraform state pull > backup.tfstate
- Write the new HCL: rename resources, extract modules, restructure hierarchies
- Add moved blocks: one per renamed or relocated address
- Run terraform plan: the plan must show zero adds/destroys; only renames appear as moved notes
- Apply in a non-production workspace first: validate the refactor is clean
- Apply to production: with a team member reviewing the plan output before confirming
- Remove moved blocks: after all environments have successfully applied
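The "zero adds/destroys" check in the plan step can be automated rather than eyeballed. The sketch below saves a plan, renders it as JSON with terraform show -json, and fails if any resource change includes a create or delete action. It assumes jq is installed; the function name and the default plan file name are hypothetical.

```shell
# Fail if the saved plan would create or destroy anything.
# Assumes jq is available on PATH.
assert_no_adds_or_destroys() {
  plan_file="${1:-refactor.tfplan}"

  terraform plan -out="$plan_file" || return 1

  # Capture the JSON first so a failed "show" cannot look like an empty plan.
  plan_json=$(terraform show -json "$plan_file") || return 1

  # A replacement appears as both "delete" and "create", so it is caught too.
  offenders=$(printf '%s' "$plan_json" | jq -r '
    .resource_changes[]?
    | select(.change.actions | any(. == "create" or . == "delete"))
    | .address') || return 1

  if [ -n "$offenders" ]; then
    echo "plan would create or destroy:" >&2
    echo "$offenders" >&2
    return 1
  fi
  echo "plan contains no creates or destroys"
}

# Usage, after adding the moved blocks:
#   assert_no_adds_or_destroys && terraform apply refactor.tfplan
```

Applying the saved plan file means the change that was checked is exactly the change that runs.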
Production Failure Modes
The most dangerous state surgery mistakes at scale:
- Applying without planning after import: if the imported resource's config does not match reality, Terraform will modify attributes or replace the resource. Always plan immediately post-import and resolve all diffs before proceeding.
- Removing moved blocks too early: engineers who have not yet applied will see destroy/create cycles on their next apply. Keep moved blocks for at least one release cycle.
- State push without locking: terraform state push overwrites remote state with a manually modified file, and running it with -lock=false skips the state lock. If a CI pipeline is mid-apply, the push corrupts state. Use -lock=false only as a last resort when you know no apply is in progress.
- Mixing state mv and moved blocks: applying a moved block after already running state mv for the same resource can cause Terraform to error. Pick one path and stick to it.
Some larger organizations gate terraform state push behind a second engineer's sign-off and a linked incident ticket, and trigger an automated alert to the platform team whenever it runs. Even in smaller organizations, treat any manual state modification as an incident-class event: document what you did, why, and what the resulting plan showed.