Data Sources & References
Every non-trivial Terraform configuration needs to read the world before it can change it. You need the ID of the latest Amazon Linux AMI before launching an EC2 instance. You need the ARN of a certificate managed by another team before attaching it to your load balancer. You need the CIDR blocks of a VPC that was created six months ago — long before your module existed. This is exactly what data sources solve.
A data source is a read-only query against a provider's API or state. Declare it with a data block, reference its attributes exactly as you would a resource, and Terraform builds the correct dependency edge automatically. Understanding data sources — and how Terraform's resource graph uses them — is what separates engineers who write toy configurations from engineers who manage production infrastructure at scale.
The data Block
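The data block has the same structure as a resource block: a type, a local name, and a body of arguments that filter the query. The provider normally resolves the query at plan time and exposes every attribute of the matching object as data.<type>.<name>.<attribute>. If an argument depends on a value that is not known until apply, Terraform defers the read to the apply phase.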
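As a minimal sketch of the shape described above, the following looks up an Amazon Linux AMI and feeds its ID into an instance. The local names (amazon_linux, web), the AMI name pattern, and the instance type are illustrative assumptions, not values from this lesson.

```hcl
# Read-only query: find the newest Amazon-owned image whose name matches
# the pattern. The pattern below is an assumed example; adjust it to the
# image family you actually use.
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"]
  }
}

# Referencing data.aws_ami.amazon_linux.id creates the dependency edge:
# the query is resolved before the instance is planned.
resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.micro" # hypothetical size
}
```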
Use depends_on or split your apply into ordered stages when cross-stack ordering is required.
Common Data Source Patterns
The three patterns you will use on every project are: (1) fetching dynamic IDs such as AMIs and certificates, (2) reading shared VPC topology managed by a platform team, and (3) reading outputs from another Terraform state file via terraform_remote_state.
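The three patterns can be sketched roughly as follows. Every concrete value here (the domain, the tag names, the S3 bucket and key, and the private_subnet_ids output) is a hypothetical placeholder; the other stack must actually export an output with that name for the last lookup to work.

```hcl
# (1) Dynamic ID: an issued certificate owned by another team.
data "aws_acm_certificate" "shared" {
  domain      = "example.com" # hypothetical domain
  statuses    = ["ISSUED"]
  most_recent = true
}

# (2) Shared topology: a VPC and its private subnets, found by tag.
data "aws_vpc" "shared" {
  tags = {
    Name = "platform-shared" # hypothetical tag value
  }
}

data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.shared.id]
  }

  tags = {
    Tier = "private" # hypothetical tag
  }
}

# (3) Outputs from another state file (assumed S3 backend).
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"   # hypothetical bucket
    key    = "network/terraform.tfstate" # hypothetical key
    region = "us-east-1"
  }
}

locals {
  certificate_arn    = data.aws_acm_certificate.shared.arn
  vpc_cidr           = data.aws_vpc.shared.cidr_block
  private_subnet_ids = data.aws_subnets.private.ids
  # Assumes the network stack declares an output named private_subnet_ids.
  remote_subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
}
```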
Implicit Dependencies and the Resource Graph
Terraform does not execute resources top-to-bottom. Instead, it builds a directed acyclic graph (DAG) — the resource graph — by parsing every reference in your configuration. When resource B references resource_a.foo.id, Terraform draws an edge from A to B, guaranteeing A is created before B. This happens automatically from references; you almost never need to state it explicitly.
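A small sketch of an implicit dependency, using hypothetical names and CIDR ranges: the subnet is declared first, but its reference to the VPC's ID is what determines the order.

```hcl
# Declared first, created second: file order does not matter.
resource "aws_subnet" "app" {
  vpc_id     = aws_vpc.main.id # this reference creates the edge
  cidr_block = "10.0.1.0/24"
}

# Created first, because aws_subnet.app depends on its id.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}
```

No depends_on is needed here; removing the reference would also remove the ordering guarantee.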
The graph's most important node types are provider nodes (which initialize the AWS, GCP, or Vault API client), resource nodes (real infrastructure objects), and data nodes (read-only queries). Terraform walks the graph in parallel: independent nodes run concurrently, and dependent nodes wait. On a large configuration this parallelism, controlled by -parallelism=N (default 10), is what keeps Terraform fast even when it manages hundreds of resources.
Inspecting the Graph
You can materialize the graph at any time with terraform graph, which outputs DOT format. Pipe it into Graphviz to produce a PNG, for example with terraform graph | dot -Tpng > graph.png, and review node ordering before a high-risk apply.
Run terraform graph during code review on every large PR. A missing dependency edge means two resources that should be sequential will run in parallel, causing intermittent race-condition failures that are very difficult to reproduce. The graph makes invisible ordering assumptions explicit. Some platform teams make graph review part of their Terraform module acceptance checklist.
Explicit Dependencies with depends_on
Terraform's graph infers dependencies from references, but not from side effects. If resource B requires that resource A has been applied — even though B does not directly reference any of A's attributes — you must express that with depends_on. The classic example is an IAM role policy that must propagate before a Lambda function can execute.
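The IAM and Lambda case might look like the sketch below. The function references the role, so that edge is implicit, but nothing in the function references the policy attached to the role, so that ordering has to be stated. All names, the runtime, and the archive path are hypothetical.

```hcl
resource "aws_iam_role" "lambda" {
  name = "example-lambda-role" # hypothetical name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy" "lambda_logs" {
  name = "write-logs"
  role = aws_iam_role.lambda.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["logs:CreateLogStream", "logs:PutLogEvents"]
      Resource = "*"
    }]
  })
}

resource "aws_lambda_function" "worker" {
  function_name = "example-worker"         # hypothetical name
  role          = aws_iam_role.lambda.arn  # implicit edge to the role
  runtime       = "python3.12"
  handler       = "handler.main"
  filename      = "build/worker.zip"       # hypothetical artifact path

  # Explicit edge: no attribute of the policy is referenced above, so
  # Terraform could otherwise create the function and policy in parallel.
  depends_on = [aws_iam_role_policy.lambda_logs]
}
```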
depends_on works against Terraform's parallelism. Every unnecessary edge serializes work that could run concurrently and inflates your apply time. More critically, depends_on on a module forces every resource inside that module to wait, even resources with no logical relationship to the dependency. Add explicit edges only when Terraform genuinely cannot infer the ordering from references alone.
Production Failure Mode: Stale Data Sources
Data sources are re-evaluated on every plan and apply. If the upstream object changes between two applies (for example, a security team rotates the ACM certificate or a platform team changes VPC CIDR allocations), your next plan will see the new value. This is usually correct, but it can cause surprises: a new AMI ID returned by most_recent = true will force replacement of every EC2 instance that references it. In production, pin the AMI by adding a name_regex that matches a specific patch level, or by filtering on image-id or on a tag set by your golden-image pipeline. Never use most_recent = true on AMIs in production without a tested rollback plan.
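One way to pin the lookup, sketched under the assumption that your golden-image pipeline tags each AMI with a Release tag (the tag name, variable name, and default value are all hypothetical):

```hcl
variable "ami_release" {
  description = "Golden-image release to deploy; change it deliberately to roll forward or back."
  type        = string
  default     = "2024.06.1" # hypothetical release label
}

data "aws_ami" "golden" {
  owners = ["self"]

  # most_recent is intentionally omitted: if the filter matches more than
  # one image, the lookup fails instead of silently picking a new AMI.
  filter {
    name   = "tag:Release"
    values = [var.ami_release]
  }
}
```

With this shape the AMI only changes when someone edits ami_release, so instance replacement shows up as a reviewed change rather than a side effect of the next plan.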
In the next lesson you will explore the meta-arguments count, for_each, and lifecycle, which let you express iteration and resource lifecycle policies inside a single block, eliminating repetition at scale.