DNS in Depth
Every service call, every kubectl apply, every database connection in your cluster starts with a DNS lookup. DNS is the distributed phone book of the internet — and when it breaks, everything breaks in confusing ways that look like network failures, application bugs, or TLS errors. Understanding DNS at the resolver level is non-negotiable for any operator who wants to diagnose incidents quickly.
The Resolution Flow
When your application calls getaddrinfo("api.example.com"), the operating system does not magically know the IP. It runs through a deterministic chain of resolvers:
- Local cache check: The OS stub resolver checks its in-memory cache. If a valid cached answer exists (TTL not expired), it returns immediately — no network round-trip.
- Recursive resolver (your ISP or configured server): On a miss, the stub resolver forwards the query to a recursive resolver, typically 8.8.8.8, 1.1.1.1, or your VPC's resolver (e.g., 169.254.169.253 on AWS). The recursive resolver has its own cache. On a hit, it answers immediately.
- Root servers: On a cache miss, the recursive resolver asks one of the 13 root server clusters (operated by ICANN, Verisign, etc.) which authoritative server handles .com.
- TLD nameservers: The root server returns the .com TLD servers. The recursive resolver asks them: "Who is authoritative for example.com?"
- Authoritative nameserver: The TLD server returns the authoritative NS records for example.com. The recursive resolver queries that server, which owns the actual zone data and returns the A (or AAAA, CNAME, etc.) record.
- Answer delivered and cached: The recursive resolver caches the answer for the record's TTL, then returns it to the stub resolver, which caches it in the OS. Your application finally gets an IP address.
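The chain above can be sketched as a toy model. This is only an illustration under stated assumptions: the dictionaries stand in for real root, TLD, and authoritative servers, the server labels and the 192.0.2.10 address are made up, and no network traffic is involved.

```python
import time

# Toy delegation data (hypothetical), standing in for real servers.
ROOT = {"com.": "tld-com"}                          # root knows who serves each TLD
TLD = {"tld-com": {"example.com.": "ns-example"}}   # TLD knows each zone's nameserver
AUTH = {"ns-example": {"api.example.com.": ("192.0.2.10", 300)}}  # (address, TTL)


class RecursiveResolver:
    """Walks root -> TLD -> authoritative and caches answers until TTL expiry."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.cache = {}            # name -> (address, expiry time)
        self.upstream_queries = 0  # counts queries sent beyond the cache

    def resolve(self, name):
        cached = self.cache.get(name)
        if cached and cached[1] > self.clock():
            return cached[0]       # cache hit: no upstream traffic

        labels = name.rstrip(".").split(".")
        tld = labels[-1] + "."
        zone = ".".join(labels[-2:]) + "."

        tld_server = ROOT[tld]                   # ask a root server
        auth_server = TLD[tld_server][zone]      # ask the TLD server
        address, ttl = AUTH[auth_server][name]   # ask the authoritative server
        self.upstream_queries += 3

        self.cache[name] = (address, self.clock() + ttl)
        return address
```

Calling resolve("api.example.com.") twice in a row sends three upstream queries in total, not six. The second call is served from the cache until the 300-second TTL runs out.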
The cloud VPC resolver (AWS at 169.254.169.253, GCP at 169.254.169.254) handles both private DNS (service discovery) and public DNS from a single endpoint.
Record Types You Must Know
DNS stores different kinds of data as typed records. Each record type has a specific structure and purpose:
- A record: Maps a hostname to an IPv4 address. The most common record type. A single hostname can have multiple A records for round-robin load distribution — though this is a naive form of balancing with no health-check awareness.
- AAAA record: Same as A but for IPv6 addresses. Modern stacks should dual-stack: both A and AAAA records on every public hostname.
- CNAME record: Canonical Name, an alias that points one hostname to another hostname (not an IP). The resolver follows the chain until it hits an A or AAAA. A CNAME cannot coexist with other records at the same name, and critically, you cannot use a CNAME at the zone apex (naked domain). www.example.com CNAME example.com is fine; example.com CNAME something.else.com is not (use ALIAS/ANAME or Route 53 Alias instead).
- MX record: Mail Exchanger, which points to the hostname(s) that accept email for a domain. Always includes a priority (lower = higher priority), for example: example.com MX 10 mail1.example.com. The value must be a hostname, not an IP; never put an IP directly in an MX record.
- TXT record: Arbitrary text. Used for domain ownership verification (Google Search Console, AWS Certificate Manager), SPF (email sender policy), DKIM public keys, and DMARC policy. If your emails land in spam, start here.
- SRV record: Service locator. Encodes protocol, priority, weight, port, and target hostname in one record. Kubernetes uses SRV records for service discovery within the cluster DNS (_http._tcp.my-service.namespace.svc.cluster.local). Format: _service._proto.name TTL IN SRV priority weight port target.
- NS record: Nameserver, which delegates a zone to a set of authoritative servers. Changing NS records is how you migrate DNS hosting between providers.
- PTR record: Pointer — reverse DNS. Maps an IP address back to a hostname. Required for email deliverability (receiving servers check that your mail server IP has a matching PTR). Managed by whoever owns the IP block (usually your cloud provider or ISP, not you).
- SOA record: Start of Authority — every zone has exactly one. Carries the primary NS, zone admin email, serial number (used for zone transfer sync), and refresh/retry/expire timers.
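Several of the rules above are mechanical enough to check before a zone change ships. The following is a minimal sketch, not a complete validator: lint_zone and the (name, type, value) tuple layout are hypothetical names chosen for this example, and it covers only the CNAME and MX rules described in the list.

```python
import ipaddress


def is_ip(value):
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def lint_zone(apex, records):
    """Check (name, type, value) tuples against the CNAME and MX rules."""
    problems = []
    types_by_name = {}
    for name, rtype, _ in records:
        types_by_name.setdefault(name, set()).add(rtype)

    for name, types in types_by_name.items():
        if "CNAME" in types:
            if name == apex:
                problems.append(f"{name}: CNAME at zone apex")
            elif len(types) > 1:
                problems.append(f"{name}: CNAME coexists with other records")

    for name, rtype, value in records:
        if rtype == "MX":
            # MX values look like "10 mail1.example.com."; the target is the last field.
            target = value.split()[-1].rstrip(".")
            if is_ip(target):
                problems.append(f"{name}: MX target {target} is an IP address")
    return problems
```

A record set such as ("www.example.com.", "CNAME", "example.com.") passes cleanly, while a CNAME on example.com. itself or an MX value of "10 192.0.2.5" is reported.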
TTL: The Most Misunderstood Field
The Time-To-Live on a DNS record (in seconds) tells every resolver along the chain how long it may cache the answer. TTL is a one-way control: once a resolver has cached your record, you cannot force it to re-query until the TTL expires. This has operational consequences:
- A TTL of 300 (5 min) means a failover change propagates in at most 5 minutes, but only if you set that TTL before the incident.
- A TTL of 86400 (1 day), common on lazily-configured domains, means a migration or failover takes up to 24 hours to propagate. You cannot emergency-change a cached TTL mid-flight.
- Big-tech practice: lower TTLs to 60–300 seconds a day before any planned migration or cutover, wait for the old high TTL to expire, then perform the change. Afterward you can raise TTLs again.
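The timing in the last item is simple arithmetic, sketched below. The function names and the timestamp are hypothetical, and the result assumes resolvers honor the published TTL, which some do not.

```python
from datetime import datetime, timedelta, timezone


def earliest_safe_cutover(ttl_lowered_at, old_ttl_seconds):
    """Earliest time at which no resolver can still hold the old, long-TTL answer."""
    # A resolver may have cached the record just before the TTL was lowered.
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)


def worst_case_propagation(cutover_at, new_ttl_seconds):
    """Latest time a TTL-honoring resolver could still serve the old address."""
    return cutover_at + timedelta(seconds=new_ttl_seconds)


lowered = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # hypothetical timestamp
cutover = earliest_safe_cutover(lowered, old_ttl_seconds=86400)
done = worst_case_propagation(cutover, new_ttl_seconds=60)
print(cutover.isoformat())  # 2024-01-02T12:00:00+00:00
print(done.isoformat())     # 2024-01-02T12:01:00+00:00
```

With an old TTL of 86400 and a new TTL of 60, the cutover waits a full day, but the change itself then converges within about a minute.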
60 seconds by default. For records that never change (SPF, DKIM, MX), 3600 to 86400 is fine. Never use a TTL of 0 in production: it defeats caching entirely and floods your authoritative server.
DNS Debugging with dig
dig (Domain Information Groper) is the canonical DNS debugging tool. Every DevOps engineer must be fluent with it. nslookup is limited; dig shows you the full protocol-level picture.
The output of a raw dig command has four sections: QUESTION (what you asked), ANSWER (matching records), AUTHORITY (which NS servers are authoritative for this zone), and ADDITIONAL (glue records — the A records for the NS servers themselves, to avoid a bootstrap chicken-and-egg problem). The footer shows query time, which server answered, and when.
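The four sections dig prints correspond to four counters in the 12-byte header of every DNS message (RFC 1035). As a rough sketch of what sits underneath the tool, the code below builds a query by hand and reads those counters back. The helper names and the query ID are arbitrary choices for illustration, and nothing is sent over the network.

```python
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Build a DNS query message for name; qtype 1 is an A record."""
    # Header: ID, flags (0x0100 = recursion desired), then the four section counts.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS (1 = IN)
    return header + question


def section_counts(message):
    """Read the four section counts from the DNS header."""
    _, _, qd, an, ns, ar = struct.unpack("!HHHHHH", message[:12])
    return {"QUESTION": qd, "ANSWER": an, "AUTHORITY": ns, "ADDITIONAL": ar}
```

For a freshly built query, section_counts reports one QUESTION entry and zero in the other three sections. A response from a server carries the same header layout with the remaining counts filled in.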
When a change does not appear to take effect, query the authoritative nameserver directly: dig @ns1.example.com hostname A. If the authoritative NS shows the new record but resolvers do not, you are waiting for TTL to expire, which is expected. If the authoritative NS still shows the old record, the change was not saved correctly. These are two completely different problems; conflating them wastes hours during an incident.
DNS Failure Modes in Production
DNS failures are among the most operationally deceptive because they manifest as timeouts, connection refused errors, or TLS certificate mismatches — not as "DNS failed." Here are the most common failure patterns at scale:
- Stale cache after TTL-unaware rotation: You point a CNAME at a new load balancer but forget the old TTL is 86400. Traffic continues hitting the old backend for up to 24 hours.
- Split-horizon misconfiguration: Your authoritative DNS returns different answers for internal vs. external queries (intentional split-horizon). A misconfigured split-horizon can cause services reachable from the internet to be unreachable internally, or vice versa — a common source of "works on my machine, broken in prod" reports.
- CNAME at zone apex: Setting a CNAME on the bare domain breaks MX, NS, and SOA lookups. Use your DNS provider's ALIAS or ANAME record (or Route 53 Alias) at the apex instead.
- Kubernetes CoreDNS saturation: CoreDNS is the cluster DNS for Kubernetes. Under high query rates (especially ndots:5 causing search-domain chasing with up to 8 queries per lookup), CoreDNS pods can become CPU-bound and start dropping queries. Monitor CoreDNS request latency as a first-class SLI.
- Missing PTR records for mail: Your outbound SMTP server has no reverse DNS entry. Receiving servers reject your email silently or score it as spam. Always verify PTR records for mail server IPs.
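The search-domain chasing mentioned in the CoreDNS item follows the ndots rule from resolv.conf. This sketch lists the names a stub resolver would try, in order. The function name is hypothetical, and the search list in the usage note is an assumption about a pod in the default namespace.

```python
def candidate_queries(name, search, ndots):
    """Return the names a stub resolver tries, in order, for a lookup of name."""
    if name.endswith("."):
        return [name]                    # fully qualified: search list is skipped
    absolute = name + "."
    searched = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + searched     # enough dots: try the name as-is first
    return searched + [absolute]         # too few dots: walk the search list first
```

With search set to ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"] and ndots at 5, a lookup of api.example.com tries three cluster-suffixed names before the real one, and each name is typically queried for both A and AAAA. With a trailing dot, or with ndots at 1, the real name is tried first.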
Inside a pod, short names are tried against the namespace's own domain under svc.cluster.local, then svc.cluster.local, then cluster.local, then external, controlled by /etc/resolv.conf search domains and ndots. A query for redis inside a pod in the default namespace actually generates queries for redis.default.svc.cluster.local, redis.svc.cluster.local, redis.cluster.local, and finally the bare name redis. before resolving. Use fully-qualified domain names (ending in .) or lower ndots to 1 through the pod's dnsConfig options for latency-sensitive services.
DNS as Infrastructure: Production Patterns
At big-tech scale DNS is treated as critical infrastructure, not a one-time configuration task:
- Multiple authoritative nameservers in different ASNs: Route 53, Cloudflare, NS1 all operate anycast networks. Never rely on a single NS. Zone delegation requires at least two NS records, and best practice is four across geographically-separated clusters.
- DNS as health-check-aware routing: Route 53 Health Checks, Cloudflare Load Balancing, and NS1 Pulsar all allow weighted or failover routing based on health probe results — DNS becomes part of your active traffic management plane.
- DNSSEC: Adds cryptographic signatures to records, preventing cache poisoning (Kaminsky attack). Required in some regulated environments (US federal .gov zones, for example) and increasingly expected on all public zones. Adds operational complexity: key rollover must be planned.
- Low TTL + CDN CNAME: Modern CDNs (Cloudflare, Fastly, Akamai) want you to CNAME your hostname to their edge. Combine with a 60-second TTL and health-check routing for sub-minute failover at global scale.
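Health-check-aware routing comes down to filtering the answer set by probe results before applying weights. The sketch below shows that idea in generic form. It is not any provider's API: pick_endpoint, the endpoint list, and the fail-open fallback are assumptions made for this example.

```python
import random


def pick_endpoint(endpoints, healthy, rng=random):
    """Choose an address by weight among healthy endpoints, else fall back to all.

    endpoints: list of (address, weight) pairs.
    healthy: set of addresses that passed their most recent health probe.
    """
    candidates = [(addr, w) for addr, w in endpoints if addr in healthy]
    if not candidates:
        # Fail open: if every probe fails, the prober itself may be the problem.
        candidates = list(endpoints)
    addresses = [addr for addr, _ in candidates]
    weights = [w for _, w in candidates]
    return rng.choices(addresses, weights=weights, k=1)[0]
```

With endpoints [("192.0.2.10", 90), ("198.51.100.20", 10)] and only the second address healthy, every answer is 198.51.100.20. Whether to fail open or return no answer when all probes fail is a policy decision that depends on the service.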