Cluster DNS & Service Discovery
Nearly every production Kubernetes cluster depends on a single critical subsystem for internal name resolution: CoreDNS. When your pod calls http://payment-svc:8080, it is not hitting a hardcoded IP; it is resolving a DNS name that CoreDNS translates into the ClusterIP of a Service on the fly. Understanding this pipeline end-to-end is the difference between guessing at network failures and fixing them in under five minutes.
CoreDNS: The Cluster's Authoritative Resolver
CoreDNS replaced kube-dns as the default DNS add-on starting in Kubernetes 1.13. It runs as a Deployment (typically two replicas for HA) in the kube-system namespace, exposed by the kube-dns Service at a stable ClusterIP — usually 10.96.0.10 (the tenth address of your service CIDR). Every node's kubelet writes that IP into /etc/resolv.conf inside every pod.
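The "tenth address" convention is easy to check with Python's standard ipaddress module. This is a minimal sketch that assumes the kubeadm default service CIDR of 10.96.0.0/12; the function name cluster_dns_ip is made up for illustration, and your cluster's CIDR may differ.

```python
import ipaddress


def cluster_dns_ip(service_cidr: str, offset: int = 10) -> str:
    """Return the address at the given offset inside the service CIDR."""
    network = ipaddress.ip_network(service_cidr)
    # Indexing a network yields the address at that offset from the network address.
    return str(network[offset])


if __name__ == "__main__":
    # 10.96.0.0/12 is an assumed service CIDR; substitute your own.
    print(cluster_dns_ip("10.96.0.0/12"))
```

If the printed address does not match the nameserver line in a pod's /etc/resolv.conf, the cluster was probably configured with a different service CIDR or an explicit cluster DNS address.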
CoreDNS is configured via a ConfigMap named coredns in kube-system. The configuration file it holds is called a Corefile, and the default Corefile enables a small chain of plugins.
Key plugins to understand:
- kubernetes: answers queries for the cluster domain.
- forward: sends everything else upstream (by default, to the node's own resolver).
- cache: TTL-based caching, so repeated lookups are answered from memory instead of being resolved again.
- loop: detects forwarding loops and halts CoreDNS rather than letting it forward queries to itself indefinitely.
- health / ready: expose liveness and readiness endpoints.
Fully Qualified Domain Names (FQDNs)
Kubernetes DNS follows a strict hierarchical naming convention. Every Service gets a DNS entry under the cluster's configured domain (default: cluster.local).
Search domains defined in every pod's /etc/resolv.conf let you use short names. A pod in the payments namespace can reach a Service with just api, and the resolver tries api.payments.svc.cluster.local. first. Across namespaces, use api.payments (resolves to api.payments.svc.cluster.local.). The fully qualified form with the trailing dot always bypasses search-domain expansion.
- nginx: within the same namespace only
- nginx.default: cross-namespace shorthand
- nginx.default.svc: explicit svc segment
- nginx.default.svc.cluster.local: full FQDN (no trailing dot needed in practice)
- nginx.default.svc.cluster.local.: absolute (trailing dot suppresses search)
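The short names work because of the search list in each pod's /etc/resolv.conf. The sketch below renders what that file typically looks like for a pod in a given namespace. It assumes the default cluster domain, a cluster DNS IP of 10.96.0.10, and ndots:5, and it leaves out any extra search domains inherited from the node; pod_resolv_conf is a hypothetical helper name.

```python
def pod_resolv_conf(namespace: str,
                    dns_ip: str = "10.96.0.10",
                    cluster_domain: str = "cluster.local",
                    ndots: int = 5) -> str:
    """Render an approximation of a pod's /etc/resolv.conf."""
    # Search order: own namespace first, then all Services, then the cluster root.
    search = [
        f"{namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    return "\n".join([
        f"nameserver {dns_ip}",
        "search " + " ".join(search),
        f"options ndots:{ndots}",
    ])


if __name__ == "__main__":
    print(pod_resolv_conf("payments"))
```

Because the pod's own namespace comes first in the search list, a bare name such as api is tried in that namespace before anything else.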
Headless Services & Pod DNS Records
When you set clusterIP: None, the Service becomes headless. CoreDNS returns the individual pod IPs directly (one A record per ready pod) rather than a single VIP. StatefulSets exploit this: each pod gets a stable DNS name in the form <pod-name>.<service-name>.<namespace>.svc.cluster.local. For example, postgres-0.postgres-headless.data.svc.cluster.local always resolves to the current IP of the pod named postgres-0 (the primary, if your setup assigns that role to ordinal 0), even after restarts, because a StatefulSet recreates the pod under the same name.
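To make the naming pattern concrete, here is a small sketch that builds the per-pod names for a StatefulSet and, optionally, resolves one of them. The helper names are illustrative, and the postgres values simply mirror the example above; resolution only works from inside a cluster where such a Service exists.

```python
import socket


def statefulset_pod_fqdns(statefulset: str, service: str, namespace: str,
                          replicas: int,
                          cluster_domain: str = "cluster.local") -> list:
    """Build the stable DNS name of every pod in a StatefulSet."""
    # Pod names are <statefulset>-<ordinal>, with ordinals starting at 0.
    return [
        f"{statefulset}-{i}.{service}.{namespace}.svc.{cluster_domain}"
        for i in range(replicas)
    ]


def resolve_ipv4(host: str) -> list:
    """Return the sorted, de-duplicated IPv4 addresses for a name."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    for name in statefulset_pod_fqdns("postgres", "postgres-headless", "data", 3):
        print(name)
```

Calling resolve_ipv4 on the headless Service name itself (without a pod prefix) should return one address per ready pod, which is a quick way to confirm the Service is headless.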
Debugging DNS Resolution
DNS failures in Kubernetes manifest in subtle ways: connection refused (wrong IP), no such host (NXDOMAIN), or timeouts (CoreDNS pods unhealthy). The standard debugging workflow uses a throwaway pod running dnsutils.
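The same distinctions can be scripted from any pod that has a Python interpreter. The sketch below is one possible approach, not a replacement for dig or nslookup: it maps the resolver error codes exposed by the standard socket module onto the failure classes described above. The function names and the wording of the messages are illustrative.

```python
import socket


def classify_dns_error(code: int) -> str:
    """Map a getaddrinfo error code to a likely cause."""
    if code == socket.EAI_NONAME:
        return "no such host: check the Service name and namespace"
    if code == socket.EAI_AGAIN:
        return "temporary failure: resolver unreachable or timing out"
    return "other resolver error"


def check_name(host: str) -> str:
    """Resolve a name and describe the outcome in one line."""
    try:
        infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return classify_dns_error(exc.errno)
    ips = sorted({info[4][0] for info in infos})
    return "resolved: " + ", ".join(ips)


if __name__ == "__main__":
    # Hypothetical Service name; replace with the name your app actually calls.
    print(check_name("payment-svc.default.svc.cluster.local"))
```

A "no such host" result usually points at the name or namespace, while a "temporary failure" result points at CoreDNS itself or the network path to it.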
The ndots:5 option is a production performance trap. With ndots:5, any name with fewer than 5 dots is tried against every search domain before it is tried as written. For an external name like api.stripe.com, the resolver tries api.stripe.com.default.svc.cluster.local., then api.stripe.com.svc.cluster.local., then api.stripe.com.cluster.local., and only then api.stripe.com. itself. With the three cluster search domains that is three wasted lookups per resolution (six when the resolver asks for both A and AAAA records), so at scale this multiplies your DNS query rate several times over. Fix it by appending a trailing dot (api.stripe.com.) to external names in your app config, or by setting ndots to 1 through dnsConfig.options (name: ndots, value: "1") on pods that only make external calls.
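The expansion order is easier to reason about when written out. The following sketch models the ordering a glibc-style stub resolver uses; other resolver implementations may behave differently, so treat it as an approximation. candidate_names is a hypothetical helper, and the search list assumes a pod in the default namespace.

```python
def candidate_names(name: str, search: list, ndots: int = 5) -> list:
    """List the names a resolver would try, in order, for one lookup."""
    # A trailing dot marks the name as absolute: no search expansion at all.
    if name.endswith("."):
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    # Enough dots: try the name as written first, then the search list.
    if name.count(".") >= ndots:
        return [absolute] + expanded
    # Too few dots: walk the search list first, the name as written last.
    return expanded + [absolute]


if __name__ == "__main__":
    search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
    for ndots in (5, 1):
        print(f"ndots:{ndots}")
        for candidate in candidate_names("api.stripe.com", search, ndots):
            print("  " + candidate)
```

With ndots:1 the external name is tried as written first, so a successful answer ends the lookup after a single query.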
Customizing DNS per Pod
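You can override DNS behavior per pod using the dnsConfig and dnsPolicy fields. dnsPolicy: ClusterFirst is the default: queries go to CoreDNS, which answers names in the cluster domain itself and forwards everything else to the upstream resolver. dnsPolicy: None ignores the cluster DNS settings entirely and requires you to supply the resolver configuration yourself through dnsConfig.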
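As an illustration of how these fields fit together, the sketch below builds a Pod manifest with dnsPolicy: None and an explicit dnsConfig, then prints it as JSON (which kubectl accepts alongside YAML). The pod name, image, and nameserver address are placeholder assumptions, and pod_with_custom_dns is a hypothetical helper.

```python
import json


def pod_with_custom_dns(name: str, image: str, nameservers: list,
                        searches: list, ndots: int) -> dict:
    """Build a Pod manifest that supplies its own resolver settings."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # "None" tells the kubelet to use only what dnsConfig provides.
            "dnsPolicy": "None",
            "dnsConfig": {
                "nameservers": nameservers,
                "searches": searches,
                # Option values are strings in the Pod API.
                "options": [{"name": "ndots", "value": str(ndots)}],
            },
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    manifest = pod_with_custom_dns(
        name="egress-client",           # placeholder pod name
        image="busybox:1.36",           # placeholder image
        nameservers=["10.96.0.10"],     # assumed cluster DNS IP
        searches=["svc.cluster.local"],
        ndots=1,
    )
    print(json.dumps(manifest, indent=2))
```

Keeping dnsPolicy at its default and setting only dnsConfig.options is the lighter-weight variant when all you want to change is ndots.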
CoreDNS ConfigMap Tuning for Production
At large scale (500+ nodes, tens of thousands of pods), CoreDNS can become a bottleneck. Production best practices:
- Run at least 2 replicas, ideally 3-4, spread across nodes with pod anti-affinity.
- Cache TTL: the default cache 30 is conservative. Raising it to cache 120 for stable Services can cut upstream traffic by up to 4×, since cached answers are refreshed a quarter as often.
- Horizontal Pod Autoscaler: wire CoreDNS to an HPA that scales on DNS QPS via the Prometheus adapter.
- NodeLocal DNSCache: a DaemonSet that runs a local cache on every node, intercepting DNS calls before they hit the CoreDNS Service. Reduces latency from ~2ms to ~0.1ms for cached entries and eliminates conntrack table pressure from UDP DNS.
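The 4× figure for the cache setting comes from simple arithmetic, sketched below. It assumes a name that is queried continuously and whose own record TTL is at least as long as the cache cap, so each entry is refreshed exactly once per cap interval; real traffic will save less. upstream_refreshes is a hypothetical helper.

```python
import math


def upstream_refreshes(window_seconds: int, cache_ttl_seconds: int) -> int:
    """Count how often one hot name is re-resolved upstream in a time window."""
    # A cached answer is served until it expires, then fetched once more.
    return math.ceil(window_seconds / cache_ttl_seconds)


if __name__ == "__main__":
    hour = 3600
    before = upstream_refreshes(hour, 30)
    after = upstream_refreshes(hour, 120)
    print(f"cache 30:  {before} refreshes per hour per name")
    print(f"cache 120: {after} refreshes per hour per name")
    print(f"reduction: {before / after:.0f}x")
```

Records that carry a shorter TTL than the cap are still refreshed at their own TTL, so the benefit depends on what the answers themselves allow.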
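The reload plugin checks the Corefile for changes every 30 seconds by default. If you change the Corefile and see unexpected behavior, give the change time to be picked up or delete the CoreDNS pods to force a restart. Also: a malformed Corefile is dangerous even though running pods normally log the error and keep serving their previous configuration, because any CoreDNS pod that restarts afterwards will fail to start with the broken file. Always validate the Corefile before applying it, for example by starting a CoreDNS binary of the same version against it outside the cluster.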
Service Discovery Beyond DNS
DNS is the primary discovery mechanism, but Kubernetes also exposes Services via environment variables injected at pod creation time (e.g., NGINX_SERVICE_HOST, NGINX_SERVICE_PORT). This is a legacy mechanism: it only reflects Services that existed before the pod started, and the variable list grows with every Service in the namespace. Always prefer DNS for service discovery in new code.
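For code that still has to read the legacy variables, the naming rule is mechanical: the Service name is upper-cased and dashes become underscores. The sketch below shows the derivation and a lookup that returns nothing when the variables are absent; both helper names are made up for illustration, and the address in the usage example is a placeholder.

```python
import os


def service_env_names(service_name: str) -> tuple:
    """Derive the injected variable names for a Service."""
    prefix = service_name.upper().replace("-", "_")
    return f"{prefix}_SERVICE_HOST", f"{prefix}_SERVICE_PORT"


def legacy_lookup(service_name: str, environ=None):
    """Return (host, port) from the environment, or None if not injected."""
    environ = os.environ if environ is None else environ
    host_var, port_var = service_env_names(service_name)
    if host_var not in environ or port_var not in environ:
        # The Service did not exist when the pod started, or links are disabled.
        return None
    return environ[host_var], int(environ[port_var])


if __name__ == "__main__":
    print(service_env_names("payment-svc"))
    # Placeholder environment standing in for what the kubelet would inject.
    fake_env = {"NGINX_SERVICE_HOST": "10.96.12.34", "NGINX_SERVICE_PORT": "80"}
    print(legacy_lookup("nginx", fake_env))
```

A None result here is exactly the failure mode that makes DNS the better choice: the Service may exist, but the pod simply started before it did.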
At the application layer, production systems layer DNS with a service mesh (Istio, Linkerd) that intercepts DNS-resolved connections and applies mTLS, retries, and circuit-breaking transparently. CoreDNS itself remains authoritative; the mesh just wraps the connection after resolution.