Kubernetes Networking & Storage

Cluster DNS & Service Discovery

Every production Kubernetes cluster resolves internal service names through one critical subsystem: cluster DNS, which by default is CoreDNS. When your pod calls http://payment-svc:8080, it is not using a hardcoded IP. It resolves a DNS name, and CoreDNS answers with the ClusterIP of the matching Service. Understanding this pipeline end-to-end is the difference between guessing at network failures and diagnosing them quickly.

CoreDNS: The Cluster's Authoritative Resolver

CoreDNS replaced kube-dns as the default DNS add-on starting in Kubernetes 1.13. It runs as a Deployment (typically two replicas for HA) in the kube-system namespace and is exposed by a Service that is still named kube-dns for backward compatibility. That Service has a stable ClusterIP, commonly 10.96.0.10 (the tenth address of the default kubeadm service CIDR, 10.96.0.0/12). On each node, the kubelet writes that IP as the nameserver in /etc/resolv.conf for every pod that uses the default ClusterFirst DNS policy.

CoreDNS is configured via a ConfigMap named coredns in kube-system. The configuration file it holds is called the Corefile. A typical default Corefile looks like this:

kubectl -n kube-system get configmap coredns -o yaml

# Typical Corefile section:
# .:53 {
#     errors
#     health {
#        lameduck 5s
#     }
#     ready
#     kubernetes cluster.local in-addr.arpa ip6.arpa {
#        pods insecure
#        fallthrough in-addr.arpa ip6.arpa
#        ttl 30
#     }
#     prometheus :9153
#     forward . /etc/resolv.conf {
#        max_concurrent 1000
#     }
#     cache 30
#     loop
#     reload
#     loadbalance
# }

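Key plugins to understand: kubernetes answers queries for the cluster domain from Service and endpoint data that CoreDNS watches through the API server; forward sends everything else upstream (here, to the resolvers listed in /etc/resolv.conf, which the CoreDNS pods inherit from the node); cache keeps answers for up to the configured number of seconds, so repeated lookups are served from memory instead of being forwarded upstream again; loop detects forwarding loops and halts CoreDNS rather than letting it spin; health and ready expose liveness and readiness endpoints.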
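As a rough aid when reviewing a Corefile, the sketch below lists the plugins that appear directly inside a server block, so you can check that the ones described above are present. It is a simplified brace-depth scan written for this lesson, not CoreDNS's own parser, and the function name top_level_plugins is hypothetical.

```python
def top_level_plugins(corefile: str) -> list[str]:
    """Return plugin names that appear directly inside server blocks."""
    plugins = []
    depth = 0  # 0 = outside any block, 1 = inside a server block
    for raw in corefile.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if depth == 1 and line != "}":
            plugins.append(line.split()[0])  # first token is the plugin name
        depth += line.count("{") - line.count("}")
    return plugins


# The Corefile shown in this lesson, without the leading comment markers.
COREFILE = """
.:53 {
    errors
    health {
       lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
       ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
       max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}
"""

found = top_level_plugins(COREFILE)
missing = {"kubernetes", "forward", "cache", "loop"} - set(found)
print(found)
print("missing:", sorted(missing))
```

Nested options such as lameduck or ttl are skipped on purpose, because they sit one level deeper than the plugins themselves.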

Fully Qualified Domain Names (FQDNs)

Kubernetes DNS follows a strict hierarchical naming convention. Every Service gets a DNS entry under the cluster's configured domain (default: cluster.local).

The Kubernetes DNS FQDN hierarchy: cluster domain → svc → namespace → service name. In the name itself this reads right to left, as <service-name>.<namespace>.svc.<cluster-domain>.

Search domains defined in every pod's /etc/resolv.conf let you use short names. A pod in the payments namespace can reach a Service with just api, and the resolver tries api.payments.svc.cluster.local. first. Across namespaces, use api.payments (resolves to api.payments.svc.cluster.local.). The fully qualified form with the trailing dot always bypasses search-domain expansion.

FQDN forms, from shortest to longest:

nginx: within the same namespace only
nginx.default: cross-namespace shorthand
nginx.default.svc: explicit svc segment
nginx.default.svc.cluster.local: full FQDN (resolves without a trailing dot, but with ndots:5 the search domains are tried first)
nginx.default.svc.cluster.local.: absolute (trailing dot suppresses search)
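To make the search-domain behavior concrete, here is a minimal sketch of the order in which a glibc-style resolver tries candidate names. It is a model for illustration, not resolver source code: the function name search_candidates is hypothetical, and other resolvers (musl, for example) differ in the details.

```python
def search_candidates(name: str, search_domains: list[str], ndots: int = 5) -> list[str]:
    """Return the absolute names a glibc-style resolver would try, in order."""
    if name.endswith("."):
        return [name]  # already absolute: the search list is skipped
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search_domains]
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as-is first
    return expanded + [absolute]      # too few dots: search list first


# Search list of a pod in the payments namespace (default cluster domain).
PAYMENTS_SEARCH = ["payments.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for name in ("api", "api.payments", "api.payments.svc.cluster.local."):
    print(name, "->", search_candidates(name, PAYMENTS_SEARCH))
```

The resolver stops at the first candidate that returns an answer, so api succeeds on its first try while api.payments succeeds on its second.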

Headless Services & Pod DNS Records

When you set clusterIP: None, the Service becomes headless. CoreDNS returns the individual pod IPs directly (one A record per ready pod) rather than a single virtual IP. StatefulSets exploit this: when the StatefulSet's serviceName points at the headless Service, each pod gets a stable DNS name of the form <pod-name>.<service-name>.<namespace>.svc.cluster.local. For example, postgres-0.postgres-headless.data.svc.cluster.local always resolves to whichever pod is currently named postgres-0. The pod's IP can change across restarts and rescheduling, but the name keeps pointing at it. Whether postgres-0 is the database primary is decided by your replication setup, not by DNS.
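The naming rule is mechanical enough to generate. The sketch below builds the per-pod names for a StatefulSet, assuming the default ordinals that start at 0; the function name statefulset_pod_fqdns and the replica count are made up for illustration.

```python
def statefulset_pod_fqdns(
    statefulset: str,
    headless_service: str,
    namespace: str,
    replicas: int,
    cluster_domain: str = "cluster.local",
) -> list[str]:
    """Build the stable per-pod DNS names behind a headless Service."""
    return [
        f"{statefulset}-{ordinal}.{headless_service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


# Hypothetical three-replica postgres StatefulSet in the data namespace.
for fqdn in statefulset_pod_fqdns("postgres", "postgres-headless", "data", replicas=3):
    print(fqdn)
```

A client that needs a specific replica can connect to one of these names directly instead of going through the Service name, which returns all ready pods.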

Debugging DNS Resolution

DNS failures in Kubernetes manifest in subtle ways: connection refused (the name resolved to a stale or wrong IP), no such host (NXDOMAIN), or timeouts (CoreDNS pods unhealthy or unreachable). The standard debugging workflow uses a throwaway pod started from a dnsutils image.

# 1. Spin up a debug pod (this drops you into a shell inside it)
kubectl run dnsdbg --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 \
  --restart=Never --rm -it -- bash

# 2. Inside the pod: inspect /etc/resolv.conf
cat /etc/resolv.conf
# Expected output (for a pod in the default namespace):
# nameserver 10.96.0.10
# search default.svc.cluster.local svc.cluster.local cluster.local
# options ndots:5

# 3. Inside the pod: resolve a Service name
nslookup nginx
nslookup nginx.default.svc.cluster.local

# Steps 4-6 run from your workstation, in a second terminal.

# 4. Check CoreDNS logs for errors
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50

# 5. Verify CoreDNS pods are running
kubectl -n kube-system get pods -l k8s-app=kube-dns

# 6. Check CoreDNS metrics (port-forward blocks, so run curl in another terminal)
kubectl -n kube-system port-forward svc/kube-dns 9153:9153
# Then: curl http://localhost:9153/metrics | grep coredns_dns_request
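nslookup queries the DNS server directly, while most applications resolve names through the system resolver (getaddrinfo). When the two disagree, it can help to test the application's path as well. The sketch below does that from any pod that has Python available; the helper name resolve and the returned dictionary layout are illustrative choices.

```python
import socket
import time


def resolve(name: str, port: int = 80) -> dict:
    """Resolve a name the way most applications do and time the lookup."""
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Covers "name or service not known" as well as resolver timeouts.
        return {"name": name, "ok": False, "error": str(exc),
                "seconds": time.monotonic() - start}
    addresses = sorted({info[4][0] for info in infos})
    return {"name": name, "ok": True, "addresses": addresses,
            "seconds": time.monotonic() - start}


# Inside a pod, for example:
#   print(resolve("nginx"))
#   print(resolve("nginx.default.svc.cluster.local."))
```

A lookup that succeeds but takes seconds rather than milliseconds usually points at retries or timeouts somewhere in the search-domain expansion.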

The ndots:5 option is a production performance trap. With ndots:5, any name with fewer than 5 dots is first tried against every search domain before the resolver tries it as an absolute name. For an external name like api.stripe.com, the resolver tries api.stripe.com.default.svc.cluster.local., then api.stripe.com.svc.cluster.local., then api.stripe.com.cluster.local., and only then api.stripe.com.. Resolvers that request A and AAAA records together double each step, so three search domains cost six wasted queries per lookup. At scale this can multiply your DNS query rate several times over. Fix it by appending a trailing dot (api.stripe.com.) to external names in your app config, or by adding an ndots option with value "1" under dnsConfig.options on pods that only make external calls.
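The cost can be estimated with simple arithmetic. The sketch below counts queries for one lookup of an external name, under stated assumptions: every search-domain attempt fails with NXDOMAIN, the absolute name resolves, and the resolver asks for both A and AAAA records. The function name queries_per_lookup is hypothetical.

```python
def queries_per_lookup(
    name: str,
    search_domains: list[str],
    ndots: int,
    record_types: int = 2,  # assumption: A and AAAA are both requested
) -> int:
    """Estimate DNS queries sent for one lookup of an existing external name."""
    if name.endswith(".") or name.count(".") >= ndots:
        # The absolute name is tried first and, by assumption, resolves.
        return record_types
    # Every search domain is tried and fails before the absolute name.
    return (len(search_domains) + 1) * record_types


DEFAULT_SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

print(queries_per_lookup("api.stripe.com", DEFAULT_SEARCH, ndots=5))   # 8
print(queries_per_lookup("api.stripe.com.", DEFAULT_SEARCH, ndots=5))  # 2
print(queries_per_lookup("api.stripe.com", DEFAULT_SEARCH, ndots=1))   # 2
```

Under these assumptions, the trailing dot and the lower ndots value both bring an external lookup from eight queries down to two.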

Customizing DNS per Pod

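You can override DNS behavior per pod using the dnsPolicy and dnsConfig fields. dnsPolicy: ClusterFirst is the default: the pod queries CoreDNS, and CoreDNS forwards any name outside the cluster domain to the upstream resolver. dnsPolicy: None ignores the cluster DNS settings entirely and requires you to supply the resolver configuration yourself through dnsConfig.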

apiVersion: v1
kind: Pod
metadata:
  name: custom-dns-pod
spec:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
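      - 8.8.8.8          # Public resolver, for illustration only; it cannot answer cluster.local names
    searches:            # Only useful when the nameserver above serves the cluster domain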
      - payments.svc.cluster.local
      - svc.cluster.local
    options:
      - name: ndots
        value: "2"        # Reduces unnecessary search-domain lookups
      - name: timeout
        value: "2"
  containers:
    - name: app
      image: myapp:latest

CoreDNS ConfigMap Tuning for Production

At large scale (500+ nodes, tens of thousands of pods), CoreDNS can become a bottleneck. Production best practices:

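Replicas: run at least 2, ideally 3-4, spread across nodes with pod anti-affinity.
Cache TTL: cache 30 caps how long any answer is cached at 30 seconds. Raising it (for example, to cache 120) lets upstream answers with longer TTLs stay cached longer and reduces forwarded queries. Records for cluster Services are still limited by the ttl set in the kubernetes plugin (30 in the Corefile above), and the trade-off is slower pickup of changed upstream records.
Horizontal Pod Autoscaler: scale CoreDNS on DNS QPS by exposing its Prometheus metrics to the HPA through the Prometheus adapter.
NodeLocal DNSCache: a DaemonSet that runs a local cache on every node, intercepting DNS calls before they hit the CoreDNS Service. Cached answers are served on the node without a network hop, which can cut latency substantially (roughly ~2ms down to ~0.1ms for cached entries, depending on the cluster), and pod-to-cache queries avoid the DNAT and conntrack entries that UDP DNS through the Service IP otherwise creates.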
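For the autoscaling item, the HPA's documented rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below applies it to per-pod DNS QPS. The function name, the replica bounds, and the QPS figures are made up for illustration, and the real controller also applies a tolerance band and stabilization windows that this ignores.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_qps_per_pod: float,
    target_qps_per_pod: float,
    min_replicas: int = 2,
    max_replicas: int = 10,
) -> int:
    """Apply the basic HPA scaling formula, clamped to the replica bounds."""
    raw = math.ceil(current_replicas * current_qps_per_pod / target_qps_per_pod)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical load: 3 CoreDNS pods at 9000 QPS each, target 5000 QPS per pod.
print(desired_replicas(3, 9000, 5000))  # 6
# Light load still keeps the minimum of 2 replicas for availability.
print(desired_replicas(2, 1000, 5000))  # 2
```

Keeping min_replicas at 2 or more matches the replica guidance above, so a quiet period never scales DNS down to a single pod.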

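CoreDNS ConfigMap edits take effect after a reload cycle, not immediately. The reload plugin checks the Corefile for changes every 30 seconds by default, and the kubelet needs additional time to sync the updated ConfigMap into the pod's volume, so allow a minute or two. If you change the Corefile and still see old behavior after that, check the CoreDNS logs for a reload message, or delete the CoreDNS pods to force a restart. Also: a malformed Corefile is dangerous. A running CoreDNS may reject it on reload and keep serving, but any CoreDNS pod that restarts afterward will fail to start and crash-loop. Always validate the syntax before applying, for example by starting a CoreDNS binary against the edited file in a scratch environment (coredns -conf Corefile).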

Service Discovery Beyond DNS

DNS is the primary discovery mechanism, but Kubernetes also exposes Services via environment variables injected at pod creation time (e.g., NGINX_SERVICE_HOST, NGINX_SERVICE_PORT). This is a legacy mechanism: it only reflects Services that existed before the pod started, and every Service in the pod's namespace adds its own set of variables, so the list grows large in busy namespaces. Always prefer DNS for service discovery in new code.
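For code that still has to read these variables, the naming rule is: upper-case the Service name, replace dashes with underscores, and append _SERVICE_HOST or _SERVICE_PORT. A minimal sketch follows; the helper names are hypothetical, and the fake environment in the example stands in for what the kubelet would inject.

```python
import os


def service_env_names(service_name: str) -> tuple[str, str]:
    """Map a Service name to its injected host and port variable names."""
    prefix = service_name.upper().replace("-", "_")
    return f"{prefix}_SERVICE_HOST", f"{prefix}_SERVICE_PORT"


def discover_from_env(service_name: str, environ=os.environ):
    """Return (host, port) from the environment, or None if not injected."""
    host_key, port_key = service_env_names(service_name)
    host, port = environ.get(host_key), environ.get(port_key)
    if host is None or port is None:
        return None  # Service was created after the pod, or links are disabled
    return host, int(port)


# Simulated environment for a pod started after payment-svc was created.
fake_env = {"PAYMENT_SVC_SERVICE_HOST": "10.96.14.7", "PAYMENT_SVC_SERVICE_PORT": "8080"}
print(service_env_names("payment-svc"))
print(discover_from_env("payment-svc", fake_env))
print(discover_from_env("nginx", fake_env))
```

The None result for nginx shows the weakness described above: a Service that did not exist when the pod started is simply absent, whereas a DNS lookup would find it.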

Above DNS, many production systems add a service mesh (Istio, Linkerd) whose proxies intercept connections after name resolution and apply mTLS, retries, and circuit breaking transparently. CoreDNS itself remains authoritative for cluster names; the mesh just wraps the connection after resolution.