Services & Discovery
Pods are ephemeral. A rolling update tears down old Pods and creates new ones with entirely different IP addresses, and an eviction or node failure can replace a Pod within seconds. If client code hard-codes a Pod IP, it breaks the moment Kubernetes reschedules that Pod. Services solve this by providing a stable virtual IP (the ClusterIP) and a DNS name that always route to healthy Pod endpoints, regardless of how many Pods exist or where they are running. This lesson walks through every Service type, how kube-proxy programs the dataplane, and how cluster DNS makes names resolve inside the cluster.
The Endpoint Object — the Missing Link
Before exploring Service types, understand the plumbing underneath. For every Service that defines a selector, the Endpoints controller (part of kube-controller-manager) watches for Pods whose labels match that selector and writes their IPs into an Endpoints object with the same name as the Service. When a Pod becomes unready or dies, its IP is dropped from the ready addresses automatically, so traffic stops being routed to it.
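As a rough illustration of that link, a Service named nginx-svc might end up with an Endpoints object like the one below. The Service name, namespace, port, and Pod IPs are hypothetical, and the object is written by the controller rather than by hand.

```yaml
# Illustrative only: generated by the Endpoints controller, not applied manually.
apiVersion: v1
kind: Endpoints
metadata:
  name: nginx-svc            # always the same name as the Service
  namespace: production      # hypothetical namespace
subsets:
  - addresses:               # Pods that match the selector and are ready
      - ip: 10.244.1.23      # hypothetical Pod IPs
      - ip: 10.244.2.17
    notReadyAddresses:       # matching Pods that currently fail readiness
      - ip: 10.244.3.9
    ports:
      - name: http
        port: 8080
        protocol: TCP
```

Running kubectl get endpoints nginx-svc -n production -o yaml against a real Service shows the live version of this object.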
ClusterIP — Internal-Only Virtual IP
ClusterIP is the default Service type. Kubernetes allocates a virtual IP from the --service-cluster-ip-range (typically 10.96.0.0/12). That VIP is not routable outside the cluster; it exists only in iptables rules (or IPVS tables) programmed by kube-proxy on every node. Any Pod in the cluster can reach the Service by its ClusterIP or by its DNS name.
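A minimal sketch of a ClusterIP Service follows. The name, namespace, label, and port numbers are assumptions chosen for illustration.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-svc            # hypothetical Service name
  namespace: production      # hypothetical namespace
spec:
  type: ClusterIP            # the default; this line may be omitted
  selector:
    app: nginx               # hypothetical label carried by the backend Pods
  ports:
    - name: http
      protocol: TCP
      port: 80               # port exposed on the ClusterIP
      targetPort: 8080       # container port on the backend Pods
```

Once applied, kubectl get svc nginx-svc -n production shows the allocated ClusterIP.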
kube-proxy watches Services and Endpoints and writes iptables DNAT rules: traffic to the ClusterIP (for example 10.96.x.x:80) is rewritten to one of the live Pod IPs listed in Endpoints, chosen at random. In IPVS mode (preferred at scale) the same load-balancing happens inside the kernel's IPVS module with richer scheduling algorithms (round-robin, least-connection, etc.).
Consider IPVS mode (mode: ipvs in the KubeProxyConfiguration) when you have more than ~1,000 Services. In iptables mode, Service rules are evaluated sequentially, so the matching cost grows as O(n) with the number of Services. IPVS uses a hash table and scales to tens of thousands of Services with consistent latency.
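The mode switch might look like the fragment below. It is a sketch, not a complete configuration: the scheduler choice is an assumption, and where the file lives depends on how the cluster was installed (kubeadm clusters usually keep it in the kube-proxy ConfigMap in kube-system).

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"                 # the default on Linux is iptables
ipvs:
  scheduler: "rr"            # round-robin; "lc" selects least-connection
```

kube-proxy has to be restarted to pick up the change, and the nodes need the IPVS kernel modules available.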
kube-dns — Name Resolution Inside the Cluster
CoreDNS (the successor to kube-dns) runs as a Deployment in the kube-system namespace and is exposed via its own ClusterIP Service, which for compatibility is still named kube-dns. Its address (typically 10.96.0.10) is the nameserver written into /etc/resolv.conf on every Pod. The kubernetes plugin in CoreDNS synthesises DNS records from the Kubernetes API:
- A Service named nginx-svc in namespace production resolves to its ClusterIP at: nginx-svc.production.svc.cluster.local
- From within the same namespace, short names work: nginx-svc or nginx-svc.production.
- Individual Pod IPs get records like 10-244-1-23.production.pod.cluster.local, which are rarely used directly.
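One way to check these names from inside the cluster is a throwaway Pod that performs a single lookup. This is a sketch: the Pod name, the busybox image tag, and the Service being resolved are all assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-check            # hypothetical name
  namespace: production      # same namespace as the Service being resolved
spec:
  restartPolicy: Never       # run the lookup once and stop
  containers:
    - name: lookup
      image: busybox:1.28    # assumed image; any image with nslookup works
      command: ["nslookup", "nginx-svc.production.svc.cluster.local"]
```

After the Pod completes, kubectl logs dns-check -n production prints the answer returned by CoreDNS.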
NodePort — Exposing on Every Node
NodePort extends ClusterIP by also opening a static port (default range 30000–32767) on every node in the cluster. External traffic arriving at <any-node-ip>:<nodePort> is forwarded by kube-proxy into the Service, then load-balanced to a backend Pod.
For production external access, prefer type: LoadBalancer or an Ingress controller over a bare NodePort.
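A NodePort Service could be declared roughly as follows. The names, label, and the specific nodePort value are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-nodeport       # hypothetical name
spec:
  type: NodePort
  selector:
    app: nginx               # hypothetical Pod label
  ports:
    - protocol: TCP
      port: 80               # ClusterIP port, still reachable inside the cluster
      targetPort: 8080       # container port on the backend Pods
      nodePort: 30080        # optional; if omitted, one is picked from the range
```

With this applied, a request to port 30080 on any node's IP reaches one of the nginx Pods.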
LoadBalancer — Cloud-Native External Access
LoadBalancer is a superset of NodePort. In addition to opening the NodePort on every node, it signals the cloud provider's cloud-controller-manager to provision an external load balancer (an AWS NLB, GCP Network LB, Azure LB) and point it at the node ports. The provisioned load balancer IP or hostname is written back into service.status.loadBalancer.ingress.
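A minimal sketch of such a Service is shown below, with hypothetical names and ports. Provider-specific behaviour is normally tuned through annotations, which are left out here because they differ per cloud.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-public         # hypothetical name
spec:
  type: LoadBalancer
  selector:
    app: nginx               # hypothetical Pod label
  ports:
    - protocol: TCP
      port: 443              # port exposed by the cloud load balancer
      targetPort: 8443       # container port on the backend Pods
```

Once provisioning finishes, kubectl get svc nginx-public shows the external IP or hostname in the EXTERNAL-IP column.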
Each type: LoadBalancer Service creates one cloud load balancer, which gets expensive at scale because every load balancer is billed separately. Large deployments typically expose dozens of microservices through a single Ingress controller backed by one load balancer, and route by hostname/path rules instead of creating per-Service load balancers.
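The hostname/path routing described above might be expressed with an Ingress like this one. It is a sketch that assumes an Ingress controller is already installed; the ingressClassName, hostnames, and backend Service names are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: public-routes              # hypothetical name
spec:
  ingressClassName: nginx          # assumption: depends on the installed controller
  rules:
    - host: shop.example.com       # route by hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: shop-svc     # hypothetical ClusterIP Service
                port:
                  number: 80
    - host: api.example.com
      http:
        paths:
          - path: /v1              # route by path prefix
            pathType: Prefix
            backend:
              service:
                name: api-svc      # hypothetical ClusterIP Service
                port:
                  number: 80
```

Both backends stay plain ClusterIP Services; only the Ingress controller itself sits behind a cloud load balancer.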
Headless Services — DNS Round-Robin Without a VIP
Set clusterIP: None to create a headless Service. No VIP is allocated. Instead, CoreDNS returns the individual Pod IP addresses directly in the DNS A record response. Clients receive multiple A records and must do their own selection. This is the pattern used by StatefulSets (databases, Kafka, Zookeeper) where each Pod has a stable identity and clients need to reach a specific replica.
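A headless Service for a Kafka StatefulSet could look roughly like this. The names, namespace, label, and port are assumptions for illustration.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kafka                # hypothetical; referenced by the StatefulSet's serviceName
  namespace: production      # hypothetical namespace
spec:
  clusterIP: None            # headless: no virtual IP is allocated
  selector:
    app: kafka               # hypothetical Pod label
  ports:
    - name: broker
      port: 9092
```

If a StatefulSet named kafka sets serviceName: kafka, each replica also gets its own DNS name, for example kafka-0.kafka.production.svc.cluster.local.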
ExternalName — DNS Alias for External Services
ExternalName creates a CNAME alias inside the cluster for an external DNS name. It has no selector and no endpoints; CoreDNS simply returns the CNAME. This lets you reference an RDS instance, a legacy API, or a managed SaaS endpoint using the same Kubernetes-style DNS name as any internal Service, making it easy to swap between an in-cluster and an external backend without changing application config.
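A sketch of an ExternalName Service follows. The Service name, namespace, and external hostname are placeholders.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: orders-db                  # hypothetical in-cluster alias
  namespace: production            # hypothetical namespace
spec:
  type: ExternalName
  externalName: db.example.com     # placeholder for the real external hostname
```

Applications connect to orders-db.production.svc.cluster.local and receive a CNAME to db.example.com. Because only DNS is involved, no port remapping or proxying takes place.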
Production Failure Modes to Know
- Stale endpoints after a fast crash. kube-proxy updates its iptables rules only after the Endpoints controller removes a dead Pod, so there is a short window (usually <1 s) where traffic is still sent to a terminated Pod. Mitigate with a preStop hook that sleeps for 2–5 seconds and a tight readinessProbe failure threshold.
- DNS caching TTL. The JVM, and some Go clients that keep long-lived connections, can hold on to resolved addresses for far longer than the 5-second TTL CoreDNS returns. After a Service endpoint change, old clients may route to stale IPs for minutes. Set JVM flag -Dsun.net.inetaddr.ttl=5 and verify client DNS TTL settings.
- kube-proxy iptables sync lag. A very large cluster (10k+ Services) with iptables mode can spend seconds syncing rules. IPVS mode largely avoids this. Monitor kubeproxy_sync_proxy_rules_duration_seconds in kube-proxy metrics.
- ClusterIP range exhaustion. The default /12 gives ~1M addresses, but clusters configured with a smaller range, or that create Services aggressively, can still run out. Check with kubectl cluster-info dump | grep service-cluster-ip-range.
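The stale-endpoint mitigation from the first item can be sketched as a Pod spec like the one below. The image, probe path, and timing values are assumptions to tune per workload, not recommended defaults.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx                          # hypothetical name
  labels:
    app: nginx
spec:
  terminationGracePeriodSeconds: 30    # must exceed the preStop sleep
  containers:
    - name: nginx
      image: nginx:1.25                # assumed image; needs sh and sleep
      ports:
        - name: http
          containerPort: 80
      lifecycle:
        preStop:
          exec:
            # Keep serving briefly while kube-proxy removes this Pod's IP.
            command: ["sh", "-c", "sleep 5"]
      readinessProbe:
        httpGet:
          path: /
          port: http
        periodSeconds: 2               # probe every 2 seconds
        failureThreshold: 2            # unready after 2 consecutive failures
```

The sleep delays SIGTERM, so the container keeps answering requests during the window in which some nodes still hold the old rules.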
Prefer a named targetPort (e.g., targetPort: http pointing to a named port in the Pod spec) over a numeric port. When developers change a container port, they only update the containerPort behind that name in the Pod spec, and the change propagates automatically. You do not need to update every Service manifest that references it.
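A minimal sketch of the named-port pattern, with hypothetical names and an assumed image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                  # hypothetical name
  labels:
    app: web
spec:
  containers:
    - name: web
      image: nginx:1.25      # assumed image
      ports:
        - name: http         # the name the Service refers to
          containerPort: 80  # only this number changes if the app moves ports
---
apiVersion: v1
kind: Service
metadata:
  name: web-svc              # hypothetical name
spec:
  selector:
    app: web
  ports:
    - port: 80
      targetPort: http       # resolved per Pod to whatever port is named http
```

Because the name is resolved per Pod, old and new Pods can listen on different port numbers during a rolling update while the same Service routes to both.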