Kubernetes Fundamentals

Services & Discovery

Pods are ephemeral. A rolling update tears down old Pods and creates new ones with entirely different IP addresses, and a Pod that is evicted or loses its node is replaced within seconds by a new Pod with a new IP. If client code hard-codes a Pod IP, it breaks the moment Kubernetes reschedules the Pod. Services solve this by providing a stable virtual IP (the ClusterIP) and a DNS name that always route to ready Pod endpoints, regardless of how many Pods exist or where they are running. This lesson walks through every Service type, how kube-proxy programs the dataplane, and how cluster DNS (CoreDNS, the successor to kube-dns) makes names resolve inside the cluster.

The Endpoint Object — the Missing Link

Before exploring Service types, understand the plumbing underneath. Once you create a Service with a selector, the Endpoints controller (part of kube-controller-manager) continuously watches for Pods whose labels match that selector and writes their IPs into an Endpoints object with the same name as the Service. When a Pod fails its readiness probe, its IP moves to notReadyAddresses and stops receiving traffic; when the Pod is deleted, its IP is removed from the object entirely. Both happen automatically.

# Inspect the endpoints backing a service called "api"
kubectl get endpoints api -o yaml

# Watch endpoints update in real time during a rolling deployment
kubectl get endpoints api -w

# Sample output — each "address" is a live, ready Pod IP
# addresses:
#   - ip: 10.244.1.23
#   - ip: 10.244.2.47
# notReadyAddresses:
#   - ip: 10.244.1.25   <-- Pod failed readinessProbe, excluded from traffic

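EndpointSlices supersede the older Endpoints API. A single Endpoints object holds every address for a Service, so any Pod change rewrites the whole object and sends it to every watcher. EndpointSlices shard the same data into chunks of at most 100 endpoints each by default, and kube-proxy consumes them by default in current Kubernetes releases regardless of Service size. The concept is identical; the sharding just reduces watch event fan-out at 10,000+ Pod scale.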
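
For comparison with the Endpoints output above, here is a rough sketch of what one EndpointSlice for the "api" Service might look like. The slice name suffix, the port number, and the port name are assumptions for illustration; the real object is generated by the control plane and carries additional fields (nodeName, targetRef, and so on).

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: api-abc12                        # hypothetical generated name
  labels:
    kubernetes.io/service-name: api      # ties the slice back to the Service
addressType: IPv4
ports:
  - name: http                           # assumed port name
    protocol: TCP
    port: 8080                           # assumed container port
endpoints:
  - addresses: ["10.244.1.23"]
    conditions:
      ready: true                        # receives traffic
  - addresses: ["10.244.1.25"]
    conditions:
      ready: false                       # failed readinessProbe, skipped by kube-proxy
```

You can list the slices behind a Service with kubectl get endpointslices -l kubernetes.io/service-name=api.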

ClusterIP — Internal-Only Virtual IP

ClusterIP is the default Service type. Kubernetes allocates a virtual IP from the --service-cluster-ip-range (typically 10.96.0.0/12). That VIP is not routable outside the cluster; it exists only in iptables rules (or IPVS tables) programmed by kube-proxy on every node. Any Pod in the cluster can reach the Service by its ClusterIP or by its DNS name.

# Minimal ClusterIP manifest — expose port 80 of an nginx Deployment
apiVersion: v1
kind: Service
metadata:
  name: nginx-svc
  namespace: production
spec:
  type: ClusterIP          # default; can be omitted
  selector:
    app: nginx
  ports:
    - name: http
      port: 80             # port clients connect to
      targetPort: 8080     # port the container listens on
      protocol: TCP

kube-proxy watches Services and their endpoints and, in iptables mode, writes DNAT rules: a connection to 10.96.x.x:80 is rewritten to one of the ready Pod IPs, chosen at random with equal probability. In IPVS mode (preferred at scale) the same load balancing happens inside the kernel's IPVS module, which offers richer scheduling algorithms (round-robin, least-connection, etc.).

Enable IPVS mode for kube-proxy (mode: ipvs in the KubeProxyConfiguration) when you have more than ~1,000 Services. In iptables mode a packet is matched against Service rules sequentially, so lookup cost grows as O(n) with the number of Services. IPVS uses a hash table and scales to tens of thousands of Services with consistent latency.
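
A minimal sketch of the kube-proxy configuration for IPVS mode is shown here. It assumes the nodes have the IPVS kernel modules available; the scheduler choice is only an example.

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"            # switch the dataplane from iptables to IPVS
ipvs:
  scheduler: "rr"       # round-robin; "lc" would select least-connection
```

On kubeadm-built clusters this configuration typically lives in the kube-proxy ConfigMap in the kube-system namespace, and the kube-proxy Pods need a restart to pick up the change. Other installers may manage it differently.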

kube-dns — Name Resolution Inside the Cluster

CoreDNS (the successor to kube-dns) runs as a Deployment in the kube-system namespace and is exposed through its own ClusterIP Service, which is still named kube-dns for compatibility. The kubelet writes that Service's address (typically 10.96.0.10) as the nameserver in /etc/resolv.conf on every Pod. The kubernetes plugin in CoreDNS synthesises DNS records from the Kubernetes API:

A Service named nginx-svc in namespace production resolves to its ClusterIP at: nginx-svc.production.svc.cluster.local
From within the same namespace, the short name nginx-svc works; from any other namespace, nginx-svc.production works, because the search domains in /etc/resolv.conf complete the name.
Individual Pod IPs get records like 10-244-1-23.production.pod.cluster.local — rarely used directly.

Service routing flow: CoreDNS resolves the Service name to its ClusterIP, then kube-proxy's DNAT rules forward the connection to a ready backend Pod.
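
To see these records from inside the cluster, one option is a throwaway Pod that prints its resolver config and looks up the Service. This is only a sketch: the Pod name is hypothetical, and busybox:1.28 is an assumption (any image with nslookup should do).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-check                # hypothetical name
  namespace: production
spec:
  restartPolicy: Never
  containers:
    - name: lookup
      image: busybox:1.28        # assumption: any image that ships nslookup
      command:
        - sh
        - -c
        - |
          # Shows the nameserver (the cluster DNS ClusterIP) and search domains
          cat /etc/resolv.conf
          # Short name, completed by the search domains
          nslookup nginx-svc
          # Fully qualified name
          nslookup nginx-svc.production.svc.cluster.local
```

After the Pod completes, kubectl logs dns-check -n production shows the resolver config and both lookups.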

NodePort — Exposing on Every Node

NodePort extends ClusterIP by also opening a static port (default range 30000–32767) on every node in the cluster. External traffic arriving at <any-node-ip>:<nodePort> is forwarded by kube-proxy into the Service, then load-balanced to a backend Pod.

apiVersion: v1
kind: Service
metadata:
  name: api-nodeport
spec:
  type: NodePort
  selector:
    app: api
  ports:
    - port: 80           # ClusterIP port (intra-cluster)
      targetPort: 3000   # container port
      nodePort: 31080    # static port on every node (omit to auto-assign)

NodePort is rarely a suitable production ingress pattern on its own. It exposes a high-numbered port, requires clients to know node IPs (which change), and provides no load balancing or health checking across nodes. Use NodePort only in bare-metal environments where a LoadBalancer controller is unavailable, and even then put an external L4 load balancer (HAProxy, keepalived) in front of it. In cloud environments, use type: LoadBalancer or an Ingress controller.

LoadBalancer — Cloud-Native External Access

LoadBalancer is a superset of NodePort. In addition to opening the NodePort on every node, it signals the cloud provider's cloud-controller-manager to provision an external load balancer (an AWS NLB, GCP Network LB, Azure LB) and point it at the node ports. The provisioned load balancer IP or hostname is written back into service.status.loadBalancer.ingress.

apiVersion: v1
kind: Service
metadata:
  name: payments-lb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"   # AWS NLB (layer 4)
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
spec:
  type: LoadBalancer
  selector:
    app: payments
  ports:
    - port: 443
      targetPort: 8443
      protocol: TCP

# After applying, check the assigned external IP/hostname:
# kubectl get svc payments-lb
# NAME          TYPE           CLUSTER-IP    EXTERNAL-IP                   PORT(S)
# payments-lb   LoadBalancer   10.96.5.10    a1b2c3.elb.amazonaws.com      443:31443/TCP

Each type: LoadBalancer Service provisions its own cloud load balancer, which gets expensive at scale because every NLB is billed separately. Teams running dozens of microservices typically expose them through a single Ingress controller backed by one load balancer, and route by hostname and path rules instead of creating per-Service load balancers.
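
As a rough illustration of hostname-based routing through one load balancer, the following Ingress sends two hosts to two different ClusterIP Services. The hostnames, the Service names api-svc and payments-svc, and the nginx ingress class are all assumptions; an Ingress controller must already be installed for the object to have any effect.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: public-routes              # hypothetical name
  namespace: production
spec:
  ingressClassName: nginx          # assumption: an NGINX Ingress controller is installed
  rules:
    - host: api.example.com        # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-svc      # hypothetical ClusterIP Service
                port:
                  number: 80
    - host: pay.example.com        # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: payments-svc # hypothetical ClusterIP Service
                port:
                  number: 443
```

Only the Ingress controller's own Service needs type: LoadBalancer; the backends stay plain ClusterIP Services.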

Headless Services — DNS Round-Robin Without a VIP

Set clusterIP: None to create a headless Service. No VIP is allocated. Instead, CoreDNS returns the individual Pod IP addresses directly in the DNS response, one A record per ready Pod. Clients receive multiple A records and must do their own selection. This is the pattern used by StatefulSets (databases, Kafka, ZooKeeper), where each Pod has a stable identity and clients need to reach a specific replica:

apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
spec:
  clusterIP: None          # headless
  selector:
    app: postgres
  ports:
    - port: 5432

# DNS lookup returns all Pod IPs directly:
# nslookup postgres-headless.production.svc.cluster.local
# Name: postgres-headless.production.svc.cluster.local
# Address: 10.244.1.5
# Address: 10.244.2.9
# Address: 10.244.3.2
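
To connect this to StatefulSets, here is a sketch of a StatefulSet that names the headless Service in serviceName. With that field set, each Pod also gets its own DNS record of the form pod-name.service-name.namespace.svc.cluster.local. The image tag and replica count are assumptions, and storage and credentials are left out for brevity.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: production
spec:
  serviceName: postgres-headless   # must match the headless Service name
  replicas: 3                      # assumed replica count
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres              # matched by the headless Service selector
    spec:
      containers:
        - name: postgres
          image: postgres:16       # assumed image tag
          ports:
            - containerPort: 5432
          # Omitted for brevity: POSTGRES_PASSWORD (from a Secret) and
          # volumeClaimTemplates; a real deployment needs both.

# Resulting per-Pod DNS names:
#   postgres-0.postgres-headless.production.svc.cluster.local
#   postgres-1.postgres-headless.production.svc.cluster.local
#   postgres-2.postgres-headless.production.svc.cluster.local
```

A client that must talk to one specific replica, such as a primary, connects to that Pod's name rather than to the Service name.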

ExternalName — DNS Alias for External Services

ExternalName creates a CNAME alias inside the cluster for an external DNS name. It has no selector and no endpoints — CoreDNS simply returns the CNAME. This lets you reference an RDS instance, a legacy API, or a managed SaaS endpoint using the same Kubernetes-style DNS name as any internal Service, making it easy to swap between an in-cluster and an external backend without changing application config:

apiVersion: v1
kind: Service
metadata:
  name: rds-postgres
  namespace: production
spec:
  type: ExternalName
  externalName: mydb.cluster-abc.us-east-1.rds.amazonaws.com

# Application connects to: rds-postgres.production.svc.cluster.local:5432
# CoreDNS resolves it as CNAME -> mydb.cluster-abc.us-east-1.rds.amazonaws.com

Production Failure Modes to Know

Stale endpoints during shutdown or after a fast crash. kube-proxy updates its iptables rules only after the endpoints controller removes the Pod, so there is a short window (usually <1 s) in which traffic is still sent to a terminating or dead Pod. For graceful shutdowns, mitigate with a preStop hook that sleeps for 2–5 seconds so the Pod keeps serving while the rules converge; for crashes, keep the readinessProbe period and failure threshold tight.
DNS caching beyond the TTL. CoreDNS returns records with a 5-second TTL by default, but the JVM and some client libraries cache resolved addresses for much longer. A ClusterIP is stable, so this mostly affects headless and ExternalName Services, where clients may keep routing to stale IPs for minutes after the endpoints change. Set the JVM flag -Dsun.net.inetaddr.ttl=5 and verify client DNS TTL settings.
kube-proxy iptables sync lag. A very large cluster (10k+ Services) in iptables mode can spend seconds syncing rules. IPVS mode greatly reduces this. Monitor kubeproxy_sync_proxy_rules_duration_seconds in kube-proxy metrics.
ClusterIP range exhaustion. The default /12 gives ~1M addresses, but clusters configured with a much smaller range (a /24 holds only 256 addresses) can run out. Check the configured range with kubectl cluster-info dump | grep service-cluster-ip-range.
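
The first mitigation above can be sketched as a Deployment with a preStop sleep and a tight readiness probe. The image, the /healthz path, and the exact timings are assumptions for illustration, and the exec hook assumes the image ships a sleep binary.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      terminationGracePeriodSeconds: 30   # must exceed the preStop sleep
      containers:
        - name: api
          image: example.com/api:1.0      # hypothetical image
          ports:
            - name: http
              containerPort: 3000
          readinessProbe:
            httpGet:
              path: /healthz              # hypothetical health endpoint
              port: http
            periodSeconds: 2              # probe often...
            failureThreshold: 2           # ...and mark unready after 2 failures
          lifecycle:
            preStop:
              exec:
                # Keep serving for 5 s while kube-proxy removes this Pod
                # from its rules; SIGTERM is sent only after the hook returns.
                command: ["sleep", "5"]
```

With these settings an unhealthy Pod is pulled from the endpoints after roughly 4 seconds of failed probes, and a terminating Pod keeps answering requests during the convergence window.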

Prefer a named targetPort (e.g., targetPort: http pointing to a named port in the Pod spec) over a numeric port. When developers change a container port, they change the containerPort number under the same port name in the Pod spec and the Service follows automatically; you do not need to update every Service manifest that references it.
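
A minimal sketch of the named-port pattern follows. The Pod and Service names are hypothetical, and the nginx image is only a stand-in for any container that listens on the declared port.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                  # hypothetical name
  labels:
    app: web
spec:
  containers:
    - name: web
      image: nginx:1.27      # assumed image; listens on port 80
      ports:
        - name: http         # the name the Service refers to
          containerPort: 80  # change this number freely; the name stays the same
---
apiVersion: v1
kind: Service
metadata:
  name: web-svc              # hypothetical name
spec:
  selector:
    app: web
  ports:
    - name: http
      port: 80
      targetPort: http       # resolved per Pod to whatever "http" maps to
```

Because the name is resolved per Pod, old and new Pods can listen on different numeric ports during a rolling update and the Service still routes correctly to both.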