Kubernetes Networking & Storage

The Kubernetes Network Model

Before you can reason about Services, Ingress, or NetworkPolicies, you need a firm grasp of the foundational contract that Kubernetes makes about networking: the Kubernetes Network Model. Every advanced networking concept in this tutorial builds on this foundation, and many of the production networking incidents you will debug come down to whether this model is actually being honoured on your cluster.

The Three Fundamental Guarantees

The Kubernetes specification mandates three networking guarantees that any conformant cluster must satisfy, regardless of cloud provider, CNI plugin, or infrastructure:

  1. Every Pod gets its own unique IP address. No two running Pods in the cluster, even on different nodes, share an IP (Pods that use hostNetwork are the exception: they use their node's IP). The Pod, not the container, is the smallest addressable unit.
  2. Pods on any node can communicate directly with any other Pod on any node, without NAT. A packet sent from Pod A (10.244.1.5) to Pod B (10.244.3.8) arrives at Pod B with the source IP still 10.244.1.5. There is no masquerading, no port mapping, no middlebox translation.
  3. Agents on a node (e.g. kubelet, system daemons) can communicate with every Pod on that node.

What the model deliberately does not mandate: how this is implemented. The implementation is delegated to the Container Network Interface (CNI) plugin installed in your cluster. This separation of contract from implementation is precisely why the same Kubernetes YAML works on AWS, GCP, bare metal, and a local laptop — the contract is portable even if the wiring underneath is wildly different.

Why no NAT? In a traditional VM-based data centre, machines communicate through NAT gateways and the source IP is rewritten at the boundary. In Kubernetes, pods need to know the real source IP of their callers — for access control, for logging, for tracing. NAT would make this impossible without application-layer workarounds. The flat, no-NAT model is a deliberate design decision that makes distributed systems easier to reason about.

What "Flat Network" Means in Practice

Every Pod IP comes from a cluster-wide Pod CIDR range. On a kubeadm cluster set up for Flannel this is commonly 10.244.0.0/16 (Flannel's default, passed to kubeadm via --pod-network-cidr); on GKE it might be 10.4.0.0/14. The control plane, or the CNI plugin's own IPAM, allocates a subnet of that range to each node: say, 10.244.1.0/24 for node-1 and 10.244.2.0/24 for node-2. Pods scheduled to a node receive IPs from that node's subnet.
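
The carving of a cluster Pod CIDR into per-node subnets can be illustrated with Python's standard ipaddress module. This is only a sketch of the arithmetic, not how any real allocator is implemented: the function name allocate_node_subnets is hypothetical, and it hands out subnets strictly in order starting from the first one (10.244.0.0/24), whereas a real cluster may assign them in a different order.

```python
import ipaddress


def allocate_node_subnets(cluster_cidr, node_names, node_prefix=24):
    """Give each node the next unused subnet of the cluster Pod CIDR.

    Hypothetical helper for illustration only; real allocators
    (node IPAM in the control plane, or a CNI's own IPAM) differ.
    """
    cluster = ipaddress.ip_network(cluster_cidr)
    # Generator over every /node_prefix subnet inside the cluster CIDR.
    subnets = cluster.subnets(new_prefix=node_prefix)
    allocation = {}
    for name in node_names:
        try:
            allocation[name] = next(subnets)
        except StopIteration:
            raise ValueError(
                f"{cluster_cidr} has no free /{node_prefix} left for {name}"
            ) from None
    return allocation


allocation = allocate_node_subnets(
    "10.244.0.0/16", ["node-1", "node-2", "node-3"]
)
for node, subnet in allocation.items():
    print(node, subnet, "addresses:", subnet.num_addresses)
```

A /16 split into /24 subnets gives 256 node subnets of 256 addresses each, which is why the node prefix length effectively bounds both the node count and the Pods per node.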

For two pods on different nodes to communicate without NAT, the CNI plugin must ensure that a packet destined for 10.244.2.5 — which lives on node-2 — actually reaches node-2 even though it originates on node-1. The CNI has several strategies to achieve this: overlay networks (VXLAN tunnels), BGP route advertisement, host-route injection, or cloud-provider VPC routing APIs. The result is always the same flat address space; only the plumbing differs.
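
Whatever the plumbing, the forwarding decision on the sending node reduces to "which node owns the subnet that contains this destination Pod IP?". A minimal sketch of that lookup follows, assuming the per-node subnets used in this lesson; the NODE_SUBNETS table and the next_hop_node function are hypothetical names, and a real CNI expresses the same mapping as kernel routes, tunnel endpoints, or cloud route-table entries rather than a Python dictionary.

```python
import ipaddress

# Hypothetical per-node Pod subnets, matching the addresses used in the text.
NODE_SUBNETS = {
    "node-1": ipaddress.ip_network("10.244.1.0/24"),
    "node-2": ipaddress.ip_network("10.244.2.0/24"),
    "node-3": ipaddress.ip_network("10.244.3.0/24"),
}


def next_hop_node(dest_pod_ip, node_subnets=NODE_SUBNETS):
    """Return the node whose Pod subnet contains dest_pod_ip, or None."""
    dest = ipaddress.ip_address(dest_pod_ip)
    for node, subnet in node_subnets.items():
        if dest in subnet:
            return node
    return None  # not a Pod IP known to this cluster


print(next_hop_node("10.244.2.5"))  # node-2
```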

[Figure: Kubernetes flat Pod network. The cluster Pod CIDR 10.244.0.0/16 is split into per-node subnets: Node 1 (10.244.1.0/24) with Pod A 10.244.1.4 and Pod B 10.244.1.5; Node 2 (10.244.2.0/24) with Pod C 10.244.2.3 and Pod D 10.244.2.7; Node 3 (10.244.3.0/24) with Pod E 10.244.3.2 and Pod F 10.244.3.9. On each node, Pods attach through veth pairs to the cni0 bridge and leave via eth0 (the node NIC). Traffic between nodes is not NATed.]
The Kubernetes flat Pod network: each node holds a subnet of the cluster Pod CIDR, and Pods communicate across nodes without NAT.

CNI Plugins: The Implementors of the Contract

The Container Network Interface (CNI) is a specification, not a product. When a Pod is scheduled to a node, kubelet asks the container runtime (containerd or CRI-O, for example) to create the Pod sandbox, and the runtime invokes the CNI plugin binary, passing the container's network namespace path and the network configuration. The plugin then wires up the network. On Pod deletion the plugin is invoked again to clean up. The plugin must satisfy the model's guarantees; how it does so is entirely up to the plugin.
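
To make the call concrete: under the CNI specification, the caller passes parameters to the plugin binary as environment variables (CNI_COMMAND, CNI_CONTAINERID, CNI_NETNS, CNI_IFNAME, CNI_PATH) and sends the network configuration as JSON on standard input. The sketch below only assembles those two inputs and does not execute any plugin. The function name build_cni_call, the network name "podnet", the container ID, and the namespace path are all hypothetical values chosen for illustration; the configuration assumes the reference bridge and host-local plugins.

```python
import json

VALID_COMMANDS = ("ADD", "DEL", "CHECK", "VERSION")


def build_cni_call(command, container_id, netns_path, ifname="eth0"):
    """Assemble the environment and stdin payload for one CNI invocation.

    Illustrative only: all concrete values here are assumptions.
    """
    if command not in VALID_COMMANDS:
        raise ValueError(f"unknown CNI command: {command}")
    env = {
        "CNI_COMMAND": command,          # ADD on Pod creation, DEL on deletion
        "CNI_CONTAINERID": container_id,
        "CNI_NETNS": netns_path,         # path to the Pod's network namespace
        "CNI_IFNAME": ifname,            # interface name inside the Pod
        "CNI_PATH": "/opt/cni/bin",      # where plugin binaries are searched
    }
    config = {
        "cniVersion": "1.0.0",
        "name": "podnet",                # hypothetical network name
        "type": "bridge",                # plugin binary to run
        "bridge": "cni0",
        "isGateway": True,
        "ipMasq": False,                 # no masquerading between Pods
        "ipam": {
            "type": "host-local",
            "ranges": [[{"subnet": "10.244.1.0/24"}]],
        },
    }
    return env, json.dumps(config)


env, stdin_payload = build_cni_call(
    "ADD", "example-container-id", "/var/run/netns/example-ns"
)
print(env["CNI_COMMAND"], env["CNI_NETNS"])
print(stdin_payload)
```

The same function with "DEL" produces the teardown call, which is why a plugin needs no state passed back from the runtime beyond the container ID and namespace path.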

The dominant CNI plugins in production clusters, and how they implement the flat network:

  • Flannel: The simplest option. Its default backend creates a VXLAN overlay: inter-node traffic is encapsulated in UDP packets tunnelled over the node's physical network. Low operational complexity, moderate performance penalty due to encapsulation overhead. Common in on-prem labs and small clusters.
  • Calico: Uses BGP (Border Gateway Protocol) to advertise Pod subnet routes between nodes. Where the underlying network can route Pod IPs, it runs without encapsulation as pure IP routing, which avoids overlay overhead; IP-in-IP and VXLAN modes are available where it cannot. Also widely deployed for NetworkPolicy enforcement in enterprise environments.
  • Cilium: Uses eBPF programs in the kernel to implement routing, load-balancing, and security enforcement. Can replace kube-proxy entirely. Strong observability through Hubble, good performance at scale, and native support for L7 NetworkPolicies. GKE Dataplane V2 is built on Cilium.
  • AWS VPC CNI: On EKS, each Pod gets a secondary IP address on one of the node's VPC ENIs (Elastic Network Interfaces). No encapsulation: Pod IPs are natively routable within the VPC, so Pod-to-Pod traffic stays in the VPC fabric. The tradeoff is IP exhaustion: every instance type has a hard limit on ENIs and on IPs per ENI.
  • Azure CNI / GKE Dataplane V2: Cloud-specific counterparts of AWS VPC CNI on AKS and GKE, with Pod IPs drawn from their respective virtual networks (Dataplane V2 pairs GKE's VPC-native addressing with a Cilium-based dataplane).

Production default: On managed cloud clusters (EKS, GKE, AKS) the cloud provider's CNI is the default and usually the right choice: it gives you native VPC routing with no encapsulation overhead. On bare metal or on-prem, Calico with BGP is a sound engineering default for clusters above 50 nodes. Flannel is a lab tool, not a production one.

Inspecting the Network Model from Inside a Cluster

When you join a new cluster or debug a networking issue, the first step is to verify the model is working correctly. These commands confirm the fundamentals:

    # 1. Confirm the Pod CIDR allocated to each node
    kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
    # Expected output:
    # node-1    10.244.1.0/24
    # node-2    10.244.2.0/24
    # node-3    10.244.3.0/24

    # 2. See all Pod IPs and which node they are on
    kubectl get pods -A -o wide

    # 3. Identify which CNI plugin is installed
    kubectl -n kube-system get pods | grep -E 'calico|cilium|flannel|weave|aws-node'
    ls /etc/cni/net.d/   # CNI config files on any node (requires node access)

    # 4. Verify cross-node Pod connectivity directly
    # Deploy a debug pod and ping a pod on a different node:
    kubectl run netcheck --image=nicolaka/netshoot --restart=Never -- sleep 3600
    kubectl exec netcheck -- ping -c 3 <pod-ip-on-another-node>

    # Confirm no NAT: the source IP seen by the target pod must be the sender's pod IP.
    # Run this in a second terminal while the ping is in progress; the target image must include tcpdump.
    kubectl exec <target-pod> -- tcpdump -n -i eth0 icmp
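
The output of the first command can also be checked mechanically. The sketch below assumes the tab-separated "node name, podCIDR" format shown above and verifies two properties the flat model depends on: every node subnet lies inside the cluster Pod CIDR, and no two node subnets overlap. The function name check_node_cidrs is hypothetical, and the sample text stands in for real kubectl output.

```python
import ipaddress


def check_node_cidrs(cluster_cidr, jsonpath_output):
    """Return a list of human-readable problems found in node podCIDRs."""
    cluster = ipaddress.ip_network(cluster_cidr)
    problems = []
    seen = []  # (node, subnet) pairs already validated
    for line in jsonpath_output.strip().splitlines():
        parts = line.split()
        if len(parts) != 2:
            # A node with an empty .spec.podCIDR prints only its name.
            problems.append(f"{line.strip()}: no podCIDR reported")
            continue
        node, cidr = parts
        subnet = ipaddress.ip_network(cidr)
        if not subnet.subnet_of(cluster):
            problems.append(f"{node}: {subnet} is outside {cluster}")
        for other_node, other in seen:
            if subnet.overlaps(other):
                problems.append(f"{node}: {subnet} overlaps {other_node}")
        seen.append((node, subnet))
    return problems


sample = "node-1\t10.244.1.0/24\nnode-2\t10.244.2.0/24\nnode-3\t10.244.3.0/24\n"
print(check_node_cidrs("10.244.0.0/16", sample))  # []
```

Note that some CNIs manage their own address pools and leave .spec.podCIDR empty, so a "no podCIDR reported" finding is a prompt to check the CNI's own IPAM rather than proof of a fault.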

The Pod Network Namespace

Every Pod has its own Linux network namespace: a completely isolated view of network interfaces, routing tables, and firewall rules. The CNI plugin creates a veth pair — a virtual Ethernet cable — where one end (eth0) lives inside the Pod's namespace and the other end lives on the host, attached to a bridge (typically cni0) or directly programmed into the routing table. Containers within the same Pod share that namespace, which is why they communicate over localhost and can conflict on ports — they truly share one network stack.

[Figure: Pod network namespace, veth pair, and bridge wiring on a single node. Pod A (eth0 10.244.1.4) and Pod B (eth0 10.244.1.5) each hold two containers that share the Pod's namespace and talk over localhost. Each Pod's eth0 connects through a veth pair to the cni0 bridge in the host network namespace, which leads to eth0 (the node NIC) and the physical network. The "pause" (infra) container holds the network namespace; app containers join it.]
Containers inside a Pod share one network namespace (one IP, one routing table). The CNI plugin wires a veth pair from eth0 inside the namespace to the host bridge, and on to the physical network.

Inspecting the Wiring on a Node

When a Pod's networking misbehaves, you need to descend into the node to verify the plumbing. The following commands give you full visibility into what the CNI has wired up:

    # Open a debug shell on a node (replace node-1 with your node name)
    kubectl debug node/node-1 -it --image=nicolaka/netshoot

    # Inside the debug pod, the node's root filesystem is mounted at /host.
    # View the CNI config the plugin was given (the file name depends on the CNI; this one is Flannel's):
    cat /host/etc/cni/net.d/10-flannel.conflist

    # List all veth interfaces: every running Pod should have one
    ip link show type veth

    # Inspect the bridge
    bridge link show

    # See the full routing table: each pod subnet should have a route
    ip route

    # Confirm a pod IP is reachable from the node
    ping -c 2 10.244.1.4

    # Get the network namespace of a specific container (run on the node itself)
    # First find the container ID:
    crictl ps | grep <pod-name>
    # Then inspect its network namespace:
    crictl inspect <container-id> | grep -i netns
    # Enter the pod's netns:
    nsenter --net=/var/run/netns/<ns-id> ip addr

Production failure mode: IP address exhaustion. On AWS EKS with the VPC CNI, every Pod consumes a real VPC IP from one of the node's ENIs. Instance types have hard limits: an m5.large supports 3 ENIs with 10 IPs each, and because each ENI's primary IP is not handed to Pods, that leaves 3 × (10 - 1) = 27 Pod IPs; EKS sets the node's max-pods to 29, adding 2 for Pods that use the host network. When the limit is hit, new Pods stay Pending with a scheduling event such as 0/3 nodes are available: 3 Too many pods (older versions report Insufficient pods). To monitor this, compare the pods value under Allocatable with the Non-terminated Pods count in the output of kubectl describe node. Mitigation: use larger instance types, enable prefix delegation (ENABLE_PREFIX_DELEGATION=true on the VPC CNI), or move to a CNI that supports secondary CIDRs.
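
The ENI arithmetic is easy to get wrong, so here is a small sketch of it. It follows the formula AWS publishes for EKS max pods in secondary-IP mode, ENIs × (IPs per ENI - 1) + 2, where one address per ENI is the ENI's own primary IP and the extra 2 accounts for Pods that run on the host network. The function names are hypothetical, and the prefix-delegation figure is only the raw address capacity assuming each slot holds a /28 prefix (16 addresses); in practice the kubelet's max-pods setting is usually capped well below it.

```python
def max_pods_secondary_ips(enis, ips_per_eni):
    """EKS max-pods figure for the VPC CNI in secondary-IP mode."""
    # Each ENI keeps its primary IP for itself; +2 covers host-network Pods.
    return enis * (ips_per_eni - 1) + 2


def prefix_delegation_capacity(enis, ips_per_eni, addresses_per_prefix=16):
    """Raw Pod address capacity when each slot holds a /28 prefix."""
    return enis * (ips_per_eni - 1) * addresses_per_prefix


# m5.large: 3 ENIs, 10 IPv4 addresses per ENI
print(max_pods_secondary_ips(3, 10))       # 29
print(prefix_delegation_capacity(3, 10))   # 432
```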

Why This Matters for Everything That Follows

Every higher-level Kubernetes networking construct assumes the flat model is in place. A Service's ClusterIP is a virtual IP that kube-proxy DNATs to real Pod IPs, which works only because Pod IPs are directly routable. NetworkPolicies are enforced by the CNI plugin at the pod-to-pod level; they exist only because there is a pod-to-pod layer to enforce rules on. Ingress controllers proxy traffic directly to Pod endpoints. DNS-based service discovery returns a ClusterIP for ordinary Services and the Pod IPs themselves for headless Services, and both are meaningful only because Pod IPs are universally reachable.

When you encounter a broken Service, a NetworkPolicy that is not enforcing, or an Ingress that returns 502, the first question is always: can Pod A reach Pod B directly? If the answer is no, the fault is at the Pod network layer (the CNI, or a NetworkPolicy blocking the traffic) and nothing above it will work. Start at the bottom of the stack.
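
When ping is unavailable or ICMP is filtered, the "can Pod A reach Pod B directly?" question can be asked at the TCP level instead. The following is a minimal sketch of such a probe, meant to be run from inside the source Pod; the function name can_reach is hypothetical, and the target must be a Pod IP and port that actually has a listener.

```python
import socket


def can_reach(ip, port, timeout=2.0):
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For example, can_reach("10.244.2.5", 8080) run from a Pod on node-1 would test the cross-node path to a Pod on node-2, assuming something listens on port 8080 there. A False result against the Pod IP, combined with a True result from a Pod on the same node as the target, points at inter-node plumbing rather than the application.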