Kubernetes Fundamentals

Why Kubernetes?

18 min Lesson 1 of 32

In 2013, Docker made it trivially easy to package an application and its runtime into a portable image. By 2015, container images were fast becoming a standard deployment artifact, but a new class of problems appeared at scale: who decides where each container runs, what happens when it crashes, how do you reach it across hundreds of hosts, and how do you roll out a new version without downtime? Together these questions are known as the container orchestration problem, and Kubernetes has become the de facto industry-standard answer.

This lesson establishes the four core problems that Kubernetes solves — scheduling, self-healing, scaling, and service discovery — and explains why each one becomes intractable the moment you move beyond a handful of containers on a single host.

Problem 1: Scheduling — Where Does Each Container Run?

Imagine 40 microservices, each needing between 0.1 and 4 CPU cores and between 128 MB and 8 GB of RAM, spread across 20 servers with different capacities. You need to bin-pack workloads onto nodes to maximise utilisation without overcommitting any single machine. You also need to respect constraints: the payments service must not share a node with the analytics job that pegs the CPU at 100%; the stateful database Pod must land on a node with fast local SSDs; and replicas of the same service must be spread across different availability zones so that a zone outage does not take down every replica at once.

Doing this by hand, even with a well-crafted shell script, does not scale. Every time a node is added, removed, or fails, the calculation must be re-run. Kubernetes solves this with a declarative scheduler: you describe what each workload needs (CPU, RAM, node labels, affinity/anti-affinity rules, topology spread constraints), and the scheduler picks a node for every new or unscheduled Pod based on the current state of the cluster. The scheduler does not move Pods that are already running; when a node fails, controllers create replacement Pods, and those are scheduled afresh.
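As a rough illustration of what "describe what each workload needs" looks like, here is a fragment of a Pod template expressing the constraints from the scenario above. It is a sketch: the labels app: payments, app: analytics and disktype: ssd are hypothetical and must match labels that actually exist on your Pods and nodes, while kubernetes.io/hostname and topology.kubernetes.io/zone are standard well-known node labels.

```yaml
# Fragment of a payments Deployment's Pod template (spec.template.spec).
# Labels app: payments, app: analytics and disktype: ssd are illustrative.
affinity:
  podAntiAffinity:
    # Hard rule: never place this Pod on a node already running an analytics Pod.
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: analytics
      topologyKey: kubernetes.io/hostname
topologySpreadConstraints:
# Keep payments replicas evenly spread across zones (at most 1 replica difference).
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: payments
# A database Pod template would instead pin itself to SSD nodes, e.g.:
# nodeSelector:
#   disktype: ssd
```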

# Inspect how the scheduler sees your nodes: total capacity vs. allocatable (what Pods can actually request)
kubectl describe nodes | grep -E -A 8 "Capacity:|Allocatable:|Non-terminated Pods:"

# Check current CPU and memory usage on each node (requires metrics-server)
kubectl top nodes

# See WHERE a specific Pod was scheduled and WHY
kubectl get pod <pod-name> -o wide                       # NODE column shows the placement
kubectl describe pod <pod-name> | grep -A 10 "Events:"   # includes Scheduled / FailedScheduling events
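To make the filter-then-score idea concrete, here is a toy placement function in bash. It is not how kube-scheduler is implemented: it only considers CPU and memory requests, the node names and sizes are made up, and it scores by best fit (least CPU left over). That resembles the scheduler's optional MostAllocated scoring strategy rather than the default LeastAllocated strategy, which spreads load instead of packing it.

```bash
#!/usr/bin/env bash
# Toy filter-then-score placement for one Pod at a time (NOT the real kube-scheduler).
# Requires bash 4+ for associative arrays. Node names and sizes are hypothetical.
NODES=(node-a node-b node-c)
declare -A FREE_CPU=( [node-a]=4000 [node-b]=2000 [node-c]=1000 )   # millicores
declare -A FREE_MEM=( [node-a]=8192 [node-b]=4096 [node-c]=2048 )   # MiB
PLACED=""

# place_pod CPU_MILLICORES MEM_MIB
# Sets PLACED to the chosen node, or returns 1 if no node fits (the Pod would stay Pending).
place_pod() {
  local cpu=$1 mem=$2 node left best="" best_left=-1
  PLACED=""
  for node in "${NODES[@]}"; do
    # Filter: drop nodes that cannot satisfy the requests.
    if (( ${FREE_CPU[$node]} < cpu || ${FREE_MEM[$node]} < mem )); then continue; fi
    # Score: best fit, i.e. least CPU left over after placement (bin-packing).
    left=$(( ${FREE_CPU[$node]} - cpu ))
    if (( best_left < 0 || left < best_left )); then best=$node; best_left=$left; fi
  done
  if [[ -z $best ]]; then return 1; fi
  # Bind: reserve the requested resources on the chosen node.
  FREE_CPU[$best]=$(( ${FREE_CPU[$best]} - cpu ))
  FREE_MEM[$best]=$(( ${FREE_MEM[$best]} - mem ))
  PLACED=$best
}
```

Calling place_pod 1500 1024, then 800 512, then 3000 4096 lands the three Pods on node-b, node-c and node-a respectively; a fourth request of 2000 millicores then fails every filter, which is the toy equivalent of a Pending Pod with a FailedScheduling event.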

Key idea (declarative vs. imperative): You do not normally tell Kubernetes "run this container on node-07." You tell it "run 3 replicas of this image, each needing 500m CPU and 512Mi RAM," and the scheduler works out placement. This separation of intent from implementation is the foundation of everything Kubernetes does.
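The intent in that sentence maps almost word for word onto a Deployment manifest. A minimal sketch follows; the name api-server and the image reference are placeholders, and the limit values are illustrative.

```yaml
# "Run 3 replicas of this image, each needing 500m CPU and 512Mi RAM."
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3                      # desired state: three Pods
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
      - name: api-server
        image: registry.example.com/api-server:1.0.0   # hypothetical image
        resources:
          requests:                # what the scheduler reserves per Pod
            cpu: 500m
            memory: 512Mi
          limits:                  # illustrative caps
            cpu: "1"
            memory: 512Mi
```

Nothing here names a node; placement is left entirely to the scheduler.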

Problem 2: Self-Healing — Surviving Failures at Scale

At Google scale, hardware failure is not an edge case; it is the normal mode of operation. A cluster with 10,000 nodes can statistically expect to lose several machines every day. Even on a 50-node cluster, you still lose containers and nodes to kernel panics, OOM kills, disk corruption, and cloud-provider maintenance windows. Without orchestration, a failed container stays dead until a human notices and restarts it, and at 3 AM that can easily take 45 minutes.

Kubernetes solves this through a reconciliation loop (covered in depth in Lesson 8). Every controller (the ReplicaSet controller, the Deployment controller, the DaemonSet controller, and others) continuously compares the desired state you declared against the actual state of the cluster. When the two differ, it acts: controllers create replacement Pods for ones that have been lost, the scheduler places them on healthy nodes, and the kubelet on each node restarts containers that crash or fail their health checks.
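A toy version of that compare-and-act loop, written in bash with an in-memory array standing in for the cluster, might look like the following. It is only an analogy for what a ReplicaSet controller does: real controllers talk to the API server, typically react to watch events plus periodic resyncs, and never manipulate containers directly. DESIRED, PODS, reconcile_once and kill_pod are hypothetical names for this sketch.

```bash
#!/usr/bin/env bash
# Toy, in-memory reconciliation loop in the spirit of a ReplicaSet controller.
# "Pods" are just names in an array; nothing talks to a real cluster.
DESIRED=3      # desired state: number of replicas
PODS=()        # actual state: names of "running" Pods
NEXT_ID=0

# One pass: observe actual state, compare with desired, act on the difference.
reconcile_once() {
  local actual=${#PODS[@]}
  if (( actual < DESIRED )); then
    while (( ${#PODS[@]} < DESIRED )); do
      PODS+=("pod-$NEXT_ID")          # "create" a replacement Pod
      NEXT_ID=$(( NEXT_ID + 1 ))
    done
  elif (( actual > DESIRED )); then
    PODS=("${PODS[@]:0:DESIRED}")     # "delete" the surplus Pods
  fi
}

# Simulate a failure by removing one Pod from the actual state.
kill_pod() {
  local victim=$1 p kept=()
  for p in "${PODS[@]}"; do
    if [[ $p != "$victim" ]]; then kept+=("$p"); fi
  done
  PODS=("${kept[@]}")
}
```

Running reconcile_once creates pod-0 to pod-2; after kill_pod pod-1, the next pass notices only two Pods exist and creates pod-3. A controller applies this pass over and over; a shell stand-in would be `while true; do reconcile_once; sleep 5; done`.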

The self-healing stack has multiple layers:

Liveness probes: Kubernetes restarts a container if its liveness probe fails — catching deadlocks and infinite loops that leave the process alive but non-functional.
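Readiness probes: A Pod receives Service traffic only while its readiness probe passes, so a newly started (but not yet warm) container is kept out of rotation until it can handle production requests, and a Pod that later becomes unready is removed from rotation again.
Node failure: If a node stops reporting to the control plane, it is marked NotReady (or Unreachable) and tainted. By default, Pods tolerate that taint for 300 seconds (tolerationSeconds), so after roughly 5 minutes they are evicted, and controllers such as ReplicaSets create replacements that the scheduler places on healthy nodes. Bare Pods with no controller are not recreated.
PodDisruptionBudgets (PDBs): Define how many replicas may be unavailable at once due to voluntary disruptions, so that a node drain or other eviction cannot take down a service entirely. (A Deployment's rolling update is governed separately, by its maxUnavailable and maxSurge settings.)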
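Here is a sketch of how the probe and PDB layers are declared. The endpoints /healthz and /ready, port 8080, and the app: api-server label are assumptions about the application; the field names are standard Kubernetes API fields.

```yaml
# Container fragment (inside spec.template.spec.containers[]).
livenessProbe:
  httpGet:
    path: /healthz          # hypothetical health endpoint
    port: 8080
  periodSeconds: 10
  failureThreshold: 3       # restart after about 30 s of consecutive failures
readinessProbe:
  httpGet:
    path: /ready            # hypothetical readiness endpoint
    port: 8080
  periodSeconds: 5
  failureThreshold: 1       # leave Service rotation on the first failed check
---
# Voluntary evictions (e.g. kubectl drain) are refused if they would leave
# more than one api-server Pod unavailable at the same time.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: api-server
```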

Figure: When Node 3 fails, the control plane detects its NotReady state, and replacements for its Pods are scheduled onto the remaining healthy nodes automatically.

Problem 3: Scaling — Matching Capacity to Demand

Traffic is never flat. A payment service might handle 50 requests/second at 2 AM and 8,000 requests/second at Black Friday peak. Manually adjusting replica counts — or worse, always provisioning for peak — is either operationally unmanageable or extremely wasteful.

Kubernetes provides three complementary scaling mechanisms:

Horizontal Pod Autoscaler (HPA): Automatically increases or decreases the number of Pod replicas based on CPU utilisation, memory, or custom metrics (for example, Prometheus metrics exposed through an adapter such as the Prometheus Adapter). In many production clusters it is the primary mechanism for absorbing traffic spikes without manual intervention.
Vertical Pod Autoscaler (VPA): A separately installed add-on that recommends, and can optionally apply, CPU and memory requests based on observed usage, which is crucial for right-sizing workloads without wasting reserved capacity. In its classic automatic mode it applies new values by evicting Pods so they are recreated with updated requests, and it should not be combined with an HPA that scales on the same CPU or memory metric.
Cluster Autoscaler (CA): When Pods stay Pending because no existing node has enough capacity, the CA grows a node group (an AWS Auto Scaling Group, a GKE node pool, or an AKS node pool), so the cloud provider adds a new VM (EC2, GCE, Azure VM) to the cluster. When nodes stay under-utilised and their Pods can safely run elsewhere, it drains and removes them.

# Create an HPA that targets 60% average CPU utilisation across replicas
# Scales between 2 and 20 replicas automatically
kubectl autoscale deployment api-server \
  --cpu-percent=60 \
  --min=2 \
  --max=20

# Watch the HPA in action during a load test
kubectl get hpa api-server --watch

# Or define it declaratively (preferred — version-controlled)
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75
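For Utilization targets like the ones above, the HPA's core rule is desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). With several metrics, each one proposes a replica count and the largest proposal wins, and by default no change is made while the ratio stays within a 10% tolerance of 1.0. The bash sketch below reproduces just that arithmetic, assuming integer percentages; it ignores stabilisation windows, Pods without metrics, and not-yet-ready Pods, which the real controller also handles. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the HPA replica calculation for Utilization targets (integer percentages).
# Ignores stabilisation windows, unready Pods and missing metrics.

# hpa_desired_for_metric CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL
hpa_desired_for_metric() {
  local current=$1 util=$2 target=$3
  local diff=$(( util - target ))
  if (( diff < 0 )); then diff=$(( -diff )); fi
  # Default tolerance: skip scaling when |util/target - 1| <= 0.1.
  if (( diff * 10 <= target )); then
    echo "$current"
    return 0
  fi
  # desired = ceil(current * util / target), via integer ceiling division.
  echo $(( (current * util + target - 1) / target ))
}

# hpa_desired CURRENT MIN MAX UTIL/TARGET [UTIL/TARGET ...]
# Each metric proposes a replica count; the largest wins, then [MIN, MAX] clamps it.
hpa_desired() {
  local current=$1 min=$2 max=$3
  shift 3
  local best=0 pair proposal
  for pair in "$@"; do
    proposal=$(hpa_desired_for_metric "$current" "${pair%%/*}" "${pair#*/}")
    if (( proposal > best )); then best=$proposal; fi
  done
  if (( best < min )); then best=$min; fi
  if (( best > max )); then best=$max; fi
  echo "$best"
}
```

For example, hpa_desired 4 2 20 90/60 70/75 prints 6: CPU at 90% against a 60% target proposes ceil(4 × 90 / 60) = 6 replicas, memory at 70% against 75% is within tolerance and proposes 4, and the larger proposal wins.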

Pro practice: Always set both requests and limits on every container. HPA needs requests to calculate utilisation percentages: if a Pod's containers have no CPU request, HPA cannot compute CPU utilisation and will not scale on that metric. Omitting limits lets a runaway process consume far more than its share of the node (a class of production incident called "noisy neighbour"); a memory leak without a memory limit, for example, can push the node into memory pressure and get other Pods OOM-killed or evicted. On many teams, a manifest that omits requests and limits fails code review.
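If you want the cluster itself to backstop that review rule, one option is a LimitRange, which fills in default requests and limits for containers that omit them when Pods are created in a namespace. The namespace and values below are illustrative, not recommendations.

```yaml
# Namespace-level defaults for containers created in "checkout" without
# explicit requests/limits.
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: checkout
spec:
  limits:
  - type: Container
    defaultRequest:       # applied when a container omits requests
      cpu: 250m
      memory: 256Mi
    default:              # applied when a container omits limits
      cpu: 500m
      memory: 512Mi
```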

Problem 4: Service Discovery — How Does A Find B in a Dynamic Cluster?

In a pre-container world, services typically had fixed IP addresses: you could hardcode db.internal:5432 and it would work for years. In Kubernetes, Pods are ephemeral: when a Pod is replaced (after a crash, a rollout, or a node failure) the new Pod usually gets a new IP address, replicas are scaled up and down dynamically, and Pods are rescheduled onto different nodes. At any moment, anywhere from 3 to 20 replicas of your API service might be running across 10 different nodes, each with its own cluster-internal IP.

Kubernetes solves this with a Service object: a stable virtual IP (the ClusterIP) and DNS name that load-balance traffic across all ready Pods matching a label selector. The DNS name is registered automatically by the cluster DNS add-on (CoreDNS in current clusters, historically kube-dns): a Service named payments in the checkout namespace is reachable at payments.checkout.svc.cluster.local from any Pod in the cluster, regardless of which node either side runs on or how often the backing Pods have been replaced.

# A minimal Service definition
# The Service finds Pods by matching labels (app: payments) — no IP hardcoding
apiVersion: v1
kind: Service
metadata:
  name: payments
  namespace: checkout
spec:
  selector:
    app: payments        # selects all Pods with this label
  ports:
  - port: 80
    targetPort: 8080     # the port your container actually listens on
  type: ClusterIP        # stable virtual IP, cluster-internal only

---
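# Any other Pod in the cluster can now reach payments via DNS:
# curl http://payments.checkout.svc.cluster.local/health
# Shorter names work through the Pod's DNS search list:
#   curl http://payments.checkout/health   (from any namespace)
#   curl http://payments/health            (only from Pods in the checkout namespace)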

# Verify DNS resolution from inside a debug Pod
kubectl run -it --rm debug --image=nicolaka/netshoot -- bash
# inside the container:
# nslookup payments.checkout.svc.cluster.local
# curl http://payments.checkout/health
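The short DNS forms in the comments above work because of the search list Kubernetes writes into each Pod's /etc/resolv.conf. The bash sketch below lists the names a resolver would try, in order, assuming the usual ClusterFirst settings (search <namespace>.svc.cluster.local svc.cluster.local cluster.local, options ndots:5) and ignoring any extra search domains a Pod may inherit from its node. dns_candidates is a hypothetical helper, not a Kubernetes tool.

```bash
#!/usr/bin/env bash
# dns_candidates NAME CALLER_NAMESPACE [CLUSTER_DOMAIN]
# Prints, in order, the names a Pod's resolver would try under the assumed
# ClusterFirst resolv.conf (ndots:5 plus three cluster search domains).
dns_candidates() {
  local name=$1 ns=$2 domain=${3:-cluster.local} s
  local search=("$ns.svc.$domain" "svc.$domain" "$domain")
  # A trailing dot marks an absolute name: no search list is applied.
  if [[ $name == *. ]]; then
    echo "${name%.}"
    return 0
  fi
  local dots=${name//[!.]/}           # keep only the dots to count them
  if (( ${#dots} >= 5 )); then
    echo "$name"                      # at least ndots dots: try as-is first
  fi
  for s in "${search[@]}"; do
    echo "$name.$s"
  done
  if (( ${#dots} < 5 )); then
    echo "$name"                      # fewer than ndots dots: as-is last
  fi
}
```

Calling dns_candidates payments.checkout frontend prints payments.checkout.frontend.svc.cluster.local first (which does not exist) and payments.checkout.svc.cluster.local second (the real Service), which is why the two-part name works from any namespace, while a bare payments only resolves from inside checkout.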

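Production pitfall: With kube-proxy in its default iptables mode, a ClusterIP is a virtual IP that exists only as NAT rules on each node; it does not belong to any real network interface. If you ping a ClusterIP, it will usually time out even when the Service is healthy, because those rules only match the Service's declared ports and protocols (TCP, UDP, or SCTP), not ICMP. (In IPVS mode the ClusterIP is bound to a dummy interface and may answer pings, which says nothing about whether the backends are healthy either.) Always test Service connectivity with curl or a direct TCP connection, not ping. This regularly catches out engineers coming from traditional VM networking.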
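One way to follow that advice when curl is not available is bash's /dev/tcp redirection, which opens a plain TCP connection. A minimal sketch, assuming bash and a timeout command are present in the debug container; tcp_check is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# tcp_check HOST PORT [SECONDS]
# Exit status 0 if a TCP connection to HOST:PORT succeeds within SECONDS (default 3),
# non-zero otherwise. Uses bash's /dev/tcp redirection; no ICMP is involved.
tcp_check() {
  local host=$1 port=$2 secs=${3:-3}
  # The inner bash opens (and immediately closes) the connection; timeout bounds
  # the wait for hosts that silently drop packets.
  timeout "$secs" bash -c ': > "/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

From inside the debug Pod, `tcp_check payments.checkout 80 && echo reachable` succeeds only if a backend Pod accepted the connection on the Service port. It proves TCP reachability, not that the application behind it is healthy.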

Why Not Just Use Docker Compose or a Simple Script?

A common question from engineers early in their Kubernetes journey: "We already have Docker Compose in production — what does Kubernetes actually add?" The honest answer is: very little on a single host, and everything at scale.

Docker Compose is single-host. It has no concept of spreading workloads across multiple machines, no built-in node failure recovery, and no cluster-level resource accounting.
Restart policies and scripted restarts do not self-heal reliably. Compose's restart: always policy restarts a crashed container, but if the host fails, nothing restarts anything. And if the container is stuck in a crash loop, it is simply restarted again and again, which can hide the problem rather than surface it.
Manual scaling is a toil tax. Every spike becomes a manual intervention. At the scale of any real production system, this is unsustainable.
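Service discovery stops at the host boundary. Compose gives each service a DNS name on its own network, but only for containers on that single host; across machines you either hardcode IPs or build your own service-registry logic.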

Kubernetes is complex, and its learning curve is real and steep. But much of that complexity sits at the surface, in the number of object types and configuration fields. The underlying model (declare desired state, let controllers reconcile) is simple and powerful. Every concept in this tutorial is a direct answer to a specific operational problem that teams hit when running containers in production at any meaningful scale.

Big-tech context: Google has run containerised workloads on its internal cluster manager, Borg, since roughly 2003-2004, and a later research system, Omega, explored alternative scheduler designs; lessons from both shaped Kubernetes. The container orchestration problems this lesson describes are not theoretical: companies such as Google, Uber, Airbnb, Spotify, and Netflix all faced them at scale, whether they built their own orchestrators or adopted Kubernetes. The concepts in the upcoming lessons reflect design decisions proven in that kind of production environment.

What Comes Next

Now that you understand why Kubernetes exists and what problems it solves, the remaining lessons in this tutorial build out the full picture:

Lesson 2 — Cluster Architecture: The control plane components (API Server, etcd, Scheduler, Controller Manager) and worker node components (kubelet, kube-proxy, container runtime).
Lesson 3 — Pods: The atomic unit of scheduling — what they are, how they differ from containers, and how to write a correct Pod spec.
Lessons 4-6 cover the kubectl workflow and the higher-level objects built on Pods: ReplicaSets, Deployments, and Services.
Lesson 10 synthesises everything into a real end-to-end deployment of a production-grade application on Kubernetes.