Pods: The Atomic Unit
In Kubernetes, every workload — whether a stateless API server, a database, or a batch job — eventually runs inside a Pod. A Pod is not a container. It is a thin wrapper that groups one or more containers into a single schedulable unit with a shared execution environment. Understanding Pod anatomy at this level is non-negotiable: every higher-level abstraction (Deployment, StatefulSet, Job) is ultimately a factory that creates and manages Pods.
Pod Anatomy
A Pod specification (the spec section of a manifest) describes everything the scheduler and kubelet need to run your workload:
- containers — one or more container specs, each with an image, command, ports, environment variables, and resource requests/limits.
- volumes — storage volumes that any container in the Pod can mount. Volumes are scoped to the Pod lifetime.
- initContainers — containers that run to completion before any regular container starts. Used for bootstrapping: running migrations, fetching secrets, waiting for dependencies.
- restartPolicy — Always (default, for long-running services), OnFailure (for jobs), or Never.
- serviceAccountName — the RBAC identity the Pod uses to call the Kubernetes API.
- securityContext — security settings at the Pod or container level: run as non-root, syscall filters (seccomp), AppArmor profiles, and, per container, a read-only root filesystem.
- affinity / tolerations / nodeSelector — scheduling constraints that control which Nodes the Pod may land on.
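As a rough sketch of how the scheduling fields fit together, the fragment below combines all three. The node label, taint key, and app label are hypothetical placeholders, not values from any particular cluster.

```yaml
# Fragment of a Pod spec; label and taint names are illustrative assumptions.
spec:
  nodeSelector:
    disktype: ssd                  # only Nodes carrying this label qualify
  tolerations:
    - key: dedicated               # tolerate a hypothetical dedicated=batch:NoSchedule taint
      operator: Equal
      value: batch
      effect: NoSchedule
  affinity:
    podAntiAffinity:
      # Prefer (but do not require) spreading replicas across Nodes.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: api-server
            topologyKey: kubernetes.io/hostname
```

nodeSelector is a hard requirement, while the preferred anti-affinity rule is only a hint the scheduler may ignore when no better Node is available.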
The most important architectural fact about a Pod is its shared network and IPC namespace. Every container inside the same Pod sees the exact same loopback interface (localhost), the same IP address, and the same hostname. If container A binds port 8080, container B can reach it on localhost:8080. This is by design — it enables tightly coupled helper processes (sidecars) without the overhead of a service mesh for intra-Pod communication.
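The shared loopback interface can be illustrated with a two-container Pod. This is a minimal sketch: the Pod name is hypothetical, and the public nginx and busybox images are assumed to be pullable from your cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-localhost-demo      # hypothetical name
spec:
  containers:
    - name: web
      image: nginx:1.27            # serves HTTP on port 80
      ports:
        - containerPort: 80
    - name: probe
      image: busybox:1.36
      # Reaches the web container over the loopback interface both containers share.
      command:
        - sh
        - -c
        - |
          while true; do
            wget -q -O /dev/null http://localhost:80 && echo "web reachable on localhost"
            sleep 5
          done
```

No Service or Pod IP lookup is involved: the second container addresses the first purely through localhost.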
Writing a Real Pod Manifest
You rarely create bare Pods in production (Deployments do that for you), but you must be able to read and write manifests to debug and to understand what higher-level objects generate. Here is a production-grade single-container Pod manifest with the fields you will encounter in real clusters:
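A minimal sketch of such a manifest might look like the following. The Pod name, image, ServiceAccount, environment variable, and resource figures are all illustrative assumptions rather than recommended values.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server                 # hypothetical name
  labels:
    app: api-server
spec:
  restartPolicy: Always
  serviceAccountName: api-server   # assumes this ServiceAccount already exists
  securityContext:                 # Pod-level settings, inherited by every container
    runAsNonRoot: true
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: api
      image: registry.example.com/api-server:1.4.2   # placeholder image
      ports:
        - name: http
          containerPort: 8080
      env:
        - name: LOG_LEVEL
          value: info
      resources:
        requests:                  # used by the scheduler for placement
          cpu: 250m
          memory: 256Mi
        limits:                    # enforced at runtime via cgroups
          cpu: "1"
          memory: 512Mi
      securityContext:             # container-level settings
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
```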
requests is what the scheduler uses to place the Pod on a Node with enough spare capacity. limits is the hard ceiling enforced by cgroups at runtime: a container that exceeds its CPU limit is throttled, and one that exceeds its memory limit is OOM-killed. If you set limits without requests, Kubernetes defaults requests to equal limits. Avoid relying on that default in production: the scheduler then reserves the full limit for every Pod, which prevents it from bin-packing the cluster efficiently.
Multi-Container Pods and Sidecars
The single-container Pod is the common case, but Kubernetes explicitly supports multiple containers per Pod. The most common multi-container pattern is the sidecar: a container that augments the main application container without modifying it. This is powerful because it respects the single-responsibility principle at the container level: your application image does one thing, and a separate team's image adds a capability (logging, metrics, mTLS) as an orthogonal concern.
The three canonical sidecar patterns used at big-tech companies:
- Log shipper — The app writes structured logs to a shared emptyDir volume. A Fluentd or Promtail sidecar tails that directory and forwards to a central log aggregator (Loki, Elasticsearch, Splunk). The app team owns the app image; the platform team owns the shipper image. Neither needs to know about the other's implementation.
- Proxy / service mesh — Istio injects an Envoy sidecar (called the data plane) into every Pod automatically via a MutatingAdmissionWebhook. All inbound and outbound traffic flows through Envoy, giving you mTLS, retries, circuit breaking, and distributed tracing without changing a single line of application code.
- Secret sync — A Vault Agent sidecar authenticates to HashiCorp Vault, retrieves secrets, and writes them to a shared tmpfs volume. The app reads secrets from files rather than environment variables, a security best practice because env vars can be leaked through /proc/PID/environ.
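The log shipper pattern can be sketched as below. To keep the example self-contained, busybox stands in for both the application and the shipper; a real deployment would use your application image and a Fluentd or Promtail image with its own configuration. The Pod name, log path, and log line are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-shipper       # hypothetical name
spec:
  volumes:
    - name: app-logs
      emptyDir: {}                 # lives exactly as long as the Pod
  containers:
    - name: app
      image: busybox:1.36          # stand-in for the application image
      command:
        - sh
        - -c
        - |
          while true; do
            echo "level=info msg=heartbeat" >> /var/log/app/app.log
            sleep 5
          done
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
    - name: log-shipper
      image: busybox:1.36          # stand-in for Fluentd or Promtail
      # Follows the shared file; a real shipper would forward lines to an aggregator.
      command: ["sh", "-c", "tail -n +1 -F /var/log/app/app.log"]
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
          readOnly: true
```

The only contract between the two containers is the mounted directory, which is why each image can be owned by a different team.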
Running a prerequisite check in an initContainer makes the failure explicit: kubectl describe pod <name> will clearly show which init container failed and why. The main container never starts, so there is no ambiguity.
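As a minimal sketch of an init container that waits for a dependency, the Pod below blocks until a DNS name resolves. The Service name db, the default namespace, and the application image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init              # hypothetical name
spec:
  initContainers:
    - name: wait-for-db
      image: busybox:1.36
      # Loops until the (assumed) db Service resolves in cluster DNS.
      command:
        - sh
        - -c
        - |
          until nslookup db.default.svc.cluster.local; do
            echo "waiting for db"
            sleep 2
          done
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # placeholder image
```

While the loop is still running, the Pod stays in Pending and the app container is not started.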
Pod Lifecycle
A Pod moves through a defined set of phases during its lifetime. The phase is reported in pod.status.phase. The STATUS column of kubectl get pods usually shows it as well, although that column can instead display a container-level reason such as CrashLoopBackOff:
- Pending — The Pod has been accepted by the API server but has not yet been scheduled to a Node, or is scheduled but its images are still being pulled.
- Running — The Pod is bound to a Node, all containers have been created, and at least one container is still running (or is in the process of starting or restarting).
- Succeeded — All containers in the Pod have exited with status code 0 and will not be restarted. This is the terminal state for Jobs.
- Failed — All containers have exited, and at least one exited with a non-zero status or was killed by the system.
- Unknown — The state of the Pod cannot be determined, typically because communication with the Node's kubelet was lost. This is a signal of a Node failure or a network partition.
Independently of the Pod phase, each container has its own state: Waiting, Running, or Terminated. The reason field on a Waiting or Terminated state is the first place to look when debugging. It will tell you CrashLoopBackOff, OOMKilled, ImagePullBackOff, ContainerCreating, etc.
CrashLoopBackOff is not a Pod phase; it appears in the reason field on a container state of Waiting. It means the container has crashed repeatedly and kubelet is applying an exponential back-off delay (starting at 10s, capping at 5 minutes) before attempting to restart it again. Always run kubectl logs <pod> --previous to get the logs from the previous (crashed) container instance, not the currently-waiting one.
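For orientation, the status section of a Pod whose container is in CrashLoopBackOff has roughly the shape below. The field names follow the Pod API; the container name, restart count, and exit code are made-up illustrative values.

```yaml
# Illustrative excerpt of a Pod's status; values are invented for the example.
status:
  phase: Running                   # the Pod phase can stay Running while a container crash-loops
  containerStatuses:
    - name: app
      ready: false
      restartCount: 7
      state:
        waiting:
          reason: CrashLoopBackOff # kubelet is waiting out the back-off delay
      lastState:
        terminated:                # details of the previous, crashed instance
          exitCode: 1
          reason: Error
```

state describes the current instance, while lastState describes the one that crashed, which is the instance whose logs you want.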
Probes: Liveness, Readiness, and Startup
Kubernetes cannot read your application's mind — it needs explicit signals about health. Three probe types are available:
- livenessProbe — "Is this container alive?" If it fails failureThreshold times in a row, kubelet kills and restarts the container. Use it to detect deadlocks: a process that is still running but stuck and no longer responding to requests.
- readinessProbe — "Is this container ready to serve traffic?" If it fails, the Pod's IP is removed from the Endpoints object of every Service that selects it. Traffic stops flowing to that Pod, but the container is not killed. Use it during startup warmup or when an upstream dependency is temporarily down.
- startupProbe — For slow-starting containers (JVM apps, ML model loading). While the startup probe is running, liveness and readiness probes are disabled. This prevents premature restarts during initialization.
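The three probes can be combined on one container, as in the sketch below. The /healthz and /ready paths, the port, the image, and all timing values are assumptions; your application must actually expose whatever endpoints you configure.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probed-app                 # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # placeholder image
      ports:
        - name: http
          containerPort: 8080
      startupProbe:
        httpGet:
          path: /healthz           # assumed health endpoint
          port: http
        periodSeconds: 10
        failureThreshold: 30       # up to 30 x 10s = 300s allowed for startup
      livenessProbe:
        httpGet:
          path: /healthz
          port: http
        periodSeconds: 10
        failureThreshold: 3        # restart after 3 consecutive failures
      readinessProbe:
        httpGet:
          path: /ready             # assumed readiness endpoint
          port: http
        periodSeconds: 5
        failureThreshold: 2        # stop routing traffic after 2 consecutive failures
```

Giving the startup probe a generous failureThreshold lets the liveness probe stay strict without killing a container that is merely slow to boot.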
Inspecting Pods in Practice
The commands every DevOps engineer runs dozens of times per day: