Kubernetes Networking & Storage

Volumes & PersistentVolumes

18 min Lesson 7 of 31

Every container in Kubernetes starts with an empty, ephemeral filesystem layered on top of its image. When the container exits — by crash, OOM kill, or rolling update — that filesystem is gone. For stateless services this is a feature, not a bug. But the moment you run a database, a message broker, a machine-learning checkpoint store, or any workload whose value lives in data written to disk, you need storage that outlives the container and often the Pod itself. Kubernetes models storage at three distinct abstraction levels — the raw Volume, the cluster-level PersistentVolume (PV), and the user-level claim against it, the PersistentVolumeClaim (PVC). Understanding where each fits, and why the binding model is designed the way it is, is prerequisite knowledge for running production stateful workloads at scale.

Ephemeral Volumes: Lifetime Tied to the Pod

A Volume in Kubernetes is not a PersistentVolume. It is a directory made available to the containers of a Pod, with a lifetime scoped to that Pod: when the Pod is deleted, the volume is torn down. The three ephemeral volume types you will use most often are:

emptyDir — an empty directory created when the Pod starts, backed by the node's disk (or memory with medium: Memory). Ideal for scratch space, sidecar-shared caches, and multi-container communication within a Pod. Used heavily for read-through caches in Envoy-based service meshes.
configMap / secret — mounts API objects as files. A secret volume mounted at /etc/tls gives containers access to TLS certificates without baking them into the image. These are also ephemeral in that they follow the Pod.
projected — combines multiple sources (configMap, secret, serviceAccountToken, downwardAPI) into a single mount point. The standard way to expose a short-lived, auto-rotated service account token to a container in modern clusters (replaces the old mounted secret approach).
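The projected volume described above can be sketched as a single mount that combines a short-lived service account token, a ConfigMap, and a downwardAPI field. This is a minimal illustration, not a recommended layout: the Pod name, the ConfigMap app-config, its key app.properties, and the audience value are all hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: projected-demo                  # hypothetical name
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "ls -l /var/run/identity && sleep 3600"]
      volumeMounts:
        - name: identity
          mountPath: /var/run/identity
          readOnly: true
  volumes:
    - name: identity
      projected:
        sources:
          - serviceAccountToken:
              path: token               # file name inside the mount
              expirationSeconds: 3600   # kubelet refreshes the token before it expires
              audience: example-audience  # hypothetical audience
          - configMap:
              name: app-config          # hypothetical ConfigMap
              items:
                - key: app.properties
                  path: app.properties
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    fieldPath: metadata.namespace
```

All three sources appear as files under /var/run/identity, so the application reads one directory instead of three separate mounts.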

emptyDir backed by memory (medium: Memory) is a tmpfs, and the data written to it counts against the memory limit of the containers that use it. If your init container writes 500 MiB of decompressed data to a memory-backed emptyDir and your container limit is 512 MiB, the Pod is likely to be OOMKilled before its main workload has done any useful work. Always set an explicit sizeLimit on emptyDir volumes in production manifests.
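As a minimal sketch of the advice above, the manifest below shares a memory-backed emptyDir between an init container and the main container, with an explicit sizeLimit that sits well below the containers' memory limits. The Pod name, image, and sizes are illustrative assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo                    # hypothetical name
spec:
  initContainers:
    - name: unpack
      image: busybox:1.36
      # Writes 64 MiB into the tmpfs; this memory is charged to the Pod.
      command: ["sh", "-c", "dd if=/dev/zero of=/scratch/blob bs=1M count=64"]
      resources:
        limits:
          memory: 256Mi
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "ls -lh /scratch && sleep 3600"]
      resources:
        limits:
          memory: 256Mi                 # must leave headroom for the tmpfs contents
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir:
        medium: Memory
        sizeLimit: 128Mi                # caps the tmpfs below the memory limit
```

Keeping sizeLimit comfortably under the memory limit means a runaway writer hits the volume's cap rather than the container's OOM threshold.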

PersistentVolumes: Cluster-Level Storage Resources

A PersistentVolume (PV) is a cluster-scoped resource that represents a piece of storage — an AWS EBS volume, a GCP Persistent Disk, an NFS share, a Ceph RBD image — that has been provisioned and registered with Kubernetes. Think of a PV the way you think of a Node: it is a resource in the cluster inventory, independent of any particular workload. A PV encodes four critical properties:

Capacity — the storage size (storage: 50Gi).
Access modes — how many nodes can mount the volume, and whether read-only or read-write (see below).
Reclaim policy — what happens to the underlying storage when the PVC is deleted (Retain, Delete, or the deprecated Recycle).
VolumeMode — Filesystem (default, mounted as a directory) or Block (raw block device, exposed to the container without a filesystem; used by software that manages its own on-disk layout, such as Ceph OSDs or databases configured for raw devices).
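To make the VolumeMode distinction concrete, here is a hedged sketch of a claim for a raw block device and a Pod that consumes it. Note that a Block volume is attached through volumeDevices and devicePath rather than volumeMounts and mountPath. The names, the device path, and the storage class block-capable are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: raw-block-claim                 # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block                     # request a raw device, no filesystem
  resources:
    requests:
      storage: 50Gi
  storageClassName: block-capable       # hypothetical class that supports Block mode
---
apiVersion: v1
kind: Pod
metadata:
  name: raw-block-consumer              # hypothetical name
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "ls -l /dev/xvda && sleep 3600"]
      volumeDevices:                    # not volumeMounts
        - name: data
          devicePath: /dev/xvda         # where the device node appears in the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: raw-block-claim
```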

Access Modes — the Most Misunderstood Field

Access modes define the contract between the storage backend and the cluster scheduler. There are four modes defined by the API:

ReadWriteOnce (RWO) — the volume can be mounted read-write by a single node at a time. This is the mode supported by all block-storage backends (EBS, GCP PD, Azure Disk). RWO does not mean single Pod — multiple Pods on the same node can mount it.
ReadOnlyMany (ROX) — the volume can be mounted read-only by many nodes simultaneously. Useful for distributing read-only reference data (ML model weights, static asset bundles) across a fleet.
ReadWriteMany (RWX) — the volume can be mounted read-write by many nodes simultaneously. In practice this requires a shared filesystem: NFS, AWS EFS, Azure Files, GCP Filestore, CephFS. Block-storage backends generally do not support RWX for filesystem volumes.
ReadWriteOncePod (RWOP) — introduced in Kubernetes 1.22, this is a stricter variant of RWO that enforces single-Pod semantics at the API level, not just single-node. RWOP is the correct choice for a primary database volume where two Pods racing to mount the same volume would cause split-brain or data corruption.

RWO does not protect against split-brain. If a node becomes network-partitioned but is not deleted, the kubelet on that node may keep the Pod running with the RWO volume mounted. If the Pod is then force-deleted and rescheduled onto a healthy node, both the old (stuck) Pod and the new Pod may momentarily hold the volume. For databases like PostgreSQL or etcd, this is a data-corruption scenario. For database volumes, use ReadWriteOncePod instead of RWO where your CSI driver supports it (stable since Kubernetes 1.29), and configure PodDisruptionBudgets carefully.
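A claim that asks for single-Pod semantics differs from an RWO claim only in its access mode. The sketch below assumes a CSI-backed storage class, here given the hypothetical name csi-block; the claim name is also illustrative.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-primary-data           # hypothetical name
  namespace: production
spec:
  accessModes:
    - ReadWriteOncePod                  # at most one Pod cluster-wide may use this claim
  resources:
    requests:
      storage: 100Gi
  storageClassName: csi-block           # hypothetical CSI-backed class
```

With this access mode, a second Pod that references the same claim should stay unscheduled until the first Pod is gone, instead of landing on the same node and mounting the volume alongside it.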

The PV/PVC binding lifecycle: a PV (cluster resource) and PVC (namespace request) are matched by the PV Controller; the Pod then mounts the bound PVC.

PersistentVolumeClaims: Portable Storage Requests

A PersistentVolumeClaim (PVC) is a namespace-scoped request for storage. A developer writes a PVC specifying the minimum size, access mode, and (optionally) a storageClassName. The PV controller in the control plane scans available PVs and binds one that satisfies all of these criteria, preferring the smallest PV that is large enough. The binding is exclusive and one-to-one: once a PV is bound to a PVC, no other PVC can bind it. This is a critical design property: a 100Gi PV will be consumed entirely by a 10Gi PVC if that is the only available match, wasting 90Gi. StorageClasses and dynamic provisioning (next lesson) solve this by creating right-sized PVs on demand.

# Static PV manifest (admin pre-provisions the EBS volume)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-pv-01
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain    # keep EBS volume on PVC delete
  storageClassName: gp3-retain             # must match PVC storageClassName
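  csi:                                     # in-tree awsElasticBlockStore is deprecated; assumes the AWS EBS CSI driver is installed
    driver: ebs.csi.aws.com
    volumeHandle: vol-0abc123def456789a    # pre-existing EBS volume ID
    fsType: ext4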
---
# PVC manifest (developer / application team)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: gp3-retain
---
# Pod consuming the PVC
apiVersion: v1
kind: Pod
metadata:
  name: postgres
  namespace: production
spec:
  containers:
    - name: postgres
      image: postgres:16
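      env:
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
        - name: POSTGRES_PASSWORD          # the image refuses to initialize without a password
          valueFrom:
            secretKeyRef:
              name: postgres-credentials   # hypothetical Secret, created separately
              key: password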
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: postgres-data     # references the PVC by name
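The one-to-one binding rule is easiest to see with a deliberately mismatched pair. In the sketch below, which is separate from the PostgreSQL example, a 10Gi claim is the only candidate for a 100Gi PV. All names and the manual-demo class are hypothetical, and hostPath is used only because it works on a single-node test cluster.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: big-pv                          # hypothetical name
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual-demo         # hypothetical class name, used only for matching
  hostPath:
    path: /mnt/demo-data                # test clusters only; not for production
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: small-claim                     # hypothetical name
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi                     # a minimum, not an exact size
  storageClassName: manual-demo
```

Once bound, kubectl get pvc small-claim reports the PV's full capacity rather than the requested 10Gi, and big-pv is unavailable to any other claim.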

Reclaim Policies and What Actually Happens to Your Data

The persistentVolumeReclaimPolicy field determines what the cluster does with the underlying storage resource when the PVC is deleted:

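Retain — the PV is not deleted and is not made available for rebinding. It enters the Released state. An administrator must manually inspect the data, optionally take a snapshot, and then either delete the PV object and, separately, the underlying storage asset, or clear the PV's claimRef to make it Available again. Deleting the PV object alone does not free the backing storage. This is the correct policy for production databases.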
Delete — the PV object and the underlying storage asset (EBS volume, GCP PD, etc.) are deleted automatically when the PVC is deleted. This is the default for dynamically provisioned PVs. Safe for stateless scratch storage; dangerous for databases.
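Recycle — deprecated. It performed a basic scrub (rm -rf on the volume's contents) and was only ever supported by the NFS and hostPath plugins. Do not use it; use dynamic provisioning instead.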

# Inspect PV status and reclaim policy
kubectl get pv -o wide
# NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM
# postgres-pv-01    100Gi      RWO            Retain           Bound    production/postgres-data

# After deleting the PVC, check the PV status:
kubectl delete pvc postgres-data -n production
kubectl get pv postgres-pv-01
# STATUS will be "Released" — data is safe, PV is unusable until manually cleaned

# To manually reclaim a Released PV (after confirming data is safe/snapshotted):
kubectl patch pv postgres-pv-01 -p '{"spec":{"claimRef": null}}'
# PV returns to Available; it can now bind a new PVC
# (Only do this after verifying the old data is gone or irrelevant)

# Check PVC binding status and which PV it is bound to:
kubectl get pvc -n production
# NAME            STATUS   VOLUME           CAPACITY   ACCESS MODES
# postgres-data   Bound    postgres-pv-01   100Gi      RWO

Volume expansion without downtime: If a PVC is running out of space, increase its spec.resources.requests.storage (you can grow a PVC but never shrink it). With most block-storage CSI drivers on Kubernetes 1.24+, the volume and its filesystem are expanded online, without restarting the Pod. Expansion only works if the PVC's StorageClass sets allowVolumeExpansion: true (covered next lesson), so enable it up front. Monitor actual disk usage with kubectl exec + df -h or expose it via the kubelet_volume_stats_* metrics in Prometheus, and alert at 80% capacity to give yourself time to expand before hitting the limit.
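The 80% alert can be expressed as a Prometheus alerting rule over the kubelet volume metrics. This is a sketch that assumes Prometheus already scrapes the kubelet; the group name, alert name, severity label, and the 10-minute hold period are illustrative choices.

```yaml
groups:
  - name: pvc-capacity                  # hypothetical group name
    rules:
      - alert: PersistentVolumeFillingUp
        # Ratio of used to total bytes per PVC, as reported by the kubelet.
        expr: |
          kubelet_volume_stats_used_bytes
            / kubelet_volume_stats_capacity_bytes > 0.80
        for: 10m                        # ignore short spikes
        labels:
          severity: warning
        annotations:
          summary: "PVC {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is over 80% full"
```

Because the kubelet only reports these metrics for volumes mounted by a running Pod, an unmounted PVC will not trigger the alert.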

Volume Subpaths: Sharing One Volume Across Multiple Uses

A common production pattern for small workloads is to mount a single PVC into multiple paths within a container using subPath. For example, a single NFS PV might provide separate directories for application logs, uploads, and configuration backups for a small team. Use subPath with caution: changes to a mounted ConfigMap or Secret do not propagate to containers using subPath mounts (this is a known Kubernetes limitation — the inode is pinned at mount time). For ConfigMap/Secret rotation, prefer a full mount and read the file path.
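The subPath pattern above might look like the following: one claim, mounted three times, each mount exposing a different subdirectory of the same volume. The Pod name, mount paths, and the claim team-shared-nfs are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: subpath-demo                    # hypothetical name
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: shared
          mountPath: /var/log/app
          subPath: logs                 # <volume root>/logs
        - name: shared
          mountPath: /srv/uploads
          subPath: uploads              # <volume root>/uploads
        - name: shared
          mountPath: /backups/config
          subPath: config-backups       # <volume root>/config-backups
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: team-shared-nfs      # hypothetical existing PVC
```

Each subPath is resolved relative to the root of the volume, so the three directories share the volume's capacity and access mode.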

Static provisioning vs dynamic provisioning: everything in this lesson is static provisioning — an administrator manually creates PV objects. Static provisioning is still used when you need precise control over which physical storage backs a critical workload (specific EBS volume in a known AZ, a specific Ceph RBD image already populated with data). Dynamic provisioning via StorageClasses — which eliminates the admin-created PV entirely — is the standard for most workloads and is covered in the next lesson.

Production Failure Modes to Internalize

Storage failures are among the most operationally damaging in Kubernetes because they often surface silently. The most common patterns at scale:

PVC stuck in Pending — no PV satisfies the claim. Check that capacity, access mode, and storageClassName all match an available PV. kubectl describe pvc <name> will show the binding failure reason.
Node-level volume attach timeout — especially with EBS: when a Pod is rescheduled after a node failure, the EBS detach/re-attach cycle can take 60–90 seconds, and longer if the failed node is unreachable and the volume has to be force-detached. During this window the new Pod is stuck in ContainerCreating. Mitigate planned terminations with a node termination handler (such as AWS Node Termination Handler) that cordons and drains the node before the instance goes away, so Pods stop and volumes detach cleanly.
Full volume kills the process — a write to a full ext4 filesystem returns ENOSPC, and most databases (PostgreSQL, MySQL) crash rather than degrade gracefully. Monitor at 80%; expand before hitting the limit. Consider setting the filesystem's reserved blocks to 0 (tune2fs -m 0) on database volumes: the 5% that ext4 reserves for root by default has little value on a volume owned by a single non-root database process, whose access is typically granted through fsGroup.