StorageClasses & Dynamic Provisioning
In the previous lesson you learned that PersistentVolumes and PersistentVolumeClaims decouple pod definitions from the physical storage underneath. But statically pre-provisioning a PV for every application team that asks for storage is operationally unsustainable at scale: it requires a human in the loop for every new database or cache deployment. Dynamic provisioning solves this. A developer submits a PersistentVolumeClaim describing the size and characteristics they need, and a provisioner (a controller running in the cluster) creates the backing storage asset automatically, registers it as a freshly minted PV, and binds that PV to the claim so the volume is ready to mount. The API object that governs this behavior is the StorageClass.
Anatomy of a StorageClass
A StorageClass is a cluster-scoped resource (no namespace) that names three things: the provisioner (which driver creates the volume), the parameters (driver-specific configuration like disk type, IOPS tier, or encryption key), and the reclaimPolicy (what happens to the underlying storage asset when the PVC that owns it is deleted).
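As a rough illustration of those three fields, a general-purpose StorageClass might look like the sketch below. The class name and parameter values are assumptions chosen for the example; the parameter keys are ones the AWS EBS CSI driver accepts, and other drivers use different keys.

```yaml
# Minimal sketch of a StorageClass; names and values are illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3                  # hypothetical name; cluster-scoped, so no namespace
  annotations:
    # Marks this class as the cluster default (an assumption for this example).
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com     # which driver creates the volume
parameters:                      # driver-specific configuration
  type: gp3
  encrypted: "true"
reclaimPolicy: Delete            # fate of the backing asset once the claim is deleted
```

Other fields, such as volumeBindingMode and allowVolumeExpansion, are left out here and are covered in the sections that follow.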
A PVC that omits storageClassName receives the cluster default. On EKS the out-of-the-box default has traditionally been an older gp2 StorageClass, which predates gp3 and costs more per GB for worse baseline performance. One of the first infrastructure changes at every EKS shop should be setting a gp3 StorageClass as the default and annotating the gp2 one as non-default. Failing to do this leaves every team that omits storageClassName quietly burning money on gp2 volumes.
Provisioners: In-Tree vs. CSI
Historically Kubernetes shipped volume drivers baked into the controller-manager binary (called "in-tree" provisioners, e.g. kubernetes.io/aws-ebs). These are now deprecated and removed from recent Kubernetes versions. The modern replacement is the Container Storage Interface (CSI) — a vendor-neutral gRPC spec that lets storage vendors ship their drivers as ordinary Kubernetes workloads (Deployments and DaemonSets), fully decoupled from the Kubernetes release cycle. Every production cluster today should use CSI drivers exclusively.
Common CSI provisioners you will encounter:
- ebs.csi.aws.com: AWS EBS (block)
- efs.csi.aws.com: AWS EFS (file, ReadWriteMany)
- disk.csi.azure.com: Azure Managed Disks
- pd.csi.storage.gke.io: GCP Persistent Disk
- rbd.csi.ceph.com: Ceph RBD (self-managed)
- driver.longhorn.io: Longhorn (self-managed, replicated block)
volumeBindingMode: The Multi-AZ Trap
This field is a frequent source of storage-related production incidents on cloud clusters. It accepts two values:
- Immediate: the provisioner creates and binds the PV as soon as the PVC is created. This happens before any pod is scheduled, so the volume is created in an arbitrary AZ. When the scheduler then tries to place the pod, it may pick a node in a different AZ where the EBS volume is not accessible, and the pod stays Pending indefinitely with a volume node affinity conflict error.
- WaitForFirstConsumer: PV creation is deferred until a pod that references the PVC is being scheduled. The scheduler picks the node first, then the provisioner creates the volume in the same AZ as that node. This is the only correct mode for zone-aware block storage on multi-AZ clusters.
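A minimal sketch of the deferred-binding setup follows, assuming the AWS EBS CSI driver. The class name, claim name, namespace, and size are hypothetical.

```yaml
# Sketch: a zone-aware class plus a claim that uses it; all names are illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3-wffc             # hypothetical name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer   # provision only after a pod is scheduled
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cache-data               # hypothetical claim
  namespace: demo                # hypothetical namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-gp3-wffc
  resources:
    requests:
      storage: 20Gi
```

With this mode the claim is expected to sit in Pending until a pod that mounts it is scheduled, so a Pending PVC with no consumer is not in itself a sign of failure.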
Reclaim Policies: What Happens When a PVC Is Deleted
The reclaimPolicy on a StorageClass controls the lifecycle of the backing storage asset after the PVC bound to it is deleted. PersistentVolumes recognize three reclaim policies, but the third, Recycle, is deprecated, so only two matter in practice:
- Delete (default for most cloud StorageClasses): the CSI driver deletes the underlying asset (e.g. the EBS volume) as soon as the PV is released. This is efficient but dangerous: deleting a PVC in the wrong namespace will irrecoverably destroy production data. Always pair Delete-policy StorageClasses with PVC deletion protection (e.g. Velero backup policies or a validating webhook that prevents PVC deletion in critical namespaces).
- Retain: the PV transitions to the Released phase and the underlying storage asset is preserved. An administrator must manually delete the PV object and optionally clean up the backing resource. This is the correct policy for any tier that holds data you cannot afford to lose: databases, blob stores, audit logs. The preserved data can be re-attached by creating a new PV that points at the same backing resource and a new PVC with a matching volumeName.
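The re-attachment step for a retained volume could be sketched roughly as follows, assuming an EBS-backed volume. The volume ID, zone, object names, class name, and size are all placeholders and must be replaced with the values of the preserved asset.

```yaml
# Sketch: statically re-attach a retained EBS volume; every identifier is a placeholder.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: restored-db-pv                     # hypothetical name
spec:
  capacity:
    storage: 100Gi                         # should match the real volume size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ebs-gp3                # hypothetical class; must match the claim below
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0123456789abcdef0    # placeholder EBS volume ID
    fsType: ext4
  nodeAffinity:                            # keep pods in the volume's zone
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - us-east-1a               # placeholder zone
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-db-data                   # hypothetical claim
  namespace: prod                          # hypothetical namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-gp3
  volumeName: restored-db-pv               # bind to the PV above instead of provisioning
  resources:
    requests:
      storage: 100Gi
```

Setting volumeName pins the claim to the named PV, so no new volume is provisioned for it.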
To switch an existing PV to Retain, run kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'. Do this before deleting the PVC.
Volume Expansion: Growing a PVC Without Downtime
When a volume fills up, the correct response is not to provision a new one and migrate data — it is to expand the existing PVC in place. For this to work, three things must be true: the StorageClass must have allowVolumeExpansion: true, the CSI driver must implement the ControllerExpandVolume and NodeExpandVolume RPC calls, and (for filesystem volumes) the node-side expansion must complete while the volume is mounted.
Modern CSI drivers for all major cloud providers support online expansion. With EBS, for example, the volume is resized via the AWS API and the filesystem (ext4 or XFS) is expanded without unmounting. For older or self-managed drivers that only support offline expansion, you must delete the pod first, allow the volume to detach, then patch the PVC and bring the pod back up.
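In practice, an expansion is requested by raising the storage request on the existing claim. The sketch below assumes a hypothetical claim that was originally created at 100Gi on a class with allowVolumeExpansion: true; only the size changes.

```yaml
# Sketch: the same claim re-applied with a larger request; names and sizes are illustrative.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orders-db-data           # hypothetical existing claim
  namespace: prod                # hypothetical namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-gp3      # hypothetical class with allowVolumeExpansion: true
  resources:
    requests:
      storage: 200Gi             # raised from an assumed 100Gi; shrinking is not supported
```

The claim's status.capacity should reflect the new size once the node-side filesystem expansion has finished.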
Alert on kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes > 0.80 in Prometheus so you expand before the disk fills, not after the application crashes.
Putting It Together: StorageClass Design at Production Scale
A mature cluster typically has three to five StorageClasses that map to distinct cost and performance tiers. A representative design for an AWS EKS cluster might include a default ebs-gp3 StorageClass for general workloads, an ebs-io2 class for databases that require predictable IOPS, an efs-shared class for read-write-many workloads like ML training data or shared config mounts, and a local-nvme class (backed by a provisioner such as local-path or TopoLVM) for latency-critical scratch workloads that tolerate data loss on node failure.
Every StorageClass that handles production data should have reclaimPolicy: Retain, allowVolumeExpansion: true, and volumeBindingMode: WaitForFirstConsumer. These three settings together prevent AZ mismatches, accidental data loss, and the operational pain of re-provisioning volumes for capacity increases.
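A database tier that combines these three settings might be sketched as follows, assuming the AWS EBS CSI driver. The class name follows the ebs-io2 tier described above, and the IOPS figure is an arbitrary illustration rather than a recommendation.

```yaml
# Sketch of a production database tier; parameter values are illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-io2
provisioner: ebs.csi.aws.com
parameters:
  type: io2
  iops: "4000"                   # assumed figure; size it to the workload
  encrypted: "true"
reclaimPolicy: Retain                     # keep the asset if the claim is deleted
allowVolumeExpansion: true                # allow in-place growth
volumeBindingMode: WaitForFirstConsumer   # create the volume in the pod's AZ
```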
Define a VolumeSnapshotClass (analogous to a StorageClass) and take crash-consistent snapshots of PVCs with kubectl apply -f volumesnapshot.yaml. Snapshots are the foundation of backup-and-restore pipelines for stateful workloads and are a prerequisite for safely testing volume migrations or schema changes in production clusters.
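The volumesnapshot.yaml file mentioned above might look roughly like the sketch below. It assumes the snapshot CRDs and snapshot controller are installed in the cluster and that the CSI driver (EBS in this example) supports snapshots; the object names, namespace, and source claim are hypothetical.

```yaml
# Sketch: a snapshot class and one snapshot of an existing claim; names are illustrative.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ebs-snapshots            # hypothetical name
driver: ebs.csi.aws.com
deletionPolicy: Retain           # keep the backing snapshot if this object is deleted
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: orders-db-snap           # hypothetical name
  namespace: prod                # must be the namespace of the source claim
spec:
  volumeSnapshotClassName: ebs-snapshots
  source:
    persistentVolumeClaimName: orders-db-data   # hypothetical existing claim
```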