Table of contents
- What you will have at the end
- Prerequisites
- StorageClass anatomy
- AWS EKS: install the EBS CSI driver and create a gp3 StorageClass
- GCP GKE: use the Persistent Disk CSI driver
- Azure AKS: use the built-in disk CSI driver
- Why WaitForFirstConsumer matters even on single-zone clusters
- ReadWriteMany and NFS: not automatic
- Verify the result
- Common troubleshooting
What you will have at the end
A production-ready StorageClass configured for your cloud provider, with the correct CSI driver installed, WaitForFirstConsumer binding enabled, volume expansion permitted, and a PVC that dynamically provisions a cloud disk when a pod requests it.
Prerequisites
- A running Kubernetes cluster on one of the three major clouds: Amazon EKS, Google GKE, or Azure AKS
- kubectl configured and authenticated against the cluster
- For EKS: aws CLI and eksctl installed; IAM permissions to create roles and install add-ons
- For GKE: gcloud CLI authenticated
- For AKS: az CLI authenticated
- Familiarity with how PVs, PVCs, and StorageClasses relate to each other. A StorageClass is the provisioning template; a PVC is the request; a PV is the actual volume that gets created. Dynamic provisioning connects them automatically.
StorageClass anatomy
A StorageClass has six fields that matter for day-to-day operations:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example
  annotations:
    storageclass.kubernetes.io/is-default-class: "true" # mark as cluster default
provisioner: <csi-driver-name>            # which driver creates the volume
reclaimPolicy: Delete                     # Delete or Retain
allowVolumeExpansion: true                # grow PVCs after creation (shrink is never supported)
volumeBindingMode: WaitForFirstConsumer   # delay provisioning until a pod is scheduled
parameters:                               # driver-specific: disk type, IOPS, encryption
  type: gp3
The provisioner field determines everything. Each cloud has its own CSI driver with a specific driver name. Use the wrong name and the PVC stays Pending forever.
| Cloud | CSI driver name | What it provisions |
|---|---|---|
| AWS EKS | ebs.csi.aws.com | EBS volumes (gp2, gp3, io1, io2) |
| AWS EKS Auto Mode | ebs.csi.eks.amazonaws.com | EBS volumes (managed by EKS Auto Mode) |
| GCP GKE | pd.csi.storage.gke.io | Persistent Disks (pd-balanced, pd-ssd, Hyperdisk) |
| Azure AKS | disk.csi.azure.com | Azure Managed Disks (Standard SSD, Premium SSD, Ultra) |
The reclaimPolicy deserves a careful decision. Delete (the default) destroys the underlying cloud disk when the PVC is deleted. For stateful production workloads like databases, set it to Retain. You can patch an existing PV's reclaim policy after creation, but the StorageClass sets the default for newly provisioned volumes.
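For example, flipping an already-provisioned PV to Retain before deleting its PVC is a one-line patch (substitute the PV name bound to your PVC):
kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'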
AWS EKS: install the EBS CSI driver and create a gp3 StorageClass
EKS does not include the EBS CSI driver by default. Without it, PVCs referencing ebs.csi.aws.com fail with errors like failed to provision volume with StorageClass. On EKS 1.30 and later, no default StorageClass is annotated either, so you must configure both the driver and the StorageClass.
Step 1: create the IAM role
The EBS CSI controller needs AWS permissions to create, attach, and delete EBS volumes. Create a service account role with the AmazonEBSCSIDriverPolicy managed policy:
eksctl create iamserviceaccount \
--name ebs-csi-controller-sa \
--namespace kube-system \
--cluster production-cluster \
--role-name AmazonEKS_EBS_CSI_DriverRole \
--role-only \
--attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
--approve
Both EKS Pod Identities and IRSA (IAM Roles for Service Accounts) work. Pod Identities is the recommended auth method for new clusters.
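If you choose Pod Identities instead of the IRSA setup above, the association is one CLI call. A sketch, assuming the Pod Identity Agent add-on is installed and the role's trust policy allows pods.eks.amazonaws.com:
aws eks create-pod-identity-association \
  --cluster-name production-cluster \
  --namespace kube-system \
  --service-account ebs-csi-controller-sa \
  --role-arn arn:aws:iam::111122223333:role/AmazonEKS_EBS_CSI_DriverRole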
Step 2: install the EBS CSI add-on
aws eks create-addon \
--cluster-name production-cluster \
--addon-name aws-ebs-csi-driver \
--service-account-role-arn arn:aws:iam::111122223333:role/AmazonEKS_EBS_CSI_DriverRole
Wait for the add-on status to reach ACTIVE:
aws eks describe-addon --cluster-name production-cluster --addon-name aws-ebs-csi-driver \
--query 'addon.status' --output text
Expected output: ACTIVE
EBS volumes require EC2 nodes. Fargate pods cannot mount EBS volumes.
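You can also confirm the driver pods themselves are running. The label below is the one the driver's upstream manifests apply; adjust it if your installation differs:
kubectl get pods -n kube-system -l app.kubernetes.io/name=aws-ebs-csi-driver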
Step 3: create the StorageClass
gp3 is the correct choice for new deployments: 20% cheaper per GB than gp2, with a baseline of 3,000 IOPS and 125 MiB/s throughput at any volume size. gp2 ties IOPS to volume size (3 IOPS/GiB), so small volumes get poor performance.
# storageclass-ebs-gp3.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete # use Retain for production databases
allowVolumeExpansion: true
parameters:
  type: gp3             # gp3 baseline: 3,000 IOPS, 125 MiB/s
  iops: "3000"
  throughput: "125"     # MiB/s; gp3 supports up to 1,000
  encrypted: "true"     # EBS encryption at rest
  csi.storage.k8s.io/fstype: ext4
Apply it:
kubectl apply -f storageclass-ebs-gp3.yaml
EKS Auto Mode note: If your cluster runs EKS Auto Mode, the standard EBS CSI add-on is incompatible. EKS Auto Mode uses its own provisioner: ebs.csi.eks.amazonaws.com. Replace the provisioner field accordingly. No manual add-on installation is needed in Auto Mode.
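A minimal sketch of the same gp3 class for Auto Mode, assuming only the provisioner needs to change (the class name here is illustrative):
# storageclass-ebs-gp3-auto.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3-auto
provisioner: ebs.csi.eks.amazonaws.com # Auto Mode's built-in provisioner
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: gp3
  encrypted: "true"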
Step 4: create a PVC and a test pod
# test-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-test
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-gp3
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: ebs-test-pod
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "echo 'volume works' > /data/test.txt && sleep 3600"]
      volumeMounts:
        - mountPath: /data
          name: storage
  volumes:
    - name: storage
      persistentVolumeClaim:
        claimName: ebs-test
kubectl apply -f test-pvc.yaml
The PVC stays in Pending until the pod is scheduled (that is normal with WaitForFirstConsumer). Once the pod runs:
kubectl get pvc ebs-test
Expected output:
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ebs-test Bound pvc-a1b2c3d4-5678-90ab-cdef-111122223333 5Gi RWO ebs-gp3 45s
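To confirm data actually landed on the volume, read back the file the test pod wrote:
kubectl exec ebs-test-pod -- cat /data/test.txt
Expected output: volume works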
GCP GKE: use the Persistent Disk CSI driver
GKE Autopilot clusters ship with the PD CSI driver enabled and two default StorageClasses: standard-rwo (pd-balanced) and premium-rwo (pd-ssd), both using WaitForFirstConsumer. On GKE Standard clusters, verify the CSI driver is enabled; older clusters may still use the in-tree kubernetes.io/gce-pd provisioner.
The provisioner name is pd.csi.storage.gke.io.
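On GKE Standard you can check, and if necessary enable, the CSI driver add-on from the CLI. The cluster name and location below are placeholders:
gcloud container clusters describe my-cluster --location europe-west4 \
  --format="value(addonsConfig.gcePersistentDiskCsiDriverConfig.enabled)"
# Enable it if the output is empty or False:
gcloud container clusters update my-cluster --location europe-west4 \
  --update-addons=GcePersistentDiskCsiDriver=ENABLED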
Create a custom StorageClass
If the built-in classes do not fit (you need pd-ssd as default, or Hyperdisk, or regional PDs), create a custom StorageClass:
# storageclass-gke-ssd.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gke-ssd
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Delete
parameters:
  type: pd-ssd # options: pd-balanced, pd-standard, pd-ssd, pd-extreme
For regional persistent disks that replicate data across two zones:
parameters:
  type: pd-balanced
  replication-type: regional-pd
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.kubernetes.io/zone
        values:
          - europe-west4-a
          - europe-west4-b
GKE also supports Hyperdisk types (hyperdisk-balanced, hyperdisk-throughput, hyperdisk-extreme, hyperdisk-ml) through the same pd.csi.storage.gke.io provisioner. Availability is region-dependent.
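A sketch of a Hyperdisk class, using the provisioned-performance parameter names documented for hyperdisk-balanced; the values here are illustrative, so check the supported ranges in your region:
# storageclass-gke-hyperdisk.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hyperdisk-balanced
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Delete
parameters:
  type: hyperdisk-balanced
  provisioned-throughput-on-create: "250Mi" # illustrative value
  provisioned-iops-on-create: "7000"        # illustrative value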
Azure AKS: use the built-in disk CSI driver
AKS installs the Azure Disk CSI driver (disk.csi.azure.com) and several StorageClasses automatically. The default StorageClass (managed-csi) uses StandardSSD_LRS with WaitForFirstConsumer.
For multi-zone AKS clusters deployed on Kubernetes 1.29+, the built-in StorageClasses automatically use Zone-Redundant Storage (StandardSSD_ZRS, Premium_ZRS). ZRS provides cross-zone replication, which improves resilience but increases cost.
Create a custom StorageClass
To control the SKU, caching, or to pin LRS for cost optimization:
# storageclass-aks-premium.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium-lrs
provisioner: disk.csi.azure.com
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Delete
parameters:
  skuName: Premium_LRS  # options: Standard_LRS, Premium_LRS, StandardSSD_LRS,
                        #          PremiumV2_LRS, UltraSSD_LRS, *_ZRS variants
  cachingMode: ReadOnly # None, ReadOnly, ReadWrite
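If this class should become the cluster default, move the is-default-class annotation; only one StorageClass should carry it:
kubectl patch storageclass managed-csi \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl patch storageclass premium-lrs \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'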
Why WaitForFirstConsumer matters even on single-zone clusters
A common misconception is that WaitForFirstConsumer only matters for multi-zone clusters. It does not: there are three reasons to use it on every cluster.
Zone safety for future expansion. A single-zone cluster that grows into a second zone will break every Immediate-provisioned PVC. The volumes already exist in zone A; the scheduler might place pods on nodes in zone B. The result is volume node affinity conflict errors and pods stuck in Pending. With WaitForFirstConsumer, volumes are provisioned in the pod's zone from the start.
No orphaned volumes. Immediate provisions a disk the moment the PVC is created, even if no pod ever uses it. Orphaned cloud disks cost money. WaitForFirstConsumer only provisions when a pod actually needs the volume.
It is the default everywhere. All modern managed-cloud StorageClasses use it: EKS 1.30+, AKS managed-csi, GKE standard-rwo. The official aws-ebs-csi-driver example StorageClass uses WaitForFirstConsumer unconditionally.
One operational side effect: a PVC with WaitForFirstConsumer stays in Pending until a pod references it. This is expected behavior, not a provisioning failure. Do not set spec.nodeName directly on the pod when using WaitForFirstConsumer; this bypasses the scheduler and leaves the PVC permanently stuck. Use nodeSelector or node affinity instead.
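For example, instead of spec.nodeName, constrain placement with a selector so the scheduler still runs and triggers provisioning (the zone value is a placeholder):
spec:
  nodeSelector:
    topology.kubernetes.io/zone: us-east-1a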
ReadWriteMany and NFS: not automatic
Block storage (EBS, Azure Disk, GCP PD) supports only ReadWriteOnce: a single node at a time. If you need multiple pods on different nodes writing to the same volume (ReadWriteMany), you need a file-based storage backend.
Having an NFS server on your network is not enough. Kubernetes does not dynamically provision NFS volumes without an explicit CSI driver. The NFS CSI driver (nfs.csi.k8s.io) must be installed separately, typically via Helm.
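A sketch of that installation plus a matching StorageClass; the server and share values are placeholders for your own NFS export, and you should pin the chart version the project currently documents:
helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs --namespace kube-system
Then create a StorageClass pointing at your export:
# storageclass-nfs.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi
provisioner: nfs.csi.k8s.io
reclaimPolicy: Delete
volumeBindingMode: Immediate # NFS has no zone topology to wait for
parameters:
  server: nfs.example.internal # placeholder: your NFS server
  share: /exports/k8s          # placeholder: your export path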
Cloud-managed file services also require their own drivers:
| Service | CSI driver | Pre-installed? |
|---|---|---|
| AWS EFS | efs.csi.aws.com | No, requires the aws-efs-csi-driver add-on |
| Azure Files | file.csi.azure.com | Yes, pre-installed on AKS |
| GCP Filestore | filestore.csi.storage.gke.io | No, requires explicit enablement |
Azure Files is the only cloud file service with a pre-installed CSI driver. AKS ships azurefile-csi and azurefile-csi-premium StorageClasses out of the box.
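So on AKS, a ReadWriteMany claim needs nothing beyond a PVC referencing one of those classes (the claim name here is illustrative):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile-csi
  resources:
    requests:
      storage: 10Gi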
Verify the result
After applying your StorageClass, confirm it exists and check the default annotation:
kubectl get storageclass
Expected output (EKS example):
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
ebs-gp3 (default) ebs.csi.aws.com Delete WaitForFirstConsumer true 2m
Create a PVC and a pod (use the test manifests from the EKS section above, adjusting storageClassName). Confirm the PVC reaches Bound and the pod starts:
kubectl get pvc
kubectl get pod
If the PVC stays in Pending after the pod is scheduled, check events:
kubectl describe pvc <pvc-name>
The events section tells you exactly what went wrong: missing CSI driver, IAM permission error, or zone conflict.
Common troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| PVC stuck in Pending, no pod exists | WaitForFirstConsumer working as designed | Create a pod that references the PVC |
| PVC stuck in Pending with pod running | spec.nodeName set directly on the pod | Use nodeSelector or node affinity instead |
| UnauthorizedOperation on volume creation | CSI driver lacks IAM/RBAC permissions | Attach AmazonEBSCSIDriverPolicy (EKS) or verify workload identity (GKE/AKS) |
| volume node affinity conflict | Volume provisioned in wrong zone | Switch the StorageClass to WaitForFirstConsumer |
| Multi-Attach error | Block storage used with ReadWriteMany | Block storage is RWO only; use EFS, Azure Files, or NFS for RWX |
| EBS CSI error on EKS Auto Mode | Using the ebs.csi.aws.com provisioner | Switch to ebs.csi.eks.amazonaws.com |
| PVC created, volume not cleaned up on delete | reclaimPolicy: Retain in effect | Manual PV and cloud disk cleanup required |
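For that last case, the cleanup is deliberate and manual. Deleting the PV object does not delete the cloud disk, which must be removed with your cloud's CLI or console:
kubectl get pv              # Released PVs linger after their PVC is deleted
kubectl delete pv <pv-name> # removes only the Kubernetes object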