What you will have at the end
A PodDisruptionBudget (PDB) applied to your workload that prevents kubectl drain, Cluster Autoscaler scale-down, and cloud provider node upgrades from evicting too many pods at once. You will know which field to pick for your workload type and how to avoid the misconfigurations that deadlock drain operations.
Prerequisites
- kubectl connected to a Kubernetes 1.27+ cluster (the policy/v1 PDB API has been stable since 1.21; the unhealthyPodEvictionPolicy field became usable in production from 1.27)
- A Deployment or StatefulSet with at least 2 replicas (single-replica workloads and PDBs do not mix; see the common mistakes section)
- Familiarity with rolling updates and zero-downtime deployments
Voluntary vs. involuntary disruptions
PDBs only protect against voluntary disruptions. That distinction is worth understanding before writing any YAML.
Voluntary disruptions are actions where an operator or controller deliberately removes pods:
- kubectl drain for node maintenance
- Cluster Autoscaler or Karpenter consolidating underutilized nodes
- Cloud provider node pool upgrades (AKS, GKE, EKS)
- Manual kubectl delete pod (though this bypasses PDBs because it skips the Eviction API)
Involuntary disruptions are unplanned failures that Kubernetes cannot control: hardware failures, kernel panics, VM disappearances, spot instance interruptions, out-of-memory kills. PDBs have no authority over these. A node crash that takes three pods with it will not ask permission first.
The subtle part: involuntary disruptions count against the budget. If a node failure has already reduced your healthy pods to (or below) desiredHealthy, a concurrent kubectl drain on a different node will be blocked until replacements are running and Ready.
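One way to see this state directly, using the web-api-pdb example defined later in this guide (the numbers below are hypothetical):
# After a node failure has taken out one of three web-api pods:
kubectl get pdb web-api-pdb -n production \
  -o jsonpath='{.status.currentHealthy} {.status.desiredHealthy} {.status.disruptionsAllowed}{"\n"}'
# 2 2 0   <- already at the floor, so a concurrent drain waits until a replacement pod is Ready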
Create a PDB: minAvailable vs. maxUnavailable
A PDB spec requires exactly one of two mutually exclusive fields:
| Field | Meaning | Rounding (percentage) |
|---|---|---|
| minAvailable | Minimum pods that must remain available after eviction | Rounds up (conservative: protects more) |
| maxUnavailable | Maximum pods that can be unavailable after eviction | Rounds up (permissive: allows more disruptions) |
Both accept an integer or a percentage string like "25%".
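A quick worked example of the rounding, assuming a Deployment with 10 replicas (a real PDB sets only one of the two fields):
# "25%" of 10 replicas = 2.5, which rounds up to 3
minAvailable: "25%"     # at least 3 pods must stay available (protects more)
maxUnavailable: "25%"   # up to 3 pods may be voluntarily disrupted at once (allows more)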
Step 1: choose your field
For stateless services (web APIs, workers, microservices), use maxUnavailable. The official Kubernetes docs recommend it because it scales naturally with replica count changes. Statsig documented switching from minAvailable to maxUnavailable after discovering that services with fewer than five pods were stuck at disruptionsAllowed: 0 permanently.
For quorum-based stateful systems (etcd, ZooKeeper, Consul), use minAvailable set to the quorum size. A 3-node etcd cluster needs minAvailable: 2 (quorum = floor(3/2) + 1 = 2). The required number of healthy members is fixed regardless of total replicas.
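The same formula scales to larger ensembles:
# quorum = floor(replicas / 2) + 1
# 3 replicas -> minAvailable: 2
# 5 replicas -> minAvailable: 3
# 7 replicas -> minAvailable: 4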
Step 2: write the PDB manifest
Stateless API with maxUnavailable:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: web-api-pdb
namespace: production
spec:
maxUnavailable: 1 # one pod at a time
unhealthyPodEvictionPolicy: AlwaysAllow # prevents CrashLoopBackOff deadlock (k8s 1.26+)
selector:
matchLabels:
app: web-api
Quorum-based StatefulSet with minAvailable:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: zookeeper-pdb
namespace: production
spec:
minAvailable: 2 # quorum for a 3-node ensemble
unhealthyPodEvictionPolicy: IfHealthyBudget # conservative for stateful workloads
selector:
matchLabels:
app: zookeeper
Step 3: apply and verify
kubectl apply -f pdb.yaml
kubectl get pdb -n production
# NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
# web-api-pdb   N/A             1                 1                     5s
The ALLOWED DISRUPTIONS column is the number the Eviction API checks before approving or rejecting an eviction request. If it reads 0, no voluntary disruption can proceed.
To inspect the full status:
kubectl get pdb web-api-pdb -n production -o jsonpath='{.status}' | jq .
# {
# "currentHealthy": 3,
# "desiredHealthy": 2,
# "disruptionsAllowed": 1,
# "expectedPods": 3
# }
Handle unhealthy pods with unhealthyPodEvictionPolicy
Before Kubernetes 1.26, a pod stuck in CrashLoopBackOff was still protected by its PDB. The pod was broken and not serving traffic, but evicting it would drop the count below desiredHealthy. The result: kubectl drain hangs indefinitely waiting for a pod that will never become healthy.
Two policies exist:
- IfHealthyBudget (default): unhealthy pods can be evicted only if the guarded application is not currently disrupted (currentHealthy >= desiredHealthy). Conservative, but risks deadlock when all pods are unhealthy.
- AlwaysAllow: unhealthy pods (those without a Ready condition) can always be evicted, regardless of the budget. The Kubernetes docs now recommend this for most workloads.
Set AlwaysAllow unless you run a stateful system where even a partially broken pod contributes to data availability.
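If the PDB already exists, you can switch the policy in place instead of re-applying the manifest; a minimal sketch against the web-api-pdb example above:
kubectl patch pdb web-api-pdb -n production --type merge \
  -p '{"spec":{"unhealthyPodEvictionPolicy":"AlwaysAllow"}}'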
How PDBs interact with kubectl drain
When you run kubectl drain, the following happens step by step:
- The node is cordoned (marked Unschedulable).
- For each pod, kubectl sends an eviction request to the Kubernetes Eviction API.
- The Eviction API checks the PDB. If evicting the pod would violate the budget, it returns HTTP 429 (Too Many Requests).
- kubectl drain retries rejected requests until success or timeout.
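You can exercise the eviction request and PDB check yourself by posting an Eviction object for a single pod; a sketch, with the pod name as a placeholder:
# Build a policy/v1 Eviction request body
cat <<'EOF' > eviction.json
{
  "apiVersion": "policy/v1",
  "kind": "Eviction",
  "metadata": { "name": "web-api-<pod-suffix>", "namespace": "production" }
}
EOF
# POST it to the pod's eviction subresource
kubectl create --raw /api/v1/namespaces/production/pods/web-api-<pod-suffix>/eviction -f eviction.json
# Success means the budget allowed the eviction; an HTTP 429 error means the PDB blocked it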
If ALLOWED DISRUPTIONS is 0, drain blocks indefinitely. Cloud providers enforce their own timeouts: AKS times out after 1 hour with UpgradeFailed / PodDrainFailure, GKE force-drains after 1 hour, and EKS fails the upgrade after 50 minutes.
Before initiating a cluster upgrade, check for blocked PDBs:
kubectl get pdb --all-namespaces -o wide | grep ' 0 '
# Any row showing ALLOWED DISRUPTIONS = 0 will block the upgrade
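A more precise variant that matches only the disruptionsAllowed field rather than any stray 0 in the output:
kubectl get pdb --all-namespaces \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed' \
  | awk 'NR==1 || $3 == 0'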
Escape hatches for stuck drains
# Bypass PDBs entirely (use with caution)
kubectl drain <node> --ignore-daemonsets --disable-eviction
# --disable-eviction (k8s 1.18+) forces direct deletion instead of eviction
# Drain with timeout so it does not block indefinitely
kubectl drain <node> --ignore-daemonsets --timeout=300s
--disable-eviction skips PDB checks completely. Use it only when you understand the availability impact and other remediation options are exhausted. For the full cordon-drain-uncordon workflow, the flags every drain command needs, and how managed Kubernetes services differ in their drain timeouts, see Kubernetes node drain and cordon: safe maintenance without downtime.
How PDBs interact with Cluster Autoscaler
The Cluster Autoscaler respects PDBs during scale-down. Before marking a node for termination, it checks whether evicting its pods would violate any PDB. If the answer is yes, the node is marked "not removable" and scale-down is skipped.
Common configurations that block scale-down:
- maxUnavailable: 0 on any PDB matching a pod on the underutilized node
- minAvailable equal to the current replica count, producing disruptionsAllowed: 0
- A single-replica Deployment with any restrictive PDB
If Cluster Autoscaler is not scaling down and you suspect PDB interference, check for PDBs at zero:
kubectl get pdb --all-namespaces -o wide
# Look for ALLOWED DISRUPTIONS = 0 on the workloads running on the stuck node
For Karpenter users: Karpenter's voluntary disruption methods (consolidation, drift, expiration) also respect PDBs. If any PDB on any pod on a node is blocking, Karpenter will not consolidate that node. Note that Karpenter NodePool Disruption Budgets are a separate, complementary system that rate-limits node-level disruptions, not pod-level availability.
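For reference, that node-level budget lives on the NodePool itself. An illustrative fragment only (disruption stanza shown; template, requirements, and nodeClassRef omitted), assuming the karpenter.sh/v1 API:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    budgets:
    - nodes: "10%"   # Karpenter may disrupt at most 10% of this pool's nodes at a time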
PDBs and rolling updates: separate layers
A common misconception: PDBs do not constrain Deployment rolling updates. The official docs state it clearly: "workload resources (such as Deployment and StatefulSet) are not limited by PodDisruptionBudgets when doing rolling updates."
The Deployment controller's .spec.strategy.rollingUpdate.maxUnavailable and maxSurge govern rollout behavior. PDBs govern voluntary evictions from external operations (drain, autoscaler). They are separate, complementary layers:
Deployment strategy → controls rolling update behavior (new version rollout)
PDB → controls voluntary eviction behavior (drain, autoscaler)
A good pairing for a production service:
# Deployment: zero-downtime rollout
spec:
replicas: 4
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1 # spin up 1 new pod before removing an old one
maxUnavailable: 0 # never reduce below desired count during rollout
---
# PDB: safe infrastructure maintenance
apiVersion: policy/v1
kind: PodDisruptionBudget
spec:
maxUnavailable: 1 # allow 1 pod to be drained at a time
unhealthyPodEvictionPolicy: AlwaysAllow
selector:
matchLabels:
app: myapp
If a kubectl drain and a rolling update happen simultaneously, the combined unavailability from both is checked against the PDB. The PDB does not prevent the rolling update itself, but it does prevent drain from making the situation worse.
Common mistakes
PDB on a single-replica Deployment
A PDB with minAvailable: 1 (or maxUnavailable: 0) on a single-replica Deployment produces disruptionsAllowed: 0 permanently. Drain blocks forever, Cluster Autoscaler cannot scale down, Karpenter cannot consolidate. Either run 2+ replicas or skip the PDB entirely for that workload.
maxUnavailable: 0 or minAvailable: 100%
Both configurations block all voluntary evictions. Cluster upgrades on AKS, GKE, and EKS will time out and fail. Use this only if you have coordinated out-of-band maintenance procedures and understand that the cluster cannot self-heal node pools.
Overlapping selectors across multiple PDBs
If two PDBs select the same pod, the Eviction API returns HTTP 500 instead of 429. Drain fails in an unexpected way. Each PDB should cover a unique set of pods.
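A quick way to eyeball selectors in a namespace and spot overlap (assumes matchLabels-style selectors):
kubectl get pdb -n production -o custom-columns='NAME:.metadata.name,SELECTOR:.spec.selector.matchLabels'
# Two rows with the same or intersecting labels are a red flag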
Empty selector in policy/v1
In policy/v1beta1 (removed in Kubernetes 1.25), an empty selector {} matched zero pods. In policy/v1, an empty selector matches every pod in the namespace. Migrating a PDB manifest without updating the selector can unintentionally lock the entire namespace.
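What the risky migration looks like, as an illustrative manifest only:
# apiVersion bumped from policy/v1beta1 to policy/v1 without revisiting the selector
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: legacy-pdb
  namespace: production
spec:
  minAvailable: 1
  selector: {}   # matched no pods in v1beta1; matches every pod in this namespace in v1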
Ignoring the HPA interaction
The Horizontal Pod Autoscaler does not consult PDBs when scaling down replicas. HPA can reduce the replica count below the PDB's minAvailable, which sets disruptionsAllowed to 0 and blocks any concurrent drain operation until HPA scales back up. Monitor ALLOWED DISRUPTIONS after HPA scale-down events.
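One lightweight way to keep an eye on this, watching the PDB status as replica counts change:
# Watch ALLOWED DISRUPTIONS in real time while the HPA adjusts replicas
kubectl get pdb web-api-pdb -n production -w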
Verify the result
After applying your PDB, confirm it works:
# Check current state
kubectl get pdb -n production -o wide
# ALLOWED DISRUPTIONS should be >= 1
# Simulate a drain on a non-critical node (cordon first, review, then drain)
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --dry-run=client
# The client-side dry run lists which pods would be evicted; PDB checks only happen when the real drain sends eviction requests
# If satisfied, proceed with the real drain
kubectl drain <node-name> --ignore-daemonsets
After the drain completes, verify that the evicted pods were rescheduled and the service remained available throughout.
When to escalate
If drain remains stuck after reviewing PDB configuration, collect the following before asking for help:
- kubectl get pdb --all-namespaces -o wide (full PDB status)
- kubectl describe pdb <name> -n <namespace> (events and conditions)
- kubectl get events --field-selector reason=EvictionBlocked (eviction-specific events)
- Kubernetes version (kubectl version)
- Cloud provider and managed service tier (AKS, GKE, EKS)
- Whether unhealthyPodEvictionPolicy is set, and to which value
- Number of replicas and their Ready status (kubectl get pods -l app=<label> -o wide)