Node NotReady: diagnosing Kubernetes node failures

A node in NotReady state has stopped sending heartbeats to the control plane. The kubelet is either down, unreachable, or actively reporting that a health condition has failed. Pods on the node face eviction after five minutes. This article covers how to read node conditions, diagnose the root cause (kubelet crash, container runtime failure, resource pressure, network partition, certificate expiry), and recover or replace the node safely.

What NotReady means

Every Kubernetes node runs a kubelet that sends heartbeats to the control plane through two mechanisms. A lightweight Lease object in the kube-node-lease namespace is renewed every 10 seconds. A heavier NodeStatus update writes the full condition set every 5 minutes or whenever conditions change.
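
You can watch the lightweight heartbeat directly by inspecting the node's Lease object (worker-3 here stands in for any node name):

# Inspect the node's Lease; renewTime should advance roughly every 10 seconds
kubectl get lease worker-3 -n kube-node-lease -o yaml

# Or print just the renew timestamp
kubectl get lease worker-3 -n kube-node-lease -o jsonpath='{.spec.renewTime}{"\n"}'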

The node lifecycle controller watches these heartbeats. If neither arrives within the node monitor grace period (default: 50 seconds), the controller acts:

  • Ready becomes Unknown: the control plane cannot reach the kubelet at all. Typical causes: kubelet crashed, node rebooted, network partition.
  • Ready becomes False: the kubelet is alive and reporting, but it is telling the control plane that something is wrong (resource pressure, container runtime down, CNI failure).

Both states trigger automatic taints. Unknown adds node.kubernetes.io/unreachable:NoExecute. False adds node.kubernetes.io/not-ready:NoExecute. Kubernetes automatically injects tolerations with tolerationSeconds=300 on every pod that does not set its own. That means pods on a NotReady node survive for 5 minutes before the control plane evicts them and reschedules elsewhere.
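
You can confirm both sides of this on a live cluster; worker-3 and the pod name below are placeholders:

# Taints added by the node lifecycle controller
kubectl get node worker-3 -o jsonpath='{range .spec.taints[*]}{.key}{"="}{.effect}{"\n"}{end}'

# Tolerations on a pod -- the auto-injected ones carry tolerationSeconds 300
kubectl get pod my-app-7d4b9 -o jsonpath='{range .spec.tolerations[*]}{.key}{" "}{.effect}{" "}{.tolerationSeconds}{"\n"}{end}'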

The distinction between Unknown and False matters for diagnosis. All conditions Unknown? The kubelet is dead or unreachable. A specific condition True (like DiskPressure)? The kubelet is running but the node has a concrete problem.

Reading node conditions

Every node carries five standard conditions. The kubelet sets all of them except NetworkUnavailable, which is owned by the network plugin (more on that below). kubectl describe node <name> shows them under the Conditions section.

Condition            Healthy value   What Status=True means
Ready                True            Node can accept pods
DiskPressure         False           Filesystem is low on space or inodes
MemoryPressure       False           Node is running out of available memory
PIDPressure          False           Too many processes on the node
NetworkUnavailable   False           CNI plugin has not configured the network

Start every diagnosis here:

# Get conditions for a specific node
kubectl describe node worker-3

# Extract just conditions in a clean format
kubectl get node worker-3 -o jsonpath='{range .status.conditions[*]}{.type}{": "}{.status}{" -- "}{.message}{"\n"}{end}'

# Check events for the node
kubectl get events --field-selector involvedObject.name=worker-3 --sort-by='.lastTimestamp'

Pattern recognition. All conditions Unknown with the message "Kubelet stopped posting node status" means the kubelet is completely down or the node is unreachable. A single condition set to True (e.g., MemoryPressure: True) while Ready is False means the kubelet is alive but that specific resource is exhausted.

Example output for a node under disk pressure:

Conditions:
  Type             Status  Reason                       Message
  ----             ------  ------                       -------
  DiskPressure     True    KubeletHasDiskPressure       kubelet has disk pressure
  MemoryPressure   False   KubeletHasSufficientMemory   kubelet has sufficient memory available
  PIDPressure      False   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   KubeletNotReady              Node has disk pressure

Diagnosing resource pressure

DiskPressure

The kubelet monitors filesystem usage against hard eviction thresholds. Defaults: nodefs.available < 10%, nodefs.inodesFree < 5%, imagefs.available < 15%. When a threshold is crossed, the kubelet first garbage-collects unused images and dead containers. If that is insufficient, it evicts pods.
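
The thresholds are tunable in the kubelet configuration. A minimal sketch of /var/lib/kubelet/config.yaml with the disk-related defaults made explicit (adjust values to your environment, then restart the kubelet):

# /var/lib/kubelet/config.yaml (KubeletConfiguration)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
  imagefs.available: "15%"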

SSH to the node (or use kubectl debug node):

# Open a debug shell on the node
kubectl debug node/worker-3 -it --image=ubuntu --profile=sysadmin
# Inside: chroot /host

# Check disk space
df -h

# Check inode usage (inode exhaustion is invisible to df -h)
df -i

# Find the largest consumers
du -sh /var/lib/containerd/* 2>/dev/null
du -sh /var/log/pods/*

# Prune unused container images manually
crictl rmi --prune

Common real-world causes: container logs accumulating in /var/log/pods/, emptyDir volumes growing unbounded, image layer accumulation without garbage collection, and inode exhaustion from many small files (even with available block space).

Verification: after cleanup, kubectl describe node worker-3 shows DiskPressure: False and Ready: True within one to two minutes.

MemoryPressure

Default hard eviction threshold: memory.available < 100Mi (some distributions ship higher defaults; check your kubelet configuration). The kubelet derives available memory from cgroupfs, not from free -m. It excludes inactive_file from "used".

# Check node memory
free -h

# Find top memory consumers
ps aux --sort=-%mem | head -20

# From the control plane: per-pod memory usage
kubectl top pod --all-namespaces --sort-by=memory | head -20

Under MemoryPressure, the kubelet evicts pods in a specific order: pods exceeding their memory requests first, then BestEffort, then Burstable, Guaranteed last. If memory drops faster than the kubelet's 10-second eviction check, the Linux OOM killer fires first and uses oom_score_adj per QoS class. For details on how memory limits and QoS interact, see OOMKilled.
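
To see where a given pod falls in that order, check its QoS class (the pod name is a placeholder):

# QoS class drives eviction order: BestEffort first, Guaranteed last
kubectl get pod my-app-7d4b9 -o jsonpath='{.status.qosClass}{"\n"}'

# List all pods with their QoS class
kubectl get pods -A -o custom-columns=NAME:.metadata.name,QOS:.status.qosClass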

PIDPressure

Less common, but it locks up a node hard. When PIDs are exhausted, no new process can fork. Even basic diagnostic commands stop working on the node.

# Check PID ceiling and current count
cat /proc/sys/kernel/pid_max
ps aux | wc -l

Fix: reduce pod count on the node, kill runaway processes, or increase kernel.pid_max via node configuration.
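
Raising the kernel ceiling is a one-liner on the node; the value below is only an example:

# Raise the PID ceiling immediately (example value)
sudo sysctl -w kernel.pid_max=4194304

# Persist across reboots
echo "kernel.pid_max = 4194304" | sudo tee /etc/sysctl.d/99-pid-max.conf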

Kubelet and container runtime health

The kubelet runs as a systemd service on most distributions. If it stops, the control plane starts the 50-second countdown to NotReady.

# Check kubelet status
systemctl status kubelet

# Review recent logs
journalctl -u kubelet -n 100 --no-pager

# Search for known failure patterns
journalctl -u kubelet -n 200 | grep -E "Error|PLEG|NetworkPlugin|certificate|x509|runtime|evict"

Common kubelet log messages and their meaning:

Log message                                             Likely cause
PLEG is not healthy                                     Container runtime slow or deadlocked
container runtime is not running                        containerd or CRI-O is down
failed to run Kubelet: unable to load client CA file    TLS certificate issue
node lease renewal failed                               Network connectivity to API server lost
x509: certificate has expired                           Kubelet client certificate expired

If the kubelet is simply stopped:

sudo systemctl restart kubelet
sudo systemctl enable kubelet  # ensure it starts on boot

Container runtime and PLEG

The kubelet communicates with the container runtime (containerd, CRI-O) via the Container Runtime Interface socket. If the runtime crashes, the kubelet's PLEG (Pod Lifecycle Event Generator) becomes unhealthy, and the node reports NotReady.

# Check containerd
systemctl status containerd
journalctl -u containerd -n 50

# Test the CRI socket
crictl info
crictl ps   # running containers
crictl pods # pod sandbox list

PLEG is considered unhealthy if its relist cycle exceeds 3 minutes. Causes: container runtime deadlock (most common), too many pods making relists slow, CPU saturation starving the relist loop, or disk I/O throttling on cloud VMs.

cgroup driver mismatch. If the kubelet uses systemd as cgroup driver but containerd is configured for cgroupfs, the kubelet fails. Both must match (systemd is recommended for systemd-based distributions):

# Kubelet cgroup driver
grep cgroupDriver /var/lib/kubelet/config.yaml

# containerd cgroup driver
grep -A2 "SystemdCgroup" /etc/containerd/config.toml

Evented PLEG (Kubernetes v1.27+). The EventedPLEG feature gate (beta) replaces polling with CRI event streaming. It reduces kubelet CPU overhead and eliminates PLEG timeouts caused by slow relists.
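
If you want to try it, the gate goes in the kubelet configuration; it also needs a container runtime that streams CRI events. A minimal sketch:

# /var/lib/kubelet/config.yaml
featureGates:
  EventedPLEG: true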

Network connectivity to the control plane

A node that cannot reach the API server on port 6443 stops renewing its Lease. After 50 seconds, the control plane declares it Unknown.

# From the worker node: test API server connectivity
nc -zv 10.0.0.10 6443
curl -k https://10.0.0.10:6443/healthz

# Check firewall rules
iptables -L INPUT -n -v
iptables -L OUTPUT -n -v

A network partition creates a specific risk: the isolated node keeps running its pods locally, while after 300 seconds the control plane marks them for eviction and reschedules replacements on healthy nodes. StatefulSets are handled more conservatively: the controller does not create a replacement until the original pod object is fully removed, so force-deleting that pod while the partitioned node is still running the workload can cause a split-brain where the same identity runs on two nodes simultaneously.

NetworkUnavailable condition

Unlike the pressure conditions (set by the kubelet), NetworkUnavailable is set by the CNI plugin. It means the node's network has not been configured.

# Check CNI pods on the affected node
kubectl get pods -n kube-system -o wide --field-selector spec.nodeName=worker-3

# Check for specific CNI plugins
kubectl get pods -n kube-system | grep -E "calico|flannel|cilium|weave"

# On the node: verify CNI configuration files exist
ls /etc/cni/net.d/

Common fixes: restart the CNI DaemonSet pod (kubectl delete pod -n kube-system <cni-pod-name>; the DaemonSet recreates it), verify Pod CIDR does not overlap existing network ranges, and on RHEL/CentOS check that NetworkManager is not interfering with CNI routes.
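
To rule out a CIDR overlap, compare the node's podCIDR with the cluster-level setting (the second command assumes a kubeadm-managed control plane where the flag is visible on the controller manager pod):

# Pod CIDR assigned to the node
kubectl get node worker-3 -o jsonpath='{.spec.podCIDR}{"\n"}'

# Cluster-wide pod CIDR configured on the controller manager
kubectl -n kube-system get pod -l component=kube-controller-manager -o yaml | grep -- --cluster-cidr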

Certificate expiry

Kubelet client certificates typically expire after one year. When expired, the kubelet cannot authenticate to the API server. All conditions flip to Unknown.

# Check certificate expiry on the node
openssl x509 -enddate -noout -in /var/lib/kubelet/pki/kubelet-client-current.pem

# Search kubelet logs for certificate issues
journalctl -u kubelet --since "1 hour ago" | grep -i "cert\|tls\|x509"

The typical error: x509: certificate has expired or is not yet valid.

Fix: enable automatic rotation in /var/lib/kubelet/config.yaml:

rotateCertificates: true
serverTLSBootstrap: true

Fix: approve pending CSRs manually:

kubectl get csr
kubectl certificate approve <csr-name>

Fix: renew with kubeadm:

sudo kubeadm certs renew all
sudo systemctl restart kubelet

Node recovery and drain procedures

When to recover vs replace

Recover when the root cause is identifiable and fixable: kubelet restart, disk cleanup, certificate renewal, CNI pod restart. Replace when the node is completely unresponsive, has D-state processes that will not clear, or is a terminated cloud instance.

Recovery procedure

# 1. Cordon the node to stop new scheduling
kubectl cordon worker-3

# 2. SSH or kubectl debug to fix the root cause
#    (disk cleanup, kubelet restart, certificate renewal, etc.)

# 3. Verify the node returns to Ready
kubectl describe node worker-3

# 4. Uncordon when Ready
kubectl uncordon worker-3

# 5. Watch for stability
kubectl get nodes -w

Drain procedure (planned maintenance)

kubectl drain gracefully evicts all pods, respecting PodDisruptionBudgets and termination grace periods.

kubectl drain worker-3 \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --grace-period=60 \
  --timeout=300s

The flags: --ignore-daemonsets skips DaemonSet-managed pods (they restart automatically on uncordon). --delete-emptydir-data allows evicting pods with emptyDir volumes. --grace-period overrides the pod's termination grace period. --timeout caps total drain time.

If a drain blocks, check PodDisruptionBudgets:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: myapp

A drain that violates a PDB blocks until the condition is met. Only force-override with --disable-eviction as a last resort.
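
To see which budget is holding the drain up, check how many disruptions each PDB currently allows:

# ALLOWED DISRUPTIONS of 0 means eviction requests will be refused
kubectl get pdb -A
kubectl describe pdb myapp-pdb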

Forced node deletion (permanently lost node):

kubectl delete node worker-3

This removes the node object but does not gracefully evict pods. Pods in Terminating state stay until the garbage collection timer expires. Provision a replacement and let it join the cluster.
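
If pods from the deleted node stay stuck in Terminating, they can be removed by force; the pod and namespace names are placeholders, and for stateful workloads be certain the node is truly gone first:

# Force-remove a pod object the kubelet can no longer confirm as deleted
kubectl delete pod my-app-7d4b9 -n my-namespace --grace-period=0 --force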

If you see a pod stuck in Pending after draining, the cluster may not have enough capacity to reschedule all evicted workloads. Check resource requests and allocatable capacity.
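
A quick way to compare what is pending against what the remaining nodes can hold:

# Pods that could not be rescheduled
kubectl get pods -A --field-selector status.phase=Pending

# Requests vs allocatable on each remaining node
kubectl describe nodes | grep -A 8 "Allocated resources"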

When to escalate

Collect this information before asking for help:

  • Full output of kubectl describe node <node-name>
  • Output of kubectl get events --field-selector involvedObject.name=<node-name>
  • Kubelet logs: journalctl -u kubelet -n 200 (from the node)
  • Container runtime logs: journalctl -u containerd -n 100
  • Output of kubectl get nodes -o wide and kubectl top nodes
  • kube-system pods on the node: kubectl get pods -n kube-system -o wide --field-selector spec.nodeName=<node-name>
  • Kubernetes version: kubectl version
  • Whether the issue is consistent or intermittent
  • Whether the node was recently upgraded, patched, or had a configuration change

How to prevent recurrence

  • Enable kubelet certificate auto-rotation (rotateCertificates: true) to avoid expiry-related outages.
  • Set resource requests accurately. Overcommitted nodes hit MemoryPressure and DiskPressure sooner.
  • Monitor node conditions with Prometheus and kube-state-metrics. An alert on kube_node_status_condition fires before a human notices; see the example rule after this list.
  • Install Node Problem Detector as a DaemonSet to surface kernel-level issues (deadlocks, read-only filesystems, frequent kubelet restarts) that the kubelet itself does not report.
  • Configure log rotation for container logs to prevent DiskPressure from log accumulation.
  • Drain one node at a time during maintenance, and always verify PodDisruptionBudgets are in place for critical workloads.
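
A minimal Prometheus alerting rule on that metric might look like this (threshold and duration are illustrative):

# prometheus-rules.yaml -- alert when any node reports Ready != true for 5 minutes
groups:
  - name: node-health
    rules:
      - alert: NodeNotReady
        expr: kube_node_status_condition{condition="Ready",status="true"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.node }} is NotReady"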

Recurring server or deployment issues?

I help teams make production reliable with CI/CD, Kubernetes, and cloud—so fixes stick and deploys stop being stressful.

Explore DevOps consultancy
