What CrashLoopBackOff actually means
CrashLoopBackOff is a kubelet display status, not an official pod phase. The actual pod phase stays Running or transitions to Failed. What the status tells you: a container started, exited, and the kubelet is restarting it with exponential backoff delays.
The backoff sequence (Kubernetes 1.32 and earlier):
| Restart attempt | Wait before next attempt |
|---|---|
| 1 | 10 seconds |
| 2 | 20 seconds |
| 3 | 40 seconds |
| 4 | 80 seconds |
| 5 | 160 seconds |
| 6+ | 300 seconds (capped) |
If the container runs successfully for 10 consecutive minutes, the backoff resets to 10 seconds.
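The schedule can be modeled in a few lines of shell (a simplified sketch that mirrors the defaults in the table above: 10-second initial delay, doubling per restart, capped at 300 seconds):

```shell
# Simplified model of the kubelet's default crash-loop backoff schedule.
delay=10
for attempt in 1 2 3 4 5 6 7; do
  echo "attempt ${attempt}: wait ${delay}s"
  delay=$((delay * 2))              # double after every failed restart
  [ "$delay" -gt 300 ] && delay=300 # cap at the 5-minute ceiling
done
```

Attempts 6 and 7 both print 300s, matching the cap row in the table.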
During each wait window, kubectl get pods shows CrashLoopBackOff. When the kubelet actually attempts a restart, the status briefly flips to Running or ContainerCreating before the container crashes again.
Kubernetes 1.33 alpha change. The ReduceDefaultCrashLoopBackOffDecay feature gate (KEP-4603, opt-in, disabled by default) drops the initial delay to 1 second and caps at 60 seconds. If you are on 1.33 with this gate enabled, your pods recover faster from transient failures but the diagnostic approach stays the same.
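If you manage the kubelet through a configuration file, opting in might look like this (a sketch; the gate is alpha, disabled by default, and the entry only has effect on 1.33+):

```yaml
# Kubelet configuration file, enabling the KEP-4603 alpha gate
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  ReduceDefaultCrashLoopBackOffDecay: true
```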
Reading the symptoms
Three commands, in this order.
Step 1: confirm the crash loop
kubectl get pods -n <namespace>
Expected output for a crashing pod:
NAME READY STATUS RESTARTS AGE
payment-svc-7b9f6d-xk2p 0/1 CrashLoopBackOff 14 (2m ago) 47m
High RESTARTS, 0/1 under READY, and CrashLoopBackOff under STATUS confirm the loop.
Step 2: read exit codes and events
kubectl describe pod payment-svc-7b9f6d-xk2p -n payments
Look for two sections in the output.
Container state:
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 09 Apr 2026 14:03:10 +0000
Finished: Wed, 09 Apr 2026 14:03:11 +0000
The exit code is your first branch point. See the exit code reference table below.
Events:
Warning BackOff 2m (x47 over 30m) kubelet Back-off restarting failed container
If you see Liveness probe failed in the events instead, the container is not crashing on its own. Kubernetes is killing it. That is a different root cause.
Step 3: read the previous container's logs
kubectl logs payment-svc-7b9f6d-xk2p -n payments --previous
The --previous flag (or -p) retrieves logs from the last terminated container instance. Without it, you get the current container's logs, which are often empty because the container just started or has not produced output yet.
For multi-container pods:
kubectl logs payment-svc-7b9f6d-xk2p -n payments -c payment-worker --previous
If --previous returns empty logs, the container crashed before writing to stdout/stderr. That points to a missing binary (exit code 127), a shared library failure, or an OOM kill in the first milliseconds. Skip to debugging with ephemeral containers.
Exit code reference
Exit codes appear in kubectl describe pod under Last State > Exit Code. The code tells you what category of failure you are dealing with.
| Exit code | Signal | What it means | Most likely Kubernetes cause |
|---|---|---|---|
| 0 | none | Successful exit | Container completed a task and stopped; restartPolicy: Always keeps restarting it |
| 1 | none | Application error | Unhandled exception, missing config, fatal startup error |
| 126 | none | Command not executable | Binary exists but has wrong permissions |
| 127 | none | Command not found | Binary missing from image; wrong path in command |
| 137 | SIGKILL | Forceful kill (128+9) | OOMKilled by the kernel, or direct SIGKILL |
| 139 | SIGSEGV | Segmentation fault | Memory access violation in application code |
| 143 | SIGTERM | Graceful termination (128+15) | Liveness probe failure; kubelet sends SIGTERM |
The formula behind signal-based codes: 128 + signal_number. SIGKILL is signal 9, so 128 + 9 = 137.
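You can reproduce the convention in any POSIX shell: a process terminated by a signal exits with 128 plus the signal number.

```shell
# 128 + signal_number, demonstrated locally:
# SIGKILL is 9 (128 + 9 = 137), SIGTERM is 15 (128 + 15 = 143).
sh -c 'kill -KILL $$' 2>/dev/null   # child kills itself with SIGKILL
echo "after SIGKILL: $?"            # prints 137
sh -c 'kill -TERM $$' 2>/dev/null   # child kills itself with SIGTERM
echo "after SIGTERM: $?"            # prints 143
```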
Most common causes
Ordered by how often I see them in production clusters. The first four cover over 90% of real cases.
Cause 1: application crash (exit code 1)
The application starts, hits a fatal error, and exits. Logs show a stack trace, panic, or unhandled exception.
Typical log output:
- Python: Exception: Database connection refused
- Go: panic: runtime error: nil pointer dereference
- Node.js: Error: Cannot find module './config'
- Java: java.lang.NullPointerException
Fix. Read the stack trace from kubectl logs --previous. Reproduce locally with the same image and environment variables:
docker run --rm \
-e DATABASE_URL=postgres://db-primary.internal:5432/payments \
-e LOG_LEVEL=debug \
registry.internal/payment-svc:3.1.4
Fix the bug. Build a new image with a distinct tag (not latest). Roll it out:
kubectl set image deployment/payment-svc \
payment-svc=registry.internal/payment-svc:3.1.5 \
-n payments
kubectl rollout status deployment/payment-svc -n payments
Verify: kubectl get pods -n payments shows 1/1 Running with 0 restarts.
Cause 2: OOM kill (exit code 137)
The container exceeded its memory limit and the Linux kernel's OOM killer terminated it. This is one of the most common causes and has a distinct signature in kubectl describe pod:
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Why it happens: resources.limits.memory is set lower than the application's actual memory footprint. Or the application has a memory leak.
Quick fix: raise the memory limit.
kubectl patch deployment payment-svc -n payments -p \
'{"spec":{"template":{"spec":{"containers":[{"name":"payment-svc","resources":{"limits":{"memory":"512Mi"},"requests":{"memory":"256Mi"}}}]}}}}'
Before raising blindly, check whether memory is growing without bound (leak) or stabilizing at a higher level than the current limit (undersized). kubectl top pod gives a snapshot:
kubectl top pod payment-svc-7b9f6d-xk2p -n payments
If you have Prometheus with kube-state-metrics:
container_memory_working_set_bytes{container="payment-svc", namespace="payments"}
A monotonically increasing line means a leak. Fix the application. A line that plateaus above the current limit means the limit is too low. Raise it.
For a deeper understanding of how requests and limits interact, see Kubernetes resource requests and limits.
Verify: restart count stops climbing, no OOMKilled in kubectl describe pod.
Cause 3: missing configuration (ConfigMap, Secret, environment variable)
The application requires a ConfigMap, Secret, or environment variable that does not exist in the namespace, has the wrong name, or was deleted.
How to confirm: logs show missing required env variable DATABASE_URL or config file not found: /etc/app/config.yaml. Sometimes the pod never starts and shows CreateContainerConfigError instead of CrashLoopBackOff.
# Check if the referenced ConfigMap exists
kubectl get configmaps -n payments
# Check if the referenced Secret exists
kubectl get secrets -n payments
# Inspect what the deployment expects
kubectl get deployment payment-svc -n payments -o yaml | grep -A 20 "env:"
Common scenarios:
- ConfigMap deleted but the Deployment still references it
- Secret exists in the default namespace but the pod runs in payments
- Typo in valueFrom.secretKeyRef.key (the key inside the Secret, not the Secret name)
Fix. Create the missing resource or correct the reference:
kubectl create secret generic payment-db-creds -n payments \
--from-literal=DB_PASSWORD=changeme-in-production
Verify: pod starts without config-related errors in kubectl logs.
Cause 4: liveness probe misconfiguration
The container is healthy, but Kubernetes keeps killing it because the liveness probe fails before the application finishes starting. This is the most deceptive cause because there is nothing wrong with the application itself.
Signature: kubectl describe pod shows Liveness probe failed in events. Exit code is 137 (SIGKILL from kubelet) or 143 (SIGTERM). The container's Last State shows very short run times (1–5 seconds).
The most common pattern: initialDelaySeconds is shorter than the application's boot time. A Java service that needs 45 seconds to load Spring context, connect to the database, and warm caches will fail a probe that starts checking at initialDelaySeconds: 5.
Other probe misconfigurations:
- Wrong path: the probe hits /healthz but the application serves /health
- Wrong port: the probe checks 8080 but the application binds 8081
- timeoutSeconds: 1 while the endpoint takes 2 seconds under load
- failureThreshold: 1, so a single timeout triggers a restart
Quick test to confirm. Temporarily remove the liveness probe:
kubectl patch deployment payment-svc -n payments --type json \
-p '[{"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"}]'
If restarts stop, the probe was the problem.
Proper fix: add a startup probe. Instead of fighting initialDelaySeconds, use a startup probe that gates the liveness probe until the application is ready:
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30   # 30 x 10s = 300s max startup window
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
For a full walkthrough of probe types, timing parameters, and endpoint design, see how to configure Kubernetes health probes.
Verify: pod stays Running through startup. No Liveness probe failed events in kubectl describe pod.
Cause 5: wrong entrypoint or missing binary (exit code 126/127)
The pod spec defines a command or args that does not exist in the image. The container exits immediately with exit code 127 (not found) or 126 (not executable). Logs are usually empty.
Important distinction: in Kubernetes, command overrides Docker's ENTRYPOINT and args overrides Docker's CMD. If you set command in the pod spec, the image's ENTRYPOINT is completely ignored.
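A sketch of that mapping (assuming, for illustration, an image built with ENTRYPOINT ["/usr/local/bin/payment-svc"] and CMD ["serve"]):

```yaml
containers:
- name: payment-svc
  image: registry.internal/payment-svc:3.1.4
  # args replaces only the image's CMD; the ENTRYPOINT still runs:
  args: ["serve", "--config", "/etc/payment/config.yaml"]
  # command would replace the ENTRYPOINT entirely, so the full binary
  # path must be spelled out and must exist in the image:
  # command: ["/usr/local/bin/payment-svc"]
```

A missing or mistyped path in command is what produces the immediate exit 127 with empty logs.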
Fix. Inspect the image to find the correct binary path:
docker inspect registry.internal/payment-svc:3.1.4 \
| jq '.[0].Config.Entrypoint, .[0].Config.Cmd'
Or run a throwaway pod to explore the filesystem:
kubectl run debug-payment --rm -it \
--image=registry.internal/payment-svc:3.1.4 \
--restart=Never -- /bin/sh
# Inside the container:
find / -name payment-svc 2>/dev/null
Correct the command or args in the Deployment spec.
Verify: container starts and produces log output.
Cause 6: container completes successfully (exit code 0)
The container exits with code 0 (success), but restartPolicy: Always (the default for Deployments) keeps restarting it. This happens when a batch job image is deployed as a Deployment instead of a Job.
Fix options:
- Switch the workload to a Job or CronJob if it is a batch task
- Fix the container command to run a persistent process (exec payment-svc serve instead of payment-svc run-once)
- Set restartPolicy: Never if the workload genuinely should not restart (only valid in bare pods, not Deployments)
Debugging when logs are empty
When kubectl logs --previous returns nothing, the container crashed before writing to stdout. Two techniques get you inside.
Ephemeral containers (Kubernetes 1.23+)
Ephemeral containers attach a debug container to a running (or crashing) pod. They share the process namespace with the target container.
kubectl debug -it payment-svc-7b9f6d-xk2p -n payments \
--image=busybox:1.36 \
--target=payment-svc
Once inside, you can inspect the target container's filesystem and processes:
# Check if the binary exists
ls -la /proc/1/root/usr/local/bin/
# Check shared library dependencies
ldd /proc/1/root/usr/local/bin/payment-svc
# Inspect environment variables
cat /proc/1/environ | tr '\0' '\n'
Copy-and-override debugging
Create a copy of the crashing pod with a different command that keeps it alive:
kubectl debug payment-svc-7b9f6d-xk2p -it \
--copy-to=payment-debug \
-n payments \
-- sleep infinity
Then exec in and manually run the original command to see the error interactively:
kubectl exec -it payment-debug -n payments -- /bin/sh
# Inside: run the original entrypoint
/usr/local/bin/payment-svc serve --config /etc/payment/config.yaml
Clean up the debug pod when done: kubectl delete pod payment-debug -n payments.
Monitoring and alerting
If you run kube-state-metrics with Prometheus, you can alert on CrashLoopBackOff before a human notices.
Detect CrashLoopBackOff directly:
kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1
Alert on high restart rate (catches loops earlier):
# Prometheus alerting rule
- alert: PodCrashLooping
  expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} restarted >5 times in 1h"
The restart-rate query fires before the backoff reaches the 5-minute cap, giving you earlier warning.
How to prevent recurrence
- Pin image tags. payment-svc:3.1.4, not payment-svc:latest. Pinned tags make rollbacks deterministic with kubectl rollout undo.
- Always set memory requests and limits. No limit means no memory guard; the kernel kills your pod without warning when the node runs low.
- Test containers locally first. docker run with the same environment variables catches most startup failures before they reach the cluster.
- Use startup probes for slow applications. Do not stretch initialDelaySeconds to 120 seconds. That is what startup probes are for.
- Verify dependencies exist before deploying. kubectl get configmaps,secrets -n payments before kubectl apply -f deployment.yaml.
- Set JVM heap and GOMAXPROCS explicitly. Java and Go processes default to sizing themselves by the node's memory and cores, not the container's limits. The JVM respects -XX:MaxRAMPercentage since Java 10. Go respects the GOMEMLIMIT environment variable since Go 1.19.
When to escalate
If none of the causes above match, or if fixes do not resolve the loop, collect this information before asking for help:
- Full output of kubectl describe pod <pod-name> -n <namespace>
- Logs from the previous container: kubectl logs <pod-name> -n <namespace> --previous
- Namespace events: kubectl get events -n <namespace> --sort-by=.metadata.creationTimestamp
- Node resource pressure: kubectl top node and kubectl describe node <node-name>
- Kubernetes version: kubectl version
- The Deployment manifest (sanitized of secrets)
- Whether the issue is consistent or intermittent
- Whether the same image works in a different namespace or cluster