CrashLoopBackOff: why your Kubernetes pod keeps restarting

CrashLoopBackOff is not an error. It is a status that tells you a container inside your pod is starting, crashing, and being restarted in a loop with increasing delays. This article walks through what the status means, how to read exit codes and logs, the most common root causes, and how to fix each one.

What CrashLoopBackOff actually means

CrashLoopBackOff is not an official pod phase; it is a waiting reason reported on the container's status and surfaced by kubectl. The pod phase itself typically stays Running while the loop continues. What the status tells you: a container started, exited, and the kubelet is restarting it with exponential backoff delays.
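
The split is visible with jsonpath: the phase and the waiting reason live in different status fields. A quick check, using the example pod from later in this article:

# Pod phase: typically stays Running even while crash-looping
kubectl get pod payment-svc-7b9f6d-xk2p -n payments \
  -o jsonpath='{.status.phase}'

# Container waiting reason: this is where CrashLoopBackOff appears
kubectl get pod payment-svc-7b9f6d-xk2p -n payments \
  -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}'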

The default backoff sequence (still the default in 1.33; see the feature gate note below):

Restart attempt   Wait before next attempt
1                 10 seconds
2                 20 seconds
3                 40 seconds
4                 80 seconds
5                 160 seconds
6+                300 seconds (capped)

If the container runs successfully for 10 consecutive minutes, the backoff resets to 10 seconds.

During each wait window, kubectl get pods shows CrashLoopBackOff. When the kubelet actually attempts a restart, the status briefly flips to Running or ContainerCreating before the container crashes again.
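
One way to watch the flip in real time is the watch flag; the STATUS column cycles between CrashLoopBackOff and Running as the kubelet retries:

kubectl get pods -n <namespace> -w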

Kubernetes 1.33 alpha change. The ReduceDefaultCrashLoopBackOffDecay feature gate (KEP-4603, opt-in, disabled by default) drops the initial delay to 1 second and caps at 60 seconds. If you are on 1.33 with this gate enabled, your pods recover faster from transient failures but the diagnostic approach stays the same.
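
If you want to try the new decay on a test cluster, the gate is enabled per node in the kubelet configuration. A minimal sketch, assuming you manage kubelets through a KubeletConfiguration file:

# /var/lib/kubelet/config.yaml (path varies by distribution)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  ReduceDefaultCrashLoopBackOffDecay: true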

Reading the symptoms

Three commands, in this order.

Step 1: confirm the crash loop

kubectl get pods -n <namespace>

Expected output for a crashing pod:

NAME                      READY   STATUS             RESTARTS      AGE
payment-svc-7b9f6d-xk2p  0/1     CrashLoopBackOff   14 (2m ago)   47m

High RESTARTS, 0/1 under READY, and CrashLoopBackOff under STATUS confirm the loop.
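
To find every crash-looping pod across all namespaces at once, a simple filter works:

kubectl get pods -A | grep CrashLoopBackOff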

Step 2: read exit codes and events

kubectl describe pod payment-svc-7b9f6d-xk2p -n payments

Look for two sections in the output.

Container state:

Last State:    Terminated
  Reason:      Error
  Exit Code:   1
  Started:     Wed, 09 Apr 2026 14:03:10 +0000
  Finished:    Wed, 09 Apr 2026 14:03:11 +0000

The exit code is your first branch point. See the exit code reference table below.

Events:

Warning  BackOff  2m (x47 over 30m)  kubelet  Back-off restarting failed container

If you see Liveness probe failed in the events instead, the container is not crashing on its own. Kubernetes is killing it. That is a different root cause.
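
Probe failures are recorded as events with reason Unhealthy, so you can filter for them directly:

kubectl get events -n payments --field-selector reason=Unhealthy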

Step 3: read the previous container's logs

kubectl logs payment-svc-7b9f6d-xk2p -n payments --previous

The --previous flag (or -p) retrieves logs from the last terminated container instance. Without it, you get the current container's logs, which are often empty because the container just started or has not produced output yet.

For multi-container pods:

kubectl logs payment-svc-7b9f6d-xk2p -n payments -c payment-worker --previous

If --previous returns empty logs, the container crashed before writing to stdout/stderr. That points to a missing binary (exit code 127), a shared library failure, or an OOM kill in the first milliseconds. Skip to debugging with ephemeral containers.

Exit code reference

Exit codes appear in kubectl describe pod under Last State > Exit Code. The code tells you what category of failure you are dealing with.

Exit code  Signal   What it means                    Most likely Kubernetes cause
0          none     Successful exit                  Container completed a task and stopped; restartPolicy: Always keeps restarting it
1          none     Application error                Unhandled exception, missing config, fatal startup error
126        none     Command not executable           Binary exists but has wrong permissions
127        none     Command not found                Binary missing from image; wrong path in command
137        SIGKILL  Forceful kill (128+9)            OOMKilled by the kernel, or direct SIGKILL
139        SIGSEGV  Segmentation fault (128+11)      Memory access violation in application code
143        SIGTERM  Graceful termination (128+15)    Liveness probe failure; kubelet sends SIGTERM

The formula behind signal-based codes: 128 + signal_number. SIGKILL is signal 9, so 128 + 9 = 137.
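
Bash can do the reverse mapping for you: given a number above 128, the builtin kill -l prints the corresponding signal name.

kill -l 137   # prints KILL
kill -l 143   # prints TERM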

Most common causes

Ordered by how often I see them in production clusters. The first four cover over 90% of real cases.

Cause 1: application crash (exit code 1)

The application starts, hits a fatal error, and exits. Logs show a stack trace, panic, or unhandled exception.

Typical log output:

  • Python: Exception: Database connection refused
  • Go: panic: runtime error: nil pointer dereference
  • Node.js: Error: Cannot find module './config'
  • Java: java.lang.NullPointerException

Fix. Read the stack trace from kubectl logs --previous. Reproduce locally with the same image and environment variables:

docker run --rm \
  -e DATABASE_URL=postgres://db-primary.internal:5432/payments \
  -e LOG_LEVEL=debug \
  registry.internal/payment-svc:3.1.4

Fix the bug. Build a new image with a distinct tag (not latest). Roll it out:

kubectl set image deployment/payment-svc \
  payment-svc=registry.internal/payment-svc:3.1.5 \
  -n payments
kubectl rollout status deployment/payment-svc -n payments

Verify: kubectl get pods -n payments shows 1/1 Running with 0 restarts.

Cause 2: OOM kill (exit code 137)

The container exceeded its memory limit and the Linux kernel's OOM killer terminated it. This is one of the most common causes and has a distinct signature in kubectl describe pod:

Last State:    Terminated
  Reason:      OOMKilled
  Exit Code:   137

Why it happens: resources.limits.memory is set lower than the application's actual memory footprint. Or the application has a memory leak.

Quick fix: raise the memory limit.

kubectl patch deployment payment-svc -n payments -p \
  '{"spec":{"template":{"spec":{"containers":[{"name":"payment-svc","resources":{"limits":{"memory":"512Mi"},"requests":{"memory":"256Mi"}}}]}}}}'

Before raising blindly, check whether memory is growing without bound (leak) or stabilizing at a higher level than the current limit (undersized). kubectl top pod gives a snapshot:

kubectl top pod payment-svc-7b9f6d-xk2p -n payments

This metric comes from the kubelet's embedded cAdvisor, not kube-state-metrics; if Prometheus scrapes your kubelets (most monitoring stacks do):

container_memory_working_set_bytes{container="payment-svc", namespace="payments"}

A monotonically increasing line means a leak. Fix the application. A line that plateaus above the current limit means the limit is too low. Raise it.
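
If kube-state-metrics is installed, it also exposes the last termination reason, which makes OOM kills easy to count across the fleet; a sketch:

# Pods whose last container termination was an OOM kill
kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1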

For a deeper understanding of how requests and limits interact, see Kubernetes resource requests and limits.

Verify: restart count stops climbing, no OOMKilled in kubectl describe pod.

Cause 3: missing configuration (ConfigMap, Secret, environment variable)

The application requires a ConfigMap, Secret, or environment variable that does not exist in the namespace, has the wrong name, or was deleted.

How to confirm: logs show missing required env variable DATABASE_URL or config file not found: /etc/app/config.yaml. Sometimes the pod never starts and shows CreateContainerConfigError instead of CrashLoopBackOff.

# Check if the referenced ConfigMap exists
kubectl get configmaps -n payments

# Check if the referenced Secret exists
kubectl get secrets -n payments

# Inspect what the deployment expects
kubectl get deployment payment-svc -n payments -o yaml | grep -A 20 "env:"

Common scenarios:

  • ConfigMap deleted but the Deployment still references it
  • Secret exists in default namespace but the pod runs in payments
  • Typo in valueFrom.secretKeyRef.key (the key inside the Secret, not the Secret name); see the check below
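
To rule out the key typo, list what the Secret actually contains and decode the key you expect (names here match this article's examples):

# List the keys in the Secret (values shown as byte counts, not plaintext)
kubectl describe secret payment-db-creds -n payments

# Decode one key to confirm it holds what the application expects
kubectl get secret payment-db-creds -n payments \
  -o jsonpath='{.data.DB_PASSWORD}' | base64 -d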

Fix. Create the missing resource or correct the reference:

kubectl create secret generic payment-db-creds -n payments \
  --from-literal=DB_PASSWORD=changeme-in-production

Verify: pod starts without config-related errors in kubectl logs.

Cause 4: liveness probe misconfiguration

The container is healthy, but Kubernetes keeps killing it because the liveness probe fails before the application finishes starting. This is the most deceptive cause because there is nothing wrong with the application itself.

Signature: kubectl describe pod shows Liveness probe failed in events. Exit code is 143 (the kubelet's SIGTERM) or 137 (SIGKILL, sent when the process does not exit within the termination grace period). The container's Last State shows run times that track the probe schedule (roughly initialDelaySeconds + periodSeconds × failureThreshold) rather than the near-instant exits of a genuine crash.

The most common pattern: initialDelaySeconds is shorter than the application's boot time. A Java service that needs 45 seconds to load Spring context, connect to the database, and warm caches will fail a probe that starts checking at initialDelaySeconds: 5.

Other probe misconfigurations:

  • Wrong path: probe hits /healthz but the application serves /health
  • Wrong port: probe checks 8080 but the application binds 8081
  • timeoutSeconds: 1 while the endpoint takes 2 seconds under load
  • failureThreshold: 1, where a single failed check triggers a restart

Quick test to confirm. Temporarily remove the liveness probe:

kubectl patch deployment payment-svc -n payments --type json \
  -p '[{"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"}]'

If restarts stop, the probe was the problem.

Proper fix: add a startup probe. Instead of fighting initialDelaySeconds, use a startup probe that gates the liveness probe until the application is ready:

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30      # 30 x 10s = 300s max startup window
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

For a full walkthrough of probe types, timing parameters, and endpoint design, see how to configure Kubernetes health probes.

Verify: pod stays Running through startup. No Liveness probe failed events in kubectl describe pod.

Cause 5: wrong entrypoint or missing binary (exit code 126/127)

The pod spec defines a command or args that does not exist in the image. The container exits immediately with exit code 127 (not found) or 126 (not executable). Logs are usually empty.

Important distinction: in Kubernetes, command overrides Docker's ENTRYPOINT and args overrides Docker's CMD. If you set command in the pod spec, the image's ENTRYPOINT is completely ignored.

Fix. Inspect the image to find the correct binary path:

docker inspect registry.internal/payment-svc:3.1.4 \
  | jq '.[0].Config.Entrypoint, .[0].Config.Cmd'

Or run a throwaway pod to explore the filesystem:

kubectl run debug-payment --rm -it \
  --image=registry.internal/payment-svc:3.1.4 \
  --restart=Never -- /bin/sh

# Inside the container:
find / -name payment-svc 2>/dev/null

Correct the command or args in the Deployment spec.
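
For reference, a sketch of what an explicit override looks like in the Deployment spec; the binary path and flags are illustrative:

containers:
  - name: payment-svc
    image: registry.internal/payment-svc:3.1.4
    command: ["/usr/local/bin/payment-svc"]                  # replaces the image's ENTRYPOINT
    args: ["serve", "--config", "/etc/payment/config.yaml"]  # replaces the image's CMD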

Verify: container starts and produces log output.

Cause 6: container completes successfully (exit code 0)

The container exits with code 0 (success), but restartPolicy: Always (the only policy Deployments allow) keeps restarting it. This happens when a batch job image is deployed as a Deployment instead of a Job.

Fix options:

  • Switch the workload to a Job or CronJob if it is a batch task (sketch below)
  • Fix the container command to run a persistent process (exec payment-svc serve instead of payment-svc run-once)
  • Set restartPolicy: Never if the workload genuinely should not restart (valid in bare pods and Jobs, not Deployments)
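
A minimal sketch of the first option, assuming the batch entrypoint is payment-svc run-once:

apiVersion: batch/v1
kind: Job
metadata:
  name: payment-batch
  namespace: payments
spec:
  backoffLimit: 3                  # retry failed runs up to 3 times
  template:
    spec:
      restartPolicy: Never         # allowed in Jobs, unlike Deployments
      containers:
        - name: payment-svc
          image: registry.internal/payment-svc:3.1.4
          args: ["run-once"]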

Debugging when logs are empty

When kubectl logs --previous returns nothing, the container crashed before writing to stdout. Two techniques get you inside.

Ephemeral containers (Kubernetes 1.23+)

Ephemeral containers attach a debug container to a running (or crashing) pod. They share the process namespace with the target container.

kubectl debug -it payment-svc-7b9f6d-xk2p -n payments \
  --image=busybox:1.36 \
  --target=payment-svc

Once inside, you can inspect the target container's filesystem and processes:

# Check if the binary exists
ls -la /proc/1/root/usr/local/bin/

# Check shared library dependencies (note: busybox has no ldd applet;
# use a fuller debug image such as ubuntu for this step)
ldd /proc/1/root/usr/local/bin/payment-svc

# Inspect environment variables
cat /proc/1/environ | tr '\0' '\n'

Copy-and-override debugging

Create a copy of the crashing pod with a different command that keeps it alive:

kubectl debug payment-svc-7b9f6d-xk2p -it \
  --copy-to=payment-debug \
  --container=payment-svc \
  -n payments \
  -- sleep infinity

Then exec in and manually run the original command to see the error interactively:

kubectl exec -it payment-debug -n payments -- /bin/sh
# Inside: run the original entrypoint
/usr/local/bin/payment-svc serve --config /etc/payment/config.yaml

Clean up the debug pod when done: kubectl delete pod payment-debug -n payments.

Monitoring and alerting

If you run kube-state-metrics with Prometheus, you can alert on CrashLoopBackOff before a human notices.

Detect CrashLoopBackOff directly:

kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1

Alert on high restart rate (catches loops earlier):

# Prometheus alerting rule
- alert: PodCrashLooping
  expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Pod / restarted >5 times in 1h"

The restart-rate query fires before the backoff reaches the 5-minute cap, giving you earlier warning.

How to prevent recurrence

  • Pin image tags. payment-svc:3.1.4, not payment-svc:latest. Pinned tags make rollbacks deterministic with kubectl rollout undo.
  • Always set memory requests and limits. No limit means no memory guard; the kernel kills your pod without warning when the node runs low.
  • Test containers locally first. docker run with the same environment variables catches most startup failures before they reach the cluster.
  • Use startup probes for slow applications. Do not stretch initialDelaySeconds to 120 seconds. That is what startup probes are for.
  • Verify dependencies exist before deploying. kubectl get configmaps,secrets -n payments before kubectl apply -f deployment.yaml.
  • Set JVM heap and Go runtime limits explicitly. Java and Go processes size themselves from the node's total memory and cores by default, not the container's limits. The JVM has respected -XX:MaxRAMPercentage since Java 10; Go respects the GOMEMLIMIT environment variable since Go 1.19, and GOMAXPROCS should be set to match the CPU limit (sketch below).
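
A sketch of that last point wired into a container spec; the values are illustrative and should leave headroom below the container limit:

containers:
  - name: payment-svc
    image: registry.internal/payment-svc:3.1.4
    resources:
      limits:
        memory: "512Mi"
        cpu: "2"
    env:
      - name: GOMEMLIMIT           # Go 1.19+: soft memory limit for the runtime
        value: "450MiB"
      - name: GOMAXPROCS           # match the CPU limit instead of node cores
        value: "2"
      # For a JVM service, the equivalent would be:
      # - name: JAVA_TOOL_OPTIONS
      #   value: "-XX:MaxRAMPercentage=75.0"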

When to escalate

If none of the causes above match, or if fixes do not resolve the loop, collect this information before asking for help:

  • Full output of kubectl describe pod <pod-name> -n <namespace>
  • Logs from the previous container: kubectl logs <pod-name> -n <namespace> --previous
  • Namespace events: kubectl get events -n <namespace> --sort-by=.metadata.creationTimestamp
  • Node resource pressure: kubectl top node and kubectl describe node <node-name>
  • Kubernetes version: kubectl version
  • The Deployment manifest (sanitized of secrets)
  • Whether the issue is consistent or intermittent
  • Whether the same image works in a different namespace or cluster

