What ImagePullBackOff actually means
ErrImagePull and ImagePullBackOff are two stages of the same failure. ErrImagePull appears the first time the kubelet fails to pull a container image from a registry. If the pull keeps failing, Kubernetes enters an exponential backoff loop and the status changes to ImagePullBackOff.
The backoff timing:
| Retry cycle | Approximate wait before next attempt |
|---|---|
| 1 | 10 seconds |
| 2 | 20 seconds |
| 3 | 40 seconds |
| 4 | 80 seconds |
| 5+ | 300 seconds (capped at 5 minutes) |
The pod is not killed during backoff. It sits idle, waiting for the next retry. If the root cause is transient (a brief registry outage, a rate-limit window resetting), the pod self-heals. If the root cause is permanent (a typo, a missing secret), the pod stays in ImagePullBackOff until you fix it.
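To watch the cycle live, stream status changes; the pod alternates between ErrImagePull (an attempt just failed) and ImagePullBackOff (waiting out the backoff):
kubectl get pods -n <namespace> -w   # -w streams status updates as retries happen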
When the kubelet pulls images
The imagePullPolicy on each container spec controls pull behavior:
- Always: pulls on every container start. Default when no tag is specified or when the tag is :latest. The kubelet compares digests and skips redundant layer downloads if the cached image matches.
- IfNotPresent: pulls only when the image is not already cached on the node. Default for tags other than :latest.
- Never: never pulls. The container fails to start (the status is ErrImageNeverPull, not ImagePullBackOff) if the image is not already on the node.
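A minimal container spec that sets the policy explicitly (the image reference here is illustrative):
containers:
- name: app
  image: registry.internal/myapp:v2.1
  imagePullPolicy: IfNotPresent   # skip the registry when the node cache already has this tag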
Kubernetes v1.33 alpha note. The KubeletEnsureSecretPulledImages feature gate (disabled by default) adds credential verification for cached images. Before v1.33, a pod using imagePullPolicy: IfNotPresent could access a cached private image without valid credentials. With this gate enabled, that same pod will get ImagePullBackOff unless it has the correct imagePullSecrets. If you see new pull failures after upgrading to v1.33 with this gate on, check whether affected pods are missing pull secrets they never needed before.
Diagnosing the root cause
The Events section of kubectl describe pod is the single most useful diagnostic surface for image pull failures. The exact error string tells you the cause category.
kubectl get pods -n <namespace> # find the pod in ErrImagePull or ImagePullBackOff
kubectl describe pod <pod-name> -n <namespace> # scroll to Events
A typical Events output:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulling 3m kubelet Pulling image "registry.internal/myapp:v2.1"
Warning Failed 3m kubelet Failed to pull image "registry.internal/myapp:v2.1": not found
Warning Failed 3m kubelet Error: ErrImagePull
Warning BackOff 2m (x4 over 3m) kubelet Back-off pulling image "registry.internal/myapp:v2.1"
Error message reference
| Error text in Events | What it means |
|---|---|
| not found / manifest unknown | The image tag does not exist in the registry |
| repository does not exist / no such host | Wrong registry hostname or image path, or a DNS failure |
| unauthorized / 401 / pull access denied | Missing or incorrect credentials for a private registry |
| 403 Forbidden | Credentials are valid but lack permission for this image |
| toomanyrequests / 429 Too Many Requests | Registry rate limit hit (typically Docker Hub) |
| i/o timeout / connection refused | Network connectivity issue between the node and the registry |
| x509: certificate signed by unknown authority | TLS certificate problem (self-signed or expired cert) |
For broader event queries across namespaces:
kubectl get events -n <namespace> --field-selector type=Warning
kubectl get events --all-namespaces --field-selector reason=BackOff
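To find every pod currently stuck on a pull, a jsonpath query over container statuses works cluster-wide (a sketch; the waiting reason is empty for healthy pods, so grep does the filtering):
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}' \
  | grep -E 'ErrImagePull|ImagePullBackOff'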
Wrong image name or tag
The most common cause. A typo in the image name, tag, or registry hostname produces a manifest unknown or not found error.
How to verify
Pull the image from a local workstation first:
docker pull registry.internal/myapp:v2.1
If this fails locally too (while logged in to the registry), the image reference itself is wrong.
Common mistakes:
- Tag typo: nginx:lates instead of nginx:latest
- Missing tag: omitting the tag defaults to :latest, which many private registries do not publish
- Wrong version: myapp:v3 when only v2.1 was pushed
- Swapped path segments: myapp/myorg:v1 instead of myorg/myapp:v1
- Deleted tag: a CI/CD pipeline cleaned up old tags after deployment
To list available tags in a registry:
# Docker Hub
curl -s https://registry.hub.docker.com/v2/repositories/myorg/myapp/tags/ | jq '.results[].name'
# Any OCI registry (with crane, from google/go-containerregistry)
crane ls registry.internal/myorg/myapp
Fixing it
Edit the parent resource (Deployment, StatefulSet, DaemonSet), not the pod directly:
kubectl set image deployment/my-deployment app=registry.internal/myapp:v2.2 -n <namespace>
For immutability, pin images by digest instead of tag:
spec:
containers:
- name: app
image: registry.internal/myapp@sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2
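To resolve a tag you trust into its digest, crane (mentioned above) does it in one call:
crane digest registry.internal/myapp:v2.1
# prints sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2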
You will know it worked when kubectl get pods -n <namespace> shows the pod in Running status and kubectl describe pod shows a Pulled event with the correct image reference.
Private registry authentication (imagePullSecrets)
When a pod tries to pull from a private registry without credentials, the registry returns 401 Unauthorized, or the misleading message repository does not exist or may require 'docker login'. The kubelet pulls images without credentials by default; you must configure imagePullSecrets explicitly.
Step 1: create the secret
kubectl create secret docker-registry regcred \
--docker-server=registry.internal \
--docker-username=deploy-bot \
--docker-password=<token> \
-n my-namespace
For Docker Hub, use https://index.docker.io/v1/ as the server value.
Secrets are namespace-scoped. The secret must exist in the same namespace as the pod.
Step 2: verify the secret
kubectl get secret regcred -n my-namespace \
--output="jsonpath={.data.\.dockerconfigjson}" | base64 --decode
The decoded JSON should contain your registry hostname as a key inside the auths object.
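For the regcred secret created above, the decoded payload looks roughly like this (auth is the base64 of username:password):
{
  "auths": {
    "registry.internal": {
      "username": "deploy-bot",
      "password": "<token>",
      "auth": "<base64 of deploy-bot:token>"
    }
  }
}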
Step 3: reference the secret in the pod spec
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-deployment
namespace: my-namespace
spec:
template:
spec:
imagePullSecrets:
- name: regcred
containers:
- name: app
image: registry.internal/myapp:v2.1
Attaching to a ServiceAccount
Instead of adding imagePullSecrets to every pod spec, attach the secret to the default ServiceAccount. Every new pod in the namespace that does not specify a different ServiceAccount inherits the pull secret automatically:
kubectl patch serviceaccount default \
-p '{"imagePullSecrets": [{"name": "regcred"}]}' \
-n my-namespace
This only affects new pods. Existing pods need a restart.
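To force that restart on an existing Deployment:
kubectl rollout restart deployment/my-deployment -n my-namespace   # recreate pods so they inherit the pull secret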
Cloud registries with expiring credentials
AWS ECR tokens expire every 12 hours. GCR service account JSON keys are long-lived but a security risk. Azure ACR service principal credentials expire too. Static imagePullSecrets break on all of these.
The production solution: kubelet credential providers (GA since Kubernetes 1.26). The kubelet calls an external plugin binary at pull time to obtain fresh credentials. No CronJobs refreshing secrets, no stale tokens. Cloud providers maintain the plugins:
- AWS: ecr-credential-provider (cloud-provider-aws)
- GCP: Workload Identity / gcp-auth-webhook
- Azure: acr-credential-provider
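Wiring a provider up means pointing the kubelet at both the plugin binary and a CredentialProviderConfig. A minimal sketch for the ECR plugin; binary paths and match patterns vary by environment:
# kubelet flags:
#   --image-credential-provider-config=/etc/kubernetes/credential-provider-config.yaml
#   --image-credential-provider-bin-dir=/usr/local/bin
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
  - name: ecr-credential-provider              # must match the binary name in the bin dir
    matchImages:
      - "*.dkr.ecr.*.amazonaws.com"            # only invoked for ECR image references
    defaultCacheDuration: "12h"                # align with the ECR token lifetime
    apiVersion: credentialprovider.kubelet.k8s.io/v1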
You will know it worked when kubectl describe pod shows a Pulled event instead of Failed, and kubectl get pods shows Running.
Docker Hub rate limits
Docker Hub applies pull rate limits based on account type. As of April 2025:
| Account type | Pull limit |
|---|---|
| Unauthenticated (anonymous) | 100 pulls per 6 hours, per source IP |
| Authenticated (free Personal) | 200 pulls per 6 hours, per account |
| Pro / Team / Business | Unlimited |
The critical detail: unauthenticated limits apply per source IPv4 address. In a managed Kubernetes cluster where all nodes share a NAT gateway, the entire cluster competes for 100 pulls from a single IP. A cluster running autoscaling workloads can exhaust this in minutes.
When the limit is hit, kubectl describe pod Events show:
toomanyrequests: You have reached your pull rate limit.
Diagnosis
Check remaining quota from a node (or any machine sharing the cluster's outbound IP):
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/nginx:pull" | jq -r .token)
curl -I -H "Authorization: Bearer $TOKEN" https://registry-1.docker.io/v2/library/nginx/manifests/latest 2>/dev/null | grep ratelimit
The ratelimit-remaining header shows how many pulls you have left.
Solutions
Authenticate pulls. Even a free Docker Hub account doubles your quota and decouples it from your IP address. Create a personal access token at Docker Hub and add it as an imagePullSecret:
kubectl create secret docker-registry dockerhub-creds \
--docker-server=https://index.docker.io/v1/ \
--docker-username=your-dockerhub-user \
--docker-password=<personal-access-token> \
-n <namespace>
kubectl patch serviceaccount default \
-p '{"imagePullSecrets": [{"name": "dockerhub-creds"}]}' \
-n <namespace>
Use a pull-through cache. Set up a registry mirror (Harbor, Nexus, or a plain registry:2 with proxy cache) that fronts Docker Hub. Configure containerd on all nodes to use the mirror via /etc/containerd/certs.d/docker.io/hosts.toml:
server = "https://registry-1.docker.io"
[host."https://registry-mirror.internal"]
capabilities = ["pull", "resolve"]
Containerd picks up hosts.toml changes dynamically. No restart required.
Set imagePullPolicy: IfNotPresent for stable tagged images. This avoids re-pulls when a node already has the image cached:
imagePullPolicy: IfNotPresent
You will know it worked when the toomanyrequests error disappears from kubectl describe pod Events and the pod transitions to Running.
Node-level troubleshooting with crictl
When kubectl describe pod does not give you enough detail, go to the node. crictl talks directly to the container runtime (containerd or CRI-O) over the CRI socket, bypassing the Kubernetes API.
Connect to the node
kubectl get pod <pod-name> -n <namespace> -o wide # find the node name
ssh <node>
For containerd (default since Kubernetes 1.24):
sudo crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock images
Or set the endpoint persistently in /etc/crictl.yaml:
runtime-endpoint: unix:///var/run/containerd/containerd.sock
Key commands
# List cached images
sudo crictl images | grep myapp
# Test a pull directly (fastest way to isolate auth/network issues)
sudo crictl pull registry.internal/myapp:v2.1
# Test with credentials
sudo crictl pull --creds deploy-bot:<token> registry.internal/myapp:v2.1
# Verbose output for debugging
sudo crictl --debug pull registry.internal/myapp:v2.1
# Check disk space (image pulls fail when the node is full)
df -h /var/lib/containerd
# Prune unused images if disk is full
sudo crictl rmi --prune
containerd vs Docker: a common pitfall
Kubernetes 1.24 removed the dockershim, so all modern clusters run containerd or CRI-O. Even when Docker is installed on a node, the Docker CLI no longer talks to the container runtime Kubernetes uses.
The practical consequence: credentials stored in /root/.docker/config.json are not used by containerd for CRI image pulls. If you migrated from a Docker-based cluster and your pulls stopped working, this is likely the cause. Use imagePullSecrets (through the Kubernetes API) or configure node-level registry credentials in containerd's configuration.
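If you genuinely need node-level credentials with containerd 1.x, the deprecated CRI registry config still works, though imagePullSecrets remain the better-supported path (a sketch; requires a containerd restart):
# /etc/containerd/config.toml (deprecated in containerd, but still honored in 1.x)
[plugins."io.containerd.grpc.v1.cri".registry.configs."registry.internal".auth]
  username = "deploy-bot"
  password = "<token>"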
Network and TLS issues
DNS and firewall problems produce no such host, i/o timeout, or connection refused in the Events section.
From the node:
nslookup registry.internal # DNS resolution
curl -v https://registry.internal/v2/ # TCP + TLS connectivity
For self-signed registry certificates (x509: certificate signed by unknown authority), distribute the CA certificate to nodes and configure containerd:
# /etc/containerd/certs.d/registry.internal/hosts.toml
[host."https://registry.internal"]
capabilities = ["pull", "resolve"]
ca = "/etc/containerd/certs.d/registry.internal/ca.crt"
For clusters behind a corporate proxy, set the proxy environment variables in the containerd service unit or kubelet environment:
HTTPS_PROXY=http://proxy.internal:3128
NO_PROXY=10.0.0.0/8,192.168.0.0/16,.cluster.local
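On systemd hosts, a drop-in unit for containerd is the usual place for these variables (paths and proxy address are illustrative):
# /etc/systemd/system/containerd.service.d/http-proxy.conf
[Service]
Environment="HTTPS_PROXY=http://proxy.internal:3128"
Environment="NO_PROXY=10.0.0.0/8,192.168.0.0/16,.cluster.local"
Then reload and restart: sudo systemctl daemon-reload && sudo systemctl restart containerd.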
You will know it worked when curl -v https://registry.internal/v2/ returns HTTP 200 (or 401 for auth-required registries) and the pod's pull succeeds on the next retry.
When to escalate
If you have worked through the causes above and the pod is still stuck, collect the following before asking for help:
- Full output of kubectl describe pod <pod-name> -n <namespace> (especially Events)
- The exact image reference from the pod spec (kubectl get pod <pod> -o yaml | grep image:)
- Whether the image can be pulled from the node directly (sudo crictl pull <image>)
- Kubernetes version (kubectl version)
- Container runtime and version (sudo crictl version)
- Node conditions (kubectl describe node <node> | grep -A5 Conditions)
- Disk space on the node (df -h /var/lib/containerd)
- Whether a corporate proxy, firewall, or admission webhook (OPA Gatekeeper, Kyverno) might be interfering
How to prevent recurrence
- Pin images by digest in production workloads to avoid tag-drift surprises.
- Attach imagePullSecrets to ServiceAccounts rather than individual pod specs so new deployments inherit credentials automatically.
- Use kubelet credential providers instead of static secrets for cloud registries with expiring tokens.
- Run a pull-through cache in front of Docker Hub to avoid rate-limit dependency on an external service.
- Monitor for ImagePullBackOff events cluster-wide. A query like kubectl get events --all-namespaces --field-selector reason=BackOff in a recurring check catches problems early.
- After fixing the root cause, restart the affected pod or Deployment (kubectl rollout restart deployment/<name>) to clear the backoff timer immediately rather than waiting up to 5 minutes for the next automatic retry.
If your pod started successfully but now keeps restarting, that is a different problem. See CrashLoopBackOff: why your Kubernetes pod keeps restarting for diagnosis steps on container crash loops. If the container starts but does not pass health checks, see how to configure Kubernetes health probes.