ContainerCreating stuck: debugging pods that never start

ContainerCreating means the kubelet is setting up your pod's prerequisites (volumes, network, secrets) but something is blocking it. Unlike CrashLoopBackOff, the container never actually starts. The fix depends on which prerequisite is stuck: a PVC that will not bind, a missing Secret, a broken CNI plugin, or an init container that never finishes. This article walks through each cause, how to identify it from kubectl events, and how to resolve it.

What ContainerCreating actually means

ContainerCreating is a kubelet-reported waiting state, not an official Kubernetes pod phase. The actual pod phase is Pending. What the status tells you: the pod has been scheduled to a node and the kubelet is working on prerequisite tasks before the container process can run.

Those tasks include:

  • Pulling the container image (if not already cached on the node)
  • Creating the pod's network namespace via the CNI plugin
  • Attaching and mounting PersistentVolumes
  • Injecting ConfigMaps, Secrets, and projected volumes into the container filesystem
  • Running init containers to completion

If any of these stall, the pod stays in ContainerCreating indefinitely. There is no built-in timeout that moves it to Failed. The kubelet keeps retrying, so you need to intervene.

How it differs from related states. ImagePullBackOff means the image pull specifically failed and the kubelet is backing off. CreateContainerConfigError means a ConfigMap or Secret reference is invalid and Kubernetes caught it at configuration time. Init:N/M means init containers are still running. If you see ContainerCreating, the image was either already pulled or has not been attempted yet, and the problem lies elsewhere.
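If you want the pod phase and the kubelet's waiting reason without scanning the full describe output, a jsonpath query returns them directly (pod name and namespace are placeholders):

# Pod phase prints Pending even while kubectl get pods shows ContainerCreating
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.phase}'

# The kubelet-reported waiting reason (ContainerCreating, ImagePullBackOff, ...)
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[*].state.waiting.reason}'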

Reading kubectl describe pod events

The Events section of kubectl describe pod is the single most useful diagnostic tool. Events expire from the API server after one hour by default, so check them promptly.

kubectl describe pod <pod-name> -n <namespace>

Scroll to the Events section at the bottom. A healthy pod shows this progression:

Normal   Scheduled      Successfully assigned default/my-pod to node-3
Normal   Pulling        Pulling image "registry.internal/my-app:2.4.1"
Normal   Pulled         Successfully pulled image
Normal   Created        Created container my-app
Normal   Started        Started container my-app

A pod stuck in ContainerCreating halts before Created and typically shows Warning events. The first Warning event points to the root cause.

Warning reason                       Likely root cause           Section in this article
FailedMount                          Volume cannot be mounted    Volume mount failures
FailedAttachVolume                   Cloud disk cannot attach    Volume mount failures
FailedCreatePodSandBox               CNI plugin error            CNI plugin issues
Failed with "secret not found"       Missing Secret              Missing ConfigMaps or Secrets
Failed with "configmap not found"    Missing ConfigMap           Missing ConfigMaps or Secrets

If there are no events at all, the kubelet itself may be having trouble. Check kubelet logs on the node where the pod is scheduled (the Node: field in the describe output shows which node):

# SSH to the node, then:
journalctl -u kubelet --since "15 minutes ago" | grep -i "sandbox\|cni\|volume\|mount"

For pod-specific events sorted by time:

kubectl get events -n <namespace> \
  --field-selector involvedObject.name=<pod-name> \
  --sort-by='.lastTimestamp'

Volume mount failures

Volume problems are the most common reason a pod gets stuck in ContainerCreating. Four distinct failure modes exist, covered in the subsections below.

PVC not bound

A pod referencing a PersistentVolumeClaim that has STATUS: Pending cannot start. The kubelet waits for the volume to become available.

kubectl get pvc -n <namespace>
# Look for STATUS = Pending

kubectl describe pvc <pvc-name> -n <namespace>
# Events section shows why binding failed

Why a PVC stays Pending:

  • No matching PersistentVolume exists. The cluster has no PV matching the PVC's StorageClass, access mode, or capacity request. Check what exists: kubectl get pv.
  • StorageClass not found. The storageClassName in the PVC references a non-existent class. Verify: kubectl get storageclass.
  • Access mode mismatch. Block storage (AWS EBS, GCE Persistent Disk, Azure Disk) typically only supports ReadWriteOnce. A PVC requesting ReadWriteMany will not bind to these.
  • WaitForFirstConsumer binding mode. StorageClasses with volumeBindingMode: WaitForFirstConsumer delay provisioning until a pod is actually scheduled. The PVC shows Pending until the scheduler picks a node. This is expected behavior, not a bug. If the pod itself is not schedulable (separate issue), the PVC stays Pending.

For a deeper understanding of PV/PVC lifecycle phases and binding mechanics, see Kubernetes PersistentVolumes and PersistentVolumeClaims.
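To check the StorageClass-related causes quickly, it helps to list each class with its provisioner and binding mode; the custom columns below are just a convenience and can be adjusted:

kubectl get storageclass \
  -o custom-columns=NAME:.metadata.name,PROVISIONER:.provisioner,BINDINGMODE:.volumeBindingMode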

Fixes:

  1. Create a matching PV manually (a minimal example follows this list), or verify that the StorageClass provisioner pod is running in kube-system.
  2. Correct the storageClassName to an existing class.
  3. Switch to an access mode compatible with the underlying storage type.
  4. For WaitForFirstConsumer, debug the scheduling problem first (see Pod stuck in Pending).
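A minimal hand-created PV for fix 1 might look like the sketch below. This is an illustration, not a recommendation: hostPath only makes sense on single-node or test clusters, and the name, capacity, StorageClass, and path are placeholders that must match your PVC.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-pv                  # placeholder name
spec:
  capacity:
    storage: 10Gi                # must be >= the PVC's request
  accessModes:
    - ReadWriteOnce              # must match the PVC's access mode
  storageClassName: standard     # must match the PVC's storageClassName
  hostPath:
    path: /mnt/data              # test/dev only; use your CSI driver in production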

You will know it worked when: kubectl get pvc -n <namespace> shows STATUS: Bound and the pod transitions out of ContainerCreating.

FailedAttachVolume and Multi-Attach errors

The kubelet emits FailedAttachVolume when a cloud disk cannot be attached to the node. The most common cause: a ReadWriteOnce volume is still recorded as attached to a different node.

Typical event:

Warning  FailedAttachVolume  Multi-Attach error for volume "pvc-abc123":
  Volume is already exclusively attached to one node and can't be attached to another

This happens after a pod is rescheduled to a new node (scaling event, node failure, rolling update) but the previous attachment was not cleaned up.

Diagnosis:

# Check VolumeAttachment objects
kubectl get volumeattachment

# Find stale attachments for the volume
kubectl get volumeattachment -o json | \
  jq '.items[] | select(.spec.source.persistentVolumeName=="pvc-abc123") | {name: .metadata.name, node: .spec.nodeName, deletionTimestamp: .metadata.deletionTimestamp}'

Fixes:

  1. If the old pod is still terminating, wait for it to fully stop: kubectl get pods -n <namespace> -o wide.
  2. If a stale VolumeAttachment object exists for a node that is gone, delete it: kubectl delete volumeattachment <name>.
  3. For Deployments with ReadWriteOnce volumes, change strategy.type to Recreate instead of RollingUpdate (see the snippet after this list). Rolling updates try to bring up a new pod before the old one is fully terminated, which triggers Multi-Attach on RWO volumes.
  4. If you need concurrent access, switch to RWX-capable storage (NFS, AWS EFS, Azure Files).
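The strategy change from fix 3 is a one-line edit in the Deployment spec; only the relevant fragment is shown here:

spec:
  replicas: 1
  strategy:
    type: Recreate   # old pod terminates and releases the RWO volume before the new pod starts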

You will know it worked when: the FailedAttachVolume events stop and the pod transitions to Running.

CSI driver not running

If the CSI controller or node plugin pods are down, all volume operations fail.

kubectl get pods -n kube-system | grep csi
kubectl get csidrivers

If the CSI node plugin is missing on the target node, restart the DaemonSet:

kubectl rollout restart daemonset <csi-node-plugin> -n kube-system
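You can also confirm that the driver has actually registered on the target node; an empty result suggests the node plugin never started there (node name is a placeholder):

kubectl get csinode <node-name> -o jsonpath='{.spec.drivers[*].name}'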

Large volumes with fsGroup

When spec.securityContext.fsGroup is set, the kubelet recursively changes file ownership on every file in the volume during mount. For volumes with millions of files, this can take minutes and cause a mount timeout.

The fsGroupChangePolicy field (beta in Kubernetes 1.20, GA in 1.23) addresses this. Set it to OnRootMismatch to skip the recursive ownership change when the volume's root directory already has the correct group:

spec:
  securityContext:
    fsGroup: 2000
    fsGroupChangePolicy: "OnRootMismatch"

Missing ConfigMaps or Secrets

When a pod references a ConfigMap or Secret that does not exist in the same namespace, the behavior depends on how the reference is made.

Missing Secret (volume mount): The pod stays in ContainerCreating. Events show:

Warning  FailedMount  MountVolume.SetUp failed for volume "secrets":
  secret "db-credentials" not found

Missing ConfigMap (env or envFrom): The pod typically shows CreateContainerConfigError in kubectl get pods, not ContainerCreating. Events show:

Warning  Failed  Error: configmap "app-config" not found

The difference comes down to ordering: the kubelet sets up volumes before any container is created, so a missing Secret or ConfigMap volume keeps the pod stuck in ContainerCreating with FailedMount events, while env and envFrom references are only resolved when the container's configuration is generated, which surfaces as CreateContainerConfigError instead.

Diagnosis:

# What does the pod reference?
kubectl describe pod <pod-name> -n <namespace>
# Check the Volumes, Env, and EnvFrom sections

# Does the resource exist in the same namespace?
kubectl get configmap -n <namespace>
kubectl get secret -n <namespace>

Key rules: a pod can only reference ConfigMaps and Secrets in the same namespace. A reference to a specific key that does not exist inside the ConfigMap also blocks startup, unless the reference is marked optional: true.
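The same optional flag exists for per-key env references. A sketch, with the key name as a placeholder:

env:
- name: LOG_LEVEL
  valueFrom:
    configMapKeyRef:
      name: app-config
      key: log-level      # placeholder key
      optional: true      # pod starts even if the ConfigMap or key is missing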

Fixes:

  1. Create the missing resource:

     kubectl create configmap app-config \
       --from-file=config.yaml -n <namespace>

     kubectl create secret generic db-credentials \
       --from-literal=password=changeme-in-production -n <namespace>

  2. Mark the reference as optional if the config is not strictly required:

     volumes:
     - name: config
       configMap:
         name: app-config
         optional: true

  3. If the pod does not recover on its own after you create the resource (the kubelet retries periodically), force a restart:

     kubectl rollout restart deployment <deployment-name> -n <namespace>

You will know it worked when: kubectl describe pod no longer shows FailedMount or Failed events referencing the missing resource, and the pod transitions to Running.

CNI plugin issues

The CNI (Container Network Interface) plugin assigns an IP address and configures network routing when the pod sandbox is created. If it fails, the sandbox cannot be established and the pod stays in ContainerCreating.

Identifying CNI failures. The describe output shows:

Warning  FailedCreatePodSandBox  Failed to create pod sandbox: rpc error:
  code = Unknown desc = failed to setup network for sandbox "abc123": ...

CNI-specific error messages vary by plugin:

  • Calico: plugin type="calico" failed (add): error getting ClusterInformation: Unauthorized means the calico-node DaemonSet is not running or has RBAC issues.
  • AWS VPC CNI: failed to assign an IP address to container means the subnet has no free IP addresses. Check IPAMD logs: kubectl logs -n kube-system -l k8s-app=aws-node -c aws-node.
  • Generic: network plugin is not ready: cni config uninitialized means no CNI configuration exists in /etc/cni/net.d/ on the node. The CNI DaemonSet was never deployed or is not scheduled on that node.

Diagnosis:

# Check CNI DaemonSet pods
kubectl get pods -n kube-system -l k8s-app=calico-node     # Calico
kubectl get pods -n kube-system -l k8s-app=aws-node        # AWS VPC CNI
kubectl get pods -n kube-system -l k8s-app=cilium           # Cilium

# Check CNI pod logs
kubectl logs -n kube-system <cni-pod-name>

# Check node status (NotReady often indicates networking problems)
kubectl describe node <node-name> | grep -A5 Conditions
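If you suspect the generic "cni config uninitialized" case, checking the node directly settles it (the path is the kubelet default; the config filename varies by plugin):

# SSH to the node, then:
ls /etc/cni/net.d/
# An empty or missing directory means the CNI DaemonSet never wrote its config on this node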

Fixes by scenario:

CNI DaemonSet pod crashing or not ready:

kubectl rollout restart daemonset calico-node -n kube-system

IP address pool exhausted (AWS VPC CNI):

  • Add more nodes to distribute IP demand.
  • Attach additional subnets to the node group.
  • Enable prefix delegation to assign /28 prefixes per ENI instead of individual IPs, increasing density from roughly 30 to 110 pods per node on m5.large instances.
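Prefix delegation is toggled through an environment variable on the aws-node DaemonSet. The command below reflects the documented AWS VPC CNI setting (version 1.9+ and Nitro-based instances required), but verify it against your CNI version before applying:

kubectl set env daemonset/aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true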

Node has no CNI configuration: A newly joined node may take 30–60 seconds for the CNI DaemonSet pod to start and write its config. If the node stays in this state longer, check that the DaemonSet's nodeSelector and tolerations allow it to run on that node.

You will know it worked when: kubectl describe pod no longer shows FailedCreatePodSandBox events and the pod gets an IP address (visible in kubectl get pods -o wide).

Init containers blocking

Init containers run sequentially before main containers start. Each must exit with code 0 before the next begins. A pod where init containers are still running shows Init:N/M in kubectl get pods, not ContainerCreating.

This distinction matters. If you see Init:0/2, the problem is the init container, not the main container setup. If you see ContainerCreating, all init containers completed but something else (volume, secret, CNI) is blocking.

kubectl STATUS          Meaning
Init:N/M                N of M init containers completed; waiting for more
Init:Error              An init container exited non-zero
Init:CrashLoopBackOff   An init container is failing repeatedly with backoff
PodInitializing         All init containers done; main containers starting
ContainerCreating       Main containers being set up (init containers already succeeded)

Diagnosing stuck init containers:

# See init container states and exit codes
kubectl describe pod <pod-name> -n <namespace>
# Look at the "Init Containers:" section

# Get init container logs
kubectl logs <pod-name> -c <init-container-name> -n <namespace>

# If the container already terminated:
kubectl logs <pod-name> -c <init-container-name> -n <namespace> --previous

Common causes:

  1. Waiting for a dependency that never becomes ready. The init container runs a wait-for-it loop checking a database or external service (a typical loop is sketched after this list). If the dependency is down or DNS is broken, the loop runs forever. Debug DNS: kubectl run -it --rm dnstest --image=busybox:1.36 --restart=Never -- nslookup <service-name>.<namespace>.svc.cluster.local.
  2. Init container image pull failure. Shows as ImagePullBackOff on the init container specifically. Check the Init Containers section in describe output and see ImagePullBackOff troubleshooting.
  3. Script exits with non-zero code. Check logs: kubectl logs <pod> -c <init-container-name>. Add set -x to shell-based init scripts for verbose tracing.
  4. Resource limits too tight. CPU throttling or OOM causes the init container to be killed or run too slowly. Increase resources.limits.
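For cause 1, a typical wait loop looks like the sketch below. The service name is a placeholder and the pattern mirrors the Kubernetes init-container documentation rather than any specific application:

initContainers:
- name: wait-for-db
  image: busybox:1.36
  command: ["sh", "-c",
            "until nslookup db.<namespace>.svc.cluster.local; do echo waiting for db; sleep 2; done"]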

You will know it worked when: kubectl get pods shows the status progressing from Init:N/M through PodInitializing to Running.

Diagnostic decision tree

Pod stuck in ContainerCreating
|
+-- kubectl describe pod --> Events section
|   |
|   +-- FailedAttachVolume / FailedMount
|   |   +-- PVC Pending? --> StorageClass, provisioner, access mode
|   |   +-- Multi-Attach error? --> Stale VolumeAttachment, Recreate strategy
|   |   +-- CSI error? --> CSI driver health
|   |
|   +-- Failed: secret/configmap not found --> Create resource, restart pod
|   |
|   +-- FailedCreatePodSandBox --> CNI plugin health, IP exhaustion
|   |
|   +-- No events --> Check kubelet logs (journalctl -u kubelet)
|
+-- kubectl get pods shows Init:N/M --> Not ContainerCreating
    +-- kubectl logs <pod> -c <init-container-name>

When to escalate

If the cause does not match any of the above, or if fixes do not resolve the issue, collect this information before asking for help:

  • Full output of kubectl describe pod <pod-name> -n <namespace>
  • Pod events: kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name> --sort-by='.lastTimestamp'
  • PVC status (if volumes are involved): kubectl describe pvc <pvc-name> -n <namespace>
  • VolumeAttachment objects: kubectl get volumeattachment -o yaml
  • CNI pod logs (if sandbox creation failed): kubectl logs -n kube-system <cni-pod-name>
  • Kubelet logs from the target node: journalctl -u kubelet --since "30 minutes ago"
  • Kubernetes version: kubectl version
  • The pod or Deployment manifest (sanitized of secrets)

How to prevent recurrence

  • Validate resources before deploying. Run kubectl get configmaps,secrets -n <namespace> before kubectl apply; CI pipelines can automate this check (a minimal version is sketched after this list).
  • Use Recreate strategy for Deployments with RWO volumes. RollingUpdate with a single-replica Deployment using a ReadWriteOnce volume will trigger Multi-Attach errors whenever the replacement pod lands on a different node than the old one.
  • Monitor PVC binding state. With kube-state-metrics, kube_persistentvolumeclaim_status_phase{phase="Pending"} catches unbound PVCs before pods reference them.
  • Set fsGroupChangePolicy: OnRootMismatch on workloads that mount large volumes with fsGroup. This avoids recursive ownership changes on every pod restart.
  • Keep CNI DaemonSets healthy. Alert on kube_daemonset_status_number_unavailable{daemonset=~".*cni.*|calico-node|aws-node|cilium"} to catch CNI pod failures before they block new pods.
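A minimal CI gate for the first point could be as simple as the sketch below; the resource names and namespace are placeholders for whatever your manifests actually reference:

#!/usr/bin/env bash
set -euo pipefail
NAMESPACE=<namespace>
# Fail the pipeline if any referenced ConfigMap or Secret is missing
for cm in app-config; do
  kubectl get configmap "$cm" -n "$NAMESPACE" > /dev/null
done
for s in db-credentials; do
  kubectl get secret "$s" -n "$NAMESPACE" > /dev/null
done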

