Symptom: kubectl delete namespace hangs forever
You ran kubectl delete namespace team-payments and got back namespace "team-payments" deleted. By that message the namespace should be gone from the API, except it isn't:
$ kubectl get namespace team-payments
NAME            STATUS        AGE
team-payments   Terminating   42h
It has been like this for hours, sometimes days. kubectl get pods -n team-payments returns nothing or a small set of stuck pods. kubectl delete namespace team-payments --force --grace-period=0 returns successfully but changes nothing. The namespace stays Terminating across control-plane restarts and survives every retry.
This is the classic stuck-namespace state, and it has exactly one mechanical cause: at least one finalizer is still on the namespace or on a resource inside it, and no controller is removing it.
Why namespaces get stuck: the finalizer mechanism
A finalizer is a namespaced key in metadata.finalizers that tells Kubernetes to wait until specific conditions are met before fully deleting an object. When you delete a namespace, two things happen at the same time. First, the namespace gets a deletionTimestamp and enters the Terminating phase. Second, the namespace controller starts walking every API resource in the namespace, deleting anything it finds. The namespace cannot disappear from etcd until both that cleanup completes and every finalizer key is gone.
Two distinct finalizer locations matter for namespaces, and confusing them is the single biggest source of botched fixes:
- metadata.finalizers lives on individual resources inside the namespace (PVCs, custom resources, sometimes pods). Each key names a controller that has cleanup work to do. The owning controller is responsible for finishing the work and removing the key.
- spec.finalizers lives on the namespace itself, and on a healthy cluster always contains exactly one entry: kubernetes. This is the namespace's own finalizer, drained by the namespace controller via the dedicated /finalize subresource only after every resource inside the namespace has been removed.
The namespace stays Terminating until both lists are empty. If a controller crashes, gets uninstalled, or has its API endpoint go offline, its finalizer key is never removed, and the namespace stops making progress.
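Both locations can be read directly with jsonpath. A quick sketch, using the namespace from the example above (the PVC name data-0 is hypothetical; substitute whatever the diagnosis below turns up):
# The namespace's own finalizer list (spec.finalizers) -- normally just ["kubernetes"]
kubectl get namespace team-payments -o jsonpath='{.spec.finalizers}'

# Finalizers on a resource inside the namespace (metadata.finalizers)
kubectl get pvc data-0 -n team-payments -o jsonpath='{.metadata.finalizers}'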
You can read this state directly. The namespace status carries four well-defined conditions that tell you exactly which phase is stuck:
| Condition | Meaning |
|---|---|
| NamespaceDeletionDiscoveryFailure | The namespace controller could not list one or more API resources to begin cleanup |
| NamespaceDeletionContentFailure | An API call during cleanup failed (often an unavailable APIService) |
| NamespaceContentRemaining | Resources still exist inside the namespace |
| NamespaceFinalizersRemaining | At least one finalizer is still set on a resource in the namespace |
The diagnostic flow below uses these conditions as its starting point.
Common causes, ordered by likelihood
- An orphaned custom resource finalizer. An operator or CRD-based controller was uninstalled while one of its custom resources still existed. The CR has a metadata.finalizers entry like cert-manager.io/finalizer or tackle.tackle.io/finalizer, and the controller that would remove it is gone.
- An unavailable APIService. A metrics or extension APIService (v1beta1.metrics.k8s.io, v1beta1.custom.metrics.k8s.io, an admission webhook backend) is reporting Available: False. The namespace controller cannot enumerate resources for cleanup, so it logs NamespaceDeletionDiscoveryFailure and stops.
- Built-in PV or PVC protection finalizers. A pod still references a PVC, or a PVC still has a bound PV, so the kubernetes.io/pvc-protection or kubernetes.io/pv-protection finalizer never clears. The namespace waits for those resources to release.
- A stuck pod with its own finalizer. A pod has a finalizer set by an admission controller or sidecar (Istio, Linkerd, KEDA ScaledObjects) that is no longer running.
- A failing admission webhook. A ValidatingWebhookConfiguration or MutatingWebhookConfiguration with failurePolicy: Fail is calling a backend that is gone, so the namespace controller cannot delete resources to satisfy the webhook.
The diagnosis below tests for each in order.
Diagnose which resource is blocking the namespace
Start with the namespace's own conditions. They will name the failure mode for you:
kubectl get namespace team-payments -o jsonpath='{.status.conditions}' | jq
A typical stuck-namespace output looks like this:
[
{
"lastTransitionTime": "2026-04-22T14:02:11Z",
"message": "All content successfully deleted, may be waiting on finalization",
"reason": "ContentDeletionTermination",
"status": "False",
"type": "NamespaceDeletionContentFailure"
},
{
"lastTransitionTime": "2026-04-22T14:02:11Z",
"message": "Some content in the namespace has finalizers remaining: cert-manager.io/finalizer in 1 resource instances",
"reason": "SomeFinalizersRemain",
"status": "True",
"type": "NamespaceFinalizersRemaining"
}
]
Read every message carefully. The NamespaceFinalizersRemaining message names the exact finalizer key (cert-manager.io/finalizer) and how many resources still carry it. That is the single most useful piece of information in this entire workflow. If you see only NamespaceContentRemaining without NamespaceFinalizersRemaining, the resources have not been touched yet and you most likely have a discovery or content failure.
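If you want only the active blockers rather than the whole conditions array, a small jq filter over the same data does it (a convenience sketch, not a required step):
kubectl get namespace team-payments -o json |
  jq -r '.status.conditions[] | select(.status == "True") | "\(.type): \(.message)"'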
If NamespaceDeletionDiscoveryFailure is True, check APIServices next:
kubectl get apiservice | grep -v True
Any APIService with Available=False will block discovery. In the unfiltered kubectl get apiservice listing, the broken entry stands out:
NAME                            SERVICE                               AVAILABLE                 AGE
v1beta1.custom.metrics.k8s.io   monitoring/custom-metrics-apiserver   False (ServiceNotFound)   63d
v1beta1.metrics.k8s.io          kube-system/metrics-server            True                      63d
A ServiceNotFound, MissingEndpoints, or FailedDiscoveryCheck reason means the backing service is gone or unhealthy. The namespace controller calls every registered APIService to discover what to clean up; one broken APIService is enough to stall the entire deletion.
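If you prefer a filter that also prints the failure reason and message, a jq pass over the same data works (a sketch, equivalent to the grep above):
kubectl get apiservice -o json |
  jq -r '.items[]
    | .metadata.name as $name
    | .status.conditions[]?
    | select(.type == "Available" and .status != "True")
    | "\($name): \(.reason) - \(.message)"'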
If conditions point to a finalizer (NamespaceFinalizersRemaining), find the exact resource:
# List every namespaced API resource type
kubectl api-resources --verbs=list --namespaced -o name |
xargs -n 1 kubectl get --show-kind --ignore-not-found -n team-payments
This is the GKE-recommended sweep. It calls every namespaced API kind and prints what is still present. Anything that survives a normal namespace deletion is by definition holding a finalizer.
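One practical note: if a broken APIService is present, this sweep also prints a discovery error for every affected group. Redirecting stderr keeps the listing readable, at the cost of hiding the error that names the broken group (same command otherwise):
kubectl api-resources --verbs=list --namespaced -o name 2>/dev/null |
  xargs -n 1 kubectl get --show-kind --ignore-not-found -n team-payments 2>/dev/null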
To inspect the finalizer keys on a specific resource:
kubectl get <kind>/<name> -n team-payments -o jsonpath='{.metadata.finalizers}'
To list every namespaced resource still holding a finalizer in one pass:
kubectl api-resources --verbs=list --namespaced -o name |
xargs -I {} kubectl get {} -n team-payments \
-o jsonpath='{range .items[?(@.metadata.finalizers)]}{.kind}/{.metadata.name}: {.metadata.finalizers}{"\n"}{end}'
The output names every kind, name, and the exact finalizer keys still attached. From this list you can decide which fix below applies.
Fix A: remove a finalizer from a stuck namespaced resource
This is the right path when the diagnosis named a specific resource (a custom resource, a PVC, a pod) holding a finalizer that no controller is removing.
Before patching anything, decide whether the controller might still come back. If you accidentally uninstalled a CRD's controller and you can reinstall it, do that first. The controller will pick up the resource, run its cleanup, and remove the finalizer normally. Removing the finalizer manually skips that cleanup, which can leave external resources (DNS records, cloud volumes, certificates, IAM bindings) orphaned in the systems the controller was managing.
If the controller is permanently gone (the operator was decommissioned, the CRD has been removed, the upstream service is shut down), patch the finalizer off the specific resource. The Kubernetes garbage collection model treats finalizers as cleanup signals; removing the key tells the API server "stop waiting for cleanup, just delete the object record":
kubectl patch certificate api-tls -n team-payments \
--type=json \
-p='[{"op": "remove", "path": "/metadata/finalizers"}]'
You will know it worked when: the resource disappears from kubectl get <kind> -n team-payments within a second or two, and kubectl get namespace team-payments either flips to NotFound (deletion completes) or moves to a different blocking finalizer if there are more.
For PVC-protection finalizers, the right fix is usually not patching them off. The kubernetes.io/pvc-protection finalizer is removed automatically once no pod references the PVC. Find the referencing pod:
kubectl get pods -n team-payments -o json |
  jq -r '.items[]
    | select(any(.spec.volumes[]?; .persistentVolumeClaim))
    | "\(.metadata.name): \([.spec.volumes[]? | .persistentVolumeClaim.claimName // empty] | join(","))"'
Delete that pod cleanly (or remove its own finalizer if the pod is itself stuck), and the PVC-protection finalizer clears on its own.
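To confirm which PVCs are still carrying the protection finalizer while you hunt for the referencing pod, the same jsonpath pattern as above works across the whole namespace (a sketch):
kubectl get pvc -n team-payments \
  -o jsonpath='{range .items[*]}{.metadata.name}: {.metadata.finalizers}{"\n"}{end}'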
Fix B: delete or repair an unavailable APIService
If NamespaceDeletionDiscoveryFailure is True and kubectl get apiservice | grep -v True shows something Available=False, the namespace controller cannot enumerate resources to clean up. Until the APIService is either healthy or removed, no namespace anywhere in the cluster will finish terminating cleanly.
Verify which one is broken:
kubectl describe apiservice v1beta1.custom.metrics.k8s.io
The Status section names the condition. ServiceNotFound and MissingEndpoints mean the backing Service or its pods are gone. FailedDiscoveryCheck means the backend is up but not serving the OpenAPI endpoint correctly.
Two paths:
1. Reinstall the missing controller. If the APIService belongs to a metrics-server or custom-metrics adapter that was meant to be running, redeploy it. Once kubectl get apiservice shows Available=True, namespace deletion resumes automatically; you do not need to retry the delete.
2. Delete the orphaned APIService. If the controller is gone for good and the APIService is just leftover registration:
kubectl delete apiservice v1beta1.custom.metrics.k8s.io
This removes the registration so the namespace controller stops trying to call a service that does not exist. After deletion, the namespace's NamespaceDeletionDiscoveryFailure condition flips to False within seconds and the controller continues with content cleanup.
The same pattern applies to broken admission webhooks. A MutatingWebhookConfiguration with failurePolicy: Fail and a dead backend will block resource deletion the same way. Either restore the backend, switch the webhook to failurePolicy: Ignore for the affected operations, or delete the webhook configuration entirely if the controller behind it is gone.
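To see which webhook configurations could block deletions and where their backends are supposed to live, a jq pass over both kinds helps (a sketch; it lists candidates with failurePolicy: Fail, it does not tell you whether each backend is healthy):
for kind in validatingwebhookconfigurations mutatingwebhookconfigurations; do
  kubectl get "$kind" -o json |
    jq -r '.items[]
      | .metadata.name as $cfg
      | .webhooks[]?
      | select(.failurePolicy == "Fail")
      | "\($cfg)/\(.name): \(.clientConfig.service.namespace // "-")/\(.clientConfig.service.name // "-")"'
done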
Fix C (last resort): drain the namespace's own finalizer via /finalize
When every namespaced resource has been removed but the namespace itself stays Terminating, the namespace's own spec.finalizers: [kubernetes] entry is the last blocker. The namespace controller normally drains this entry through the dedicated /finalize subresource once content cleanup completes, but if the controller has hit a permanent error (a content failure it cannot retry past), the entry stays.
This is the situation where you bypass the controller and write to /finalize directly. It is a last resort because it tells the API server "I take responsibility that nothing in this namespace needs cleanup". If anything was still relying on the controller to release external resources, that work is now skipped permanently.
The cleanest one-liner using jq and kubectl replace --raw:
NS=team-payments
kubectl get namespace "$NS" -o json |
jq 'del(.spec.finalizers)' |
kubectl replace --raw "/api/v1/namespaces/$NS/finalize" -f -
kubectl replace --raw sends the modified JSON straight to the /finalize subresource. The API server validates that you are only changing spec.finalizers (the subresource permits nothing else) and accepts the empty list. The namespace disappears within a second.
You will know it worked when: kubectl get namespace team-payments returns Error from server (NotFound). If it returns the namespace again, the call did not hit /finalize (often a permission issue: this requires update on namespaces/finalize).
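You can also check the permission up front instead of inferring it from a failed call (assuming a kubectl recent enough to support --subresource on auth can-i):
kubectl auth can-i update namespaces --subresource=finalize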
Some older guides recommend kubectl proxy plus a curl PUT against http://127.0.0.1:8001/api/v1/namespaces/$NS/finalize with the modified JSON in a temp file. That works too and is functionally identical, but kubectl replace --raw removes the proxy step and is the direction the project has moved towards.
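For completeness, that older variant looks roughly like this (functionally identical; a sketch, with the temp-file path chosen arbitrarily):
NS=team-payments
kubectl get namespace "$NS" -o json | jq 'del(.spec.finalizers)' > /tmp/ns-finalize.json
kubectl proxy &
sleep 2   # give the proxy a moment to start listening
curl -H "Content-Type: application/json" -X PUT \
  --data-binary @/tmp/ns-finalize.json \
  "http://127.0.0.1:8001/api/v1/namespaces/$NS/finalize"
kill %1   # stop the background kubectl proxy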
Why kubectl delete --force --grace-period=0 does not work here
A common reflex is to retry the delete with --force --grace-period=0. It looks like it should override anything. It does not.
Per the kubectl reference, --force --grace-period=0 does two things: it bypasses graceful pod termination, and it removes the resource from the API immediately for resources where that is supported. It does not remove finalizers. For pods, it skips the SIGTERM grace period and removes the pod record from the API without waiting for the kubelet to confirm shutdown; for namespaces, it has no special meaning beyond "do not wait for graceful termination", which the namespace controller never honors anyway.
If a namespace is stuck on a finalizer, every --force retry returns the same namespace deleted message and changes nothing. The deletion request was already accepted at the original delete; the API just keeps recording it. The only mechanism that actually clears the state is finalizer removal, either by the responsible controller or via Fix A, B, or C above.
This is also why retry loops, scripted re-deletes, and kubectl replace of a yaml manifest with metadata.deletionTimestamp cleared all do nothing. None of them are the operation that the API server treats as "remove the finalizer".
What this is NOT
A few adjacent failure modes look similar but are different problems:
- A pod stuck Terminating is not the same as a namespace stuck Terminating. A pod can hang on its own finalizer or a kubelet that is unreachable. The namespace might be fine; only one resource is stuck. Diagnose pods first if individual workloads are the symptom.
- Slow deletion is not stuck deletion. A namespace with hundreds of pods can take minutes to fully terminate, especially if pods have long terminationGracePeriodSeconds. If the conditions show progress (the resource count drops, finalizers clear one by one), wait. Stuck means no progress over many minutes, not slow.
- A namespace with Active status that you cannot delete is a different problem (RBAC, admission, finalizer added pre-deletion). It is not the Terminating-state problem this article addresses.
- CRD-level termination issues sometimes look like namespace issues but are CRD-wide. A CRD that is itself stuck Terminating because instances exist across multiple namespaces is its own diagnosis.
Warning: what you might be leaving behind
Removing a finalizer is metadata-only. The finalizer was a marker that some controller had work to do; removing it tells the API server to treat that work as already done or safe to skip. "Skip" is the generous word: the work simply does not happen.
What that practically means depends on the finalizer:
- cert-manager.io/finalizer on a Certificate: Cleanup of issued certificates from the issuer (Let's Encrypt account, ACME challenge records, internal CA) is skipped. The certificate object is gone; the upstream record is not.
- A CSI driver finalizer on a PV: The volume in the cloud provider (EBS, GCE PD, Azure Disk) is not detached or deleted. You will see orphaned volumes accruing cost.
- An external-dns finalizer: DNS records the controller created in Route 53 or Cloudflare are not removed.
- An operator finalizer on a Kafka or Redis CR: Cluster-level cleanup (drop topics, deregister from external systems, release IAM roles) is skipped.
Before using Fix A or Fix C, check whether the controller is actually gone or just temporarily unhealthy. If it might come back, the cheaper path is to restore the controller and let it complete its work. If it is gone for good, run a manual sweep of the external system afterward (cloud console, DNS provider, IAM) to clean up what the finalizer would have handled.
When to escalate
When you are about to ask for help (internal SRE channel, support, the controller's GitHub issues), collect:
- The full output of kubectl get namespace <name> -o yaml, including status.conditions.
- The output of kubectl get apiservice (the full table, not just False rows).
- The full list of remaining resources from kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get --show-kind --ignore-not-found -n <name>.
- The exact finalizer keys you have identified (cert-manager.io/finalizer, kubernetes.io/pvc-protection, etc.) and which resources hold them.
- The Kubernetes version (kubectl version; the --short flag has been removed from recent kubectl releases) and provider (EKS, GKE, AKS, on-prem with the version of the CNI in use).
- A note on whether the responsible controller is supposed to be running and, if not, when and why it was removed.
Without those, anyone helping you will ask for them first. Collecting them up front saves a round trip.
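A small script that gathers those items into one directory can save the back-and-forth (a sketch; adjust the namespace and output location to your environment):
NS=team-payments
OUT=$(mktemp -d)
kubectl get namespace "$NS" -o yaml > "$OUT/namespace.yaml"
kubectl get apiservice > "$OUT/apiservices.txt"
kubectl api-resources --verbs=list --namespaced -o name |
  xargs -n 1 kubectl get --show-kind --ignore-not-found -n "$NS" > "$OUT/remaining-resources.txt" 2>&1
kubectl version > "$OUT/version.txt" 2>&1
echo "Collected diagnostics in $OUT"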
Prevent recurrence
- Never uninstall an operator while its CRs still exist. Delete the CRs first, wait for cleanup to finish, then uninstall the operator. If the operator was already uninstalled, do not assume the leftover CRs will silently disappear; they will block any namespace they live in.
- Use helm uninstall with --wait for charts that own CRDs and CRs, or follow the chart's documented uninstall order (CRs, then operator, then CRDs).
- Audit unavailable APIServices regularly. A scheduled job that alerts on kubectl get apiservice | grep -v True catches orphaned registrations before they bite; a minimal check script follows this list.
- Set failurePolicy: Ignore on optional admission webhooks that are not security-critical. A Fail policy on a backend that disappears blocks more than just the namespace it is webhooked on; it can stall every cluster-wide cleanup that touches the same resource type.
- Document which finalizers your platform relies on. When something stalls, knowing which finalizers exist legitimately versus which are leftovers lets you triage in seconds instead of grepping the wider Kubernetes ecosystem.
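The minimal check script referenced above could look like this (a sketch; it relies on the default kubectl column layout, and the alerting or notification step is left to you):
#!/usr/bin/env bash
# Exit non-zero when any registered APIService is not reporting Available=True,
# so a cron job or CI step wrapping this script can alert on the failure.
broken=$(kubectl get apiservice --no-headers | awk '$3 != "True" {print $1, $3, $4}')

if [ -n "$broken" ]; then
  echo "Unavailable APIServices detected:"
  echo "$broken"
  exit 1
fi
echo "All APIServices available"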
For broader Kubernetes troubleshooting workflows, see Pod stuck in Pending: why Kubernetes cannot schedule your workload for resource-side blockers, OOMKilled: Kubernetes out of memory errors explained for memory-driven pod failures, and Kubernetes multi-tenancy: namespace isolation, ResourceQuota, and LimitRange for how the namespace itself is provisioned and what finalizers tend to live inside it.