User namespaces graduate to stable in Kubernetes 1.36: pod-level isolation without gVisor

Kubernetes 1.36 promotes user namespaces to stable. A process running as root inside a pod now maps to an unprivileged UID on the host, containing the impact of container breakouts without the compatibility and performance costs of gVisor or Kata Containers.

Kubernetes 1.36 releases on April 22, and with it the UserNamespacesSupport feature gate graduates to stable. The feature has been enabled by default since 1.33, so the GA promotion is less about new functionality and more about a commitment: the API is frozen, the behavior is production-grade, and the feature will not be removed. For platform engineers evaluating pod isolation strategies, user namespaces occupy a specific position: stronger isolation than runAsNonRoot alone, with none of the compatibility and performance costs of gVisor or Kata Containers.

TL;DR

  • User namespaces remap container UIDs to unprivileged host UIDs. Root inside the pod (UID 0) maps to a high, unprivileged UID on the node (e.g., 196608).
  • The opt-in is a single field: spec.hostUsers: false. No feature gate manipulation needed since 1.33.
  • Runtime requirements: containerd 2.0+, Linux kernel 6.3+, and a filesystem supporting idmap mounts.
  • NFS volumes do not support idmap mounts and are a hard blocker.
  • User namespaces reduce post-escape blast radius but expand the reachable kernel attack surface. Combine with seccomp for defense in depth.

What user namespaces do at the kernel level

Linux user namespaces (CLONE_NEWUSER, available since kernel 3.8) allow a process to operate under a different UID/GID mapping than the host. The kernel tracks two sets of IDs: namespace-local IDs (what the process sees) and host IDs (what the kernel uses for permission checks on the actual filesystem).

When a pod sets hostUsers: false, the kubelet assigns it a unique, non-overlapping 65536-UID range on the host. A process that sees itself as root (UID 0) inside the container actually runs as a high unprivileged UID on the node – for example 196608. Capabilities like CAP_SYS_ADMIN are scoped to the namespace and carry no authority on the host.
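On the node, the active mapping is visible in /proc/&lt;pid&gt;/uid_map, one line per mapped range in the format `<ns-start> <host-start> <length>`. As a sketch (the 196608 base and the helper name are illustrative; the kubelet picks the actual range), a small function can translate an in-container UID to its host UID:

```shell
# Translate an in-container UID to its host UID, given one uid_map line.
# uid_map line format: <ns-start> <host-start> <length>
map_to_host() {
  echo "$1" | awk -v uid="$2" \
    '{ if (uid >= $1 && uid < $1 + $3) print $2 + (uid - $1) }'
}

map_to_host "0 196608 65536" 0      # container root -> host UID 196608
map_to_host "0 196608 65536" 1000   # container UID 1000 -> host UID 197608
```

The same arithmetic is what the kernel performs on every permission check for a user-namespaced process.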

The practical consequence for container breakouts: if an attacker escapes the container runtime (as in CVE-2024-21626, the runc file descriptor leak), they land as an unprivileged user on the host. Root-owned files like kubeconfig or SSH keys are out of reach. The Kubernetes blog describes multiple historical CVEs as "greatly mitigated" by this class of isolation, including CVE-2019-5736 (runc binary overwrite) and CVE-2022-0492 (cgroup escape).

The other critical kernel feature making this work is idmap mounts (Linux 5.12+). An idmap mount presents a filesystem with different apparent UID/GID ownership without modifying on-disk inodes. containerd 2.0 uses this for the container image rootfs via overlayfs: no files are chowned, no storage is duplicated, no startup latency is added. That is a direct improvement over containerd 1.7, which physically chowned every file in the image at pod creation, causing measurable storage overhead and slower startup for large images.

From alpha to stable: the KEP-127 timeline

User namespaces in Kubernetes are tracked under KEP-127. The feature took four years to reach stable, largely because the container runtime and kernel ecosystem needed to catch up.

| Version | Stage | Key change |
| --- | --- | --- |
| 1.25 | Alpha | Stateless pods only; gate UserNamespacesStatelessPodsSupport |
| 1.27 | Alpha (redesigned) | Renamed to UserNamespacesSupport; switched to idmap mounts |
| 1.28 | Alpha | Lifted stateless restriction; supports volumes |
| 1.30 | Beta | Custom UID/GID ranges; runtime must confirm support |
| 1.33 | Beta, on by default | No feature gate needed |
| 1.36 | GA | Feature gate locked; API frozen |

The 1.36 release also introduces a new alpha feature: UserNamespacesHostNetworkSupport (KEP-5607). This allows pods to combine hostNetwork: true with hostUsers: false, a combination previously rejected by the API server. That matters for CNI plugins and node-level agents that need host network access but would benefit from UID isolation.
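With the alpha gate enabled, the previously rejected combination would look like this (a sketch; the pod name and image are illustrative, and UserNamespacesHostNetworkSupport must be enabled on the cluster):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-agent             # illustrative name
spec:
  hostNetwork: true            # host network access for the agent
  hostUsers: false             # still UID-isolated; rejected before this alpha gate
  containers:
    - name: agent
      image: my-agent:latest   # placeholder image
```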

How to enable user namespaces

The opt-in is a single field in the pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: isolated-workload
spec:
  hostUsers: false
  containers:
    - name: app
      image: my-app:latest

No feature gate manipulation is needed since 1.33. The kubelet automatically assigns a unique 65536-UID range per pod.

Runtime and kernel requirements

| Component | Minimum version |
| --- | --- |
| containerd | 2.0 (also a general 1.36 requirement) |
| CRI-O | 1.25+ (with crun) |
| runc | 1.2+ |
| crun | 1.9+ (1.13+ recommended) |
| Linux kernel | 6.3+ |

The kernel 6.3 minimum is about tmpfs idmap mount support. Kubernetes mounts Secrets, ConfigMaps, and service account tokens as tmpfs; without tmpfs idmap support, those mounts fail for user-namespaced pods. Overlayfs idmap mounts (for the container rootfs) require kernel 5.19+, but 6.3 is the binding constraint.
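A quick preflight sketch for the kernel requirement (the helper name is ours; the comparison uses sort -V, and the 6.3 threshold is the tmpfs idmap constraint above):

```shell
# Succeeds if version $1 is at least version $2 (by version sort).
version_at_least() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

if version_at_least "$(uname -r)" "6.3"; then
  echo "kernel OK for hostUsers: false"
else
  echo "kernel too old: need 6.3+ for tmpfs idmap mounts"
fi
```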

Combinations the API server rejects

  • hostUsers: false + hostPID: true
  • hostUsers: false + hostIPC: true
  • hostUsers: false + hostNetwork: true (unless the UserNamespacesHostNetworkSupport alpha gate is enabled)
  • hostUsers: false + volumeDevices (raw block volumes)

What breaks when you flip the switch

NFS volumes. NFS does not support idmap mounts in the Linux kernel. Pods using NFS PersistentVolumeClaims cannot set hostUsers: false. This is likely the most common production blocker.

Host-level capabilities stop working. CAP_SYS_ADMIN inside a user-namespaced pod cannot load kernel modules or change system time. Workloads that depend on host-level capabilities (some monitoring agents, privileged DaemonSets) need to stay on hostUsers: true.

Cross-pod shared volumes. Two pods with different UID mappings writing to the same volume see each other's files as owned by unmapped UIDs. If cross-pod file sharing is required, the pods need hostUsers: true.

Pod Security Standards interaction. Under the restricted Pod Security Standard, runAsNonRoot: true is normally required. When hostUsers: false is set, the policy relaxes this check. The reasoning: the non-root guarantee on the host comes from the namespace mapping, regardless of the in-container UID.

What does not break: most standard workloads. The UID remapping is transparent at the kernel level. A process running as root inside the container reads and writes files normally; idmap mounts handle the translation. fsGroup, runAsUser, and supplementalGroups in securityContext continue to refer to in-container UIDs, and the kubelet translates them to host UIDs via the mapping.
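For example, a securityContext combined with hostUsers: false keeps its in-container meaning (a sketch; the name, image, and UID/GID values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-context    # illustrative name
spec:
  hostUsers: false
  securityContext:
    runAsUser: 1000         # in-container UID 1000...
    fsGroup: 2000           # ...and in-container GID 2000 on volumes;
                            # the kubelet maps both into the pod's host range
  containers:
    - name: app
      image: my-app:latest  # placeholder image
```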

User namespaces vs. the alternatives

| | runAsNonRoot | Seccomp | User namespaces | gVisor | Kata Containers |
| --- | --- | --- | --- | --- | --- |
| Isolation | UID enforcement | Syscall filtering | UID remapping | User-space kernel | Hardware VM |
| Post-escape UID | Same UID on host | Root if running as root | Unprivileged host UID | N/A (no host kernel) | N/A (guest kernel) |
| Syscall compat | Full | Full (minus blocked) | Full | ~76 unimplemented | Full |
| Perf. overhead | None | Negligible | None | Workload-dependent | ~150–300ms startup |
| Volume compat | Full | Full | No NFS, no raw block | Limited | Limited |
| Kernel surface | Unchanged | Reduced | Expanded (see below) | Minimized | Minimized |

runAsNonRoot enforces the in-container UID but provides no host isolation. An escaped process at UID 1000 inside the container is also UID 1000 on the host, with access to everything that UID owns.

Seccomp reduces the attack surface by filtering syscalls but does not remap UIDs. A root container with seccomp enabled is still root on the host after an escape. The Kubernetes RuntimeDefault profile blocks approximately 44 syscalls.

gVisor interposes a user-space kernel written in Go. It offers stronger pre-escape isolation (application syscalls never reach the host kernel directly), but roughly 76 syscalls remain unimplemented and I/O-heavy workloads see measurable overhead.

Kata Containers provides hardware-enforced isolation via lightweight VMs. Strongest available isolation, but at the cost of 150–300ms startup time and 600MB+ memory overhead per pod.

User namespaces sit in the gap between "enforce UID restrictions" and "run a separate kernel." No performance cost, full syscall compatibility, but the host kernel is shared.

When user namespaces are not enough

This is the section that matters for security architecture decisions.

Research by Edera found that user namespaces expand the kernel attack surface reachable from unprivileged containers. Without user namespaces, an unprivileged container can reach 8 of 40 tested kernel subsystems. With user namespaces, that number rises to 27, more than a threefold increase. The newly accessible subsystems include nftables (18 CVEs at the time of the research, including CVE-2024-1086, with a reported 99.4% reliable escape rate) and overlayfs mounting.

The mechanism: user namespaces grant in-namespace CAP_SYS_ADMIN, which unlocks kernel code paths previously gated behind full root. The container cannot exercise those capabilities on the host, but it can trigger kernel code that may contain exploitable vulnerabilities.

The correct framing: user namespaces reduce the blast radius of an escape (the attacker lands unprivileged) while increasing the probability of finding an escapable bug. For clusters running mutually distrusting tenants, user namespaces alone are insufficient. Kata Containers or gVisor provide the necessary kernel isolation boundary.

The practical recommendation: enable user namespaces and apply seccomp profiles (at minimum RuntimeDefault). Seccomp restricts the additional syscalls that become accessible through user namespaces. Together, they reduce both the probability of escape and the impact if one occurs.
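A pod combining both mitigations might look like this (a sketch; the name and image are illustrative, and RuntimeDefault resolves to the container runtime's built-in seccomp profile):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: defense-in-depth    # illustrative name
spec:
  hostUsers: false          # remap UIDs: an escape lands unprivileged on the host
  securityContext:
    seccompProfile:
      type: RuntimeDefault  # filter the syscalls user namespaces newly expose
  containers:
    - name: app
      image: my-app:latest  # placeholder image
```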

Key takeaways

  • User namespaces are stable in 1.36 and have been on by default since 1.33. The opt-in is spec.hostUsers: false in the pod spec.
  • containerd 2.0 and Linux kernel 6.3+ are hard prerequisites. containerd 2.0 is also a general 1.36 requirement regardless of user namespaces.
  • NFS volumes are a production blocker. Pods using NFS PVCs cannot set hostUsers: false until the kernel adds NFS idmap support.
  • User namespaces reduce post-escape impact but expand the kernel attack surface. Always combine with seccomp profiles for defense in depth.
  • For adversarial multi-tenant isolation, gVisor or Kata Containers remain necessary. User namespaces are a low-cost improvement, not a complete isolation boundary.
