Kubernetes marks a node as "Ready" as soon as the kubelet is running and networking works at a minimal level. But in practice, every platform engineer knows that this is far from enough. Your CNI plugin still needs to initialize, GPU drivers need to load, security agents need to run, and yet the scheduler happily places pods on nodes that are not actually ready. The Node Readiness Controller (NRR), officially launched as a kubernetes-sigs project in early February 2026, solves this problem with a declarative, CRD-driven system for managing node taints based on custom health signals. The project implements the ideas behind KEP 5416 (NodeReadinessGates) as an out-of-band solution you can deploy on any existing cluster today.
The problem every platform engineer recognizes
The default Ready condition of a Kubernetes node is binary: either the kubelet works, or it does not. But modern clusters have complex bootstrap requirements. A node needs a working CNI plugin, maybe NVIDIA GPU drivers, storage drivers, observability agents, or security scanners. The time between "kubelet is ready" and "all components are operational" is a dangerous window in which the scheduler places pods that fail immediately.
The common workaround has been the same for years: register nodes with a NoSchedule taint via the kubelet flag --register-with-taints, configure critical DaemonSets to tolerate that taint, and build a custom controller that removes the taint when everything is ready. It works, but it has serious downsides. You need to build and maintain custom controllers, those controllers need broad nodes/patch RBAC permissions, race conditions appear between taint removal and scheduler decisions, and every organization reinvents the wheel. Antonio Ojea, a well-known Kubernetes contributor, summarized it well in the launch discussion: "Node readiness, especially around network readiness has been always a cause of friction for cluster admins and platform operators."
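The legacy pattern described above can be sketched roughly as follows. The taint key, DaemonSet name, and image are illustrative placeholders, not identifiers from any real project:

```yaml
# Legacy workaround (pre-NRR), sketched with illustrative names.
# 1. Register the node with a startup taint via kubelet configuration
#    (the KubeletConfiguration equivalent of --register-with-taints):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
registerWithTaints:
- key: "example.com/node-initializing"   # illustrative taint key
  effect: "NoSchedule"
---
# 2. Critical DaemonSets (CNI agent, drivers) tolerate the taint so
#    they can start on the still-tainted node:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cni-agent                        # illustrative name
spec:
  selector:
    matchLabels: {app: cni-agent}
  template:
    metadata:
      labels: {app: cni-agent}
    spec:
      tolerations:
      - key: "example.com/node-initializing"
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: agent
        image: example.com/cni-agent:latest   # illustrative image
# 3. A custom controller (not shown) removes the taint once the node
#    is actually ready -- the part NRR now standardizes.
```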
How the NodeReadinessRule CRD works
NRR introduces one custom resource: the NodeReadinessRule. With it, you declaratively define which conditions on a node must be True before a specific taint is removed. The controller itself does not run health checks; it reacts to node conditions reported by other components, such as Node Problem Detector or the included Readiness Condition Reporter.
Here is the CNI readiness example from the official repository:
```yaml
apiVersion: readiness.node.x-k8s.io/v1alpha1
kind: NodeReadinessRule
metadata:
  name: network-readiness-rule
spec:
  conditions:
  - type: "example.com/CNIReady"
    requiredStatus: "True"
  taint:
    key: "readiness.k8s.io/NetworkReady"
    effect: "NoSchedule"
    value: "pending"
  enforcementMode: "bootstrap-only"
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""
```
The logic is elegant. All nodes with the label node-role.kubernetes.io/worker get the taint readiness.k8s.io/NetworkReady=pending:NoSchedule. As soon as the node condition example.com/CNIReady is True, the controller removes the taint and the node becomes schedulable. Multiple conditions inside one rule work as a logical AND: all must be True.
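One practical consequence: the component that reports readiness must itself be able to start on the tainted node. A minimal sketch of the toleration the CNI DaemonSet's pod spec would need, matching the taint from the rule above:

```yaml
# Pod spec fragment for the CNI DaemonSet: it must tolerate the
# readiness taint managed by the rule, or it could never start and
# the taint would never be removed.
tolerations:
- key: "readiness.k8s.io/NetworkReady"
  operator: "Equal"
  value: "pending"
  effect: "NoSchedule"
```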
The full CRD spec has these fields:
| Field | Type | Description |
|---|---|---|
| `spec.conditions[].type` | string | The node condition type to monitor |
| `spec.conditions[].requiredStatus` | string | The required status (for example `"True"`) |
| `spec.taint.key` | string | The taint key to manage |
| `spec.taint.effect` | string | `NoSchedule`, `NoExecute`, or `PreferNoSchedule` |
| `spec.taint.value` | string | Optional taint value |
| `spec.enforcementMode` | string | `bootstrap-only` or `continuous` |
| `spec.nodeSelector` | LabelSelector | Select specific node subsets |
| `spec.dryRun` | boolean | Simulate without applying taints |
One subtle but important design detail: the controller does not change the node Ready status. It only manages taints. The node remains "Ready" from the kubelet perspective, but the taint prevents the scheduler from placing pods there until all custom conditions are fulfilled.
Bootstrap-only versus continuous: two enforcement modes
The enforcementMode fundamentally determines controller behavior, and that choice directly affects your operating model.
Bootstrap-only is designed for one-time initialization steps. Think of loading GPU drivers during node startup, pre-pulling large container images, or waiting until a CNI plugin is operational for the first time. Once all conditions in the rule are True, the controller marks bootstrap as complete and stops monitoring that specific rule on that node. If the CNI plugin later crashes, the taint is not applied again. This is the right behavior for provisioning workflows where you only want to gate initial startup.
Continuous monitoring goes one step further. The controller watches conditions over the node's entire lifetime. If a critical dependency fails, for example a device driver crash or a security agent stopping, the taint is applied again immediately. This acts as a circuit breaker: existing pods keep running (unless you use NoExecute as the taint effect), but no new pods are placed. Once the dependency recovers, the taint is removed again. This is essential for scenarios where you need an ongoing health guarantee.
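A continuous-mode rule has the same shape as the bootstrap example, differing only in `enforcementMode`. The rule name, condition type, and taint key below are illustrative:

```yaml
apiVersion: readiness.node.x-k8s.io/v1alpha1
kind: NodeReadinessRule
metadata:
  name: security-agent-rule                    # illustrative name
spec:
  conditions:
  - type: "example.com/SecurityAgentHealthy"   # illustrative condition type
    requiredStatus: "True"
  taint:
    key: "readiness.k8s.io/SecurityAgentReady" # illustrative taint key
    effect: "NoSchedule"                       # NoExecute would also evict running pods
  enforcementMode: "continuous"                # re-applies the taint if the condition regresses
```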
Multiple rules, nodeSelector, and dry-run
The power of NRR is in composing multiple rules. You can create independent NodeReadinessRule objects, each with its own taint. An example from the community discussion illustrates this pattern:
- Rule A: requires `CNIReady` -> manages taint `network-not-ready:NoSchedule`
- Rule B: requires `GPUReady` -> manages taint `gpu-not-ready:NoSchedule`
Regular workloads tolerate the gpu-not-ready taint (or simply land on nodes where the GPU rule does not apply) and therefore wait only for network readiness. GPU workloads tolerate neither taint, so they wait until both conditions are met. A validation webhook prevents two rules from trying to manage the same taint, which eliminates race conditions.
With nodeSelector, you target specific node subsets. GPU readiness only needs to apply to nodes with a GPU label, while CNI readiness applies to all worker nodes. This makes it possible to manage heterogeneous clusters with different readiness requirements per node type, without partitioning the cluster.
The dry-run mode (dryRun: true) is an indispensable feature for production environments. When enabled, the controller logs which actions it would take and updates rule status to show affected nodes, but applies no actual taints. This gives you the ability to audit new readiness requirements across your fleet before enforcing them, which is crucial when a misconfigured rule could remove hundreds of nodes from the scheduling pool.
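Putting both features together, a GPU rule scoped to GPU-labeled nodes and rolled out first in audit mode could look like the sketch below. The condition type, taint key, and node label are illustrative assumptions:

```yaml
apiVersion: readiness.node.x-k8s.io/v1alpha1
kind: NodeReadinessRule
metadata:
  name: gpu-readiness-rule
spec:
  conditions:
  - type: "example.com/GPUReady"        # illustrative condition type
    requiredStatus: "True"
  taint:
    key: "gpu-not-ready"                # illustrative taint key
    effect: "NoSchedule"
  enforcementMode: "bootstrap-only"
  nodeSelector:
    matchLabels:
      example.com/accelerator: "nvidia" # illustrative label: only GPU nodes
  dryRun: true                          # audit first; set to false to enforce
```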
The Readiness Condition Reporter and integration with Node Problem Detector
The controller is intentionally decoupled from health checking. It reacts to node conditions, but leaves reporting those conditions to other components. This gives you a flexible architecture with three integration paths.
The first option is the included Readiness Condition Reporter, a lightweight agent you deploy as a sidecar or DaemonSet. It periodically polls local HTTP health endpoints and patches the corresponding node conditions through the Kubernetes API. In the CNI readiness example from the repository, this reporter is injected as a sidecar into the Calico deployment via the script examples/cni-readiness/apply-calico.sh. The project includes a separate Dockerfile.reporter for this agent.
The second option is integration with Node Problem Detector (NPD). NPD is an existing DaemonSet that detects node-level issues and reports them as node conditions. Through NPD's Custom Plugin Monitor, you can write custom scripts that return exit code 0 (healthy) or 1 (problem). NPD translates this into node conditions that the NRR controller then watches. This is ideal if you already run NPD, you only need to add a custom plugin.
The third option is your own daemon or operator that patches node conditions directly through the Kubernetes API. Any component that can write custom conditions to node.status.conditions works out of the box with the NRR controller.
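Whichever path you choose, the contract is the same: a custom entry in `node.status.conditions`. A condition as a reporter might write it, with illustrative field values:

```yaml
# Fragment of node.status.conditions after a reporter patches it in.
status:
  conditions:
  - type: "example.com/CNIReady"        # matched against spec.conditions[].type
    status: "True"
    reason: "CNIHealthy"                # illustrative reason
    message: "CNI agent health endpoint responded healthy"
    lastTransitionTime: "2026-02-03T10:00:00Z"
    lastHeartbeatTime: "2026-02-03T10:05:00Z"
```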
How NRR compares to NFD, Pod Scheduling Gates, and custom controllers
The Kubernetes ecosystem has multiple tools related to node scheduling, but each solves a fundamentally different problem.
Node Feature Discovery (NFD) detects hardware and software features on nodes and turns them into labels. NFD tells you what a node has: an NVIDIA GPU, specific CPU instruction sets, certain kernel versions. The NRR controller tells you whether what the node has actually works. NFD labels a node with feature.node.kubernetes.io/pci-10de.present: "true" (NVIDIA GPU present); NRR ensures the GPU driver is actually loaded and functional before GPU workloads are placed. They are complementary: NFD for capability discovery, NRR for readiness enforcement.
Pod Scheduling Readiness Gates (GA since Kubernetes v1.30) work at pod level. With schedulingGates in the pod spec, you can hold individual pods until an external controller removes the gate. This is useful for scenarios such as dynamic quota management or pre-deployment checks. NRR works at node level: it is not "is this pod ready to be scheduled?" but "is this node ready to receive pods?" Both mechanisms complement each other.
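For contrast, a pod held back by a scheduling gate looks like this; the gate name is an illustrative placeholder, and the pod stays Pending until an external controller removes the gate from its spec:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gated-pod                   # illustrative name
spec:
  schedulingGates:
  - name: example.com/quota-check   # removed by an external controller
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
```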
Custom taint-management controllers, the current workaround, do essentially the same thing as NRR, but ad hoc and without standardization. NRR replaces them with a declarative, standardized system that needs fewer RBAC privileges and no custom code. It removes the operational burden of building, testing, and maintaining your own controllers.
Real-world scenarios: from CNI to GPU to autoscaling
CNI readiness is the most obvious scenario. When a new node joins a cluster with Calico or Cilium, it takes time before the CNI plugin is fully operational. During that window, the node can be "Ready" while pod networking does not work. With a NodeReadinessRule in bootstrap-only mode, you keep the node tainted until the CNI agent reports healthy. The official repository example demonstrates exactly this scenario with Calico.
GPU nodes have the "False Ready" problem: Kubernetes marks the node as schedulable, but GPU resources are not yet advertised by the device plugin. The scheduler places GPU pods that immediately fail with UnexpectedAdmissionError. An NRR rule that waits for a GPUReady condition, combined with a nodeSelector for GPU-labeled nodes, prevents this entirely.
Autoscaling scenarios are explicitly supported. The test suite demonstrates how the controller automatically applies taints to new nodes added by an autoscaler. As soon as required components on the new node are running and reporting healthy, the taint is removed. This makes NRR particularly valuable for cluster autoscaler integrations where nodes come and go dynamically.
Multi-tenant platforms benefit from combining multiple rules and nodeSelectors. Different teams can have different readiness requirements: one team needs a working service mesh, another team needs specific storage drivers. With separate rules and taints per node pool, you can provide this granular enforcement without complex custom logic.
KEP 5416 and why this is an out-of-band solution
The Node Readiness Controller implements the concepts from KEP 5416 (NodeReadinessGates) without requiring changes to core Kubernetes. The upstream KEP, tracked as kubernetes/enhancements#5233, proposes adding a readinessGates field to NodeSpec, similar to how PodSpec.ReadinessGates works. In that upstream vision, the kubelet would set readiness gates in the node spec during self-registration, and the scheduler would only place pods once all gate conditions are True.
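The upstream shape might look something like the following sketch. This is hypothetical: the field does not exist in any released Kubernetes API, and the final KEP design may differ:

```yaml
# Hypothetical Node spec per the KEP 5416 proposal (sketch only,
# not a real API field today):
apiVersion: v1
kind: Node
metadata:
  name: worker-1                       # illustrative node name
spec:
  readinessGates:
  - conditionType: "example.com/CNIReady"   # scheduler waits until True
```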
The issue with the upstream approach is that it requires changes to kubelet, scheduler, and the Node API. That is a long KEP graduation path. NRR avoids this by using existing Kubernetes primitives: node conditions for health signals and taints for scheduling control. The result is functionally equivalent, but deployable on any existing cluster today. If KEP 5416 eventually lands in core Kubernetes, it offers a cleaner taint-free model, but until then NRR provides the same capabilities as an out-of-tree solution.
Community and what is next
The project is maintained by Ajay Sundar Karuppasamy (Google) under SIG Node. The community is active in the #sig-node-readiness-controller Slack channel on kubernetes.slack.com. The roadmap includes metrics and alerting integration, improved logging, performance optimizations, and scale testing for more than 1000 nodes.
A key moment will be KubeCon + CloudNativeCon Europe 2026 in Amsterdam (March 23-26), where a maintainer track session is planned: "Addressing Non-Deterministic Scheduling: Introducing the Node Readiness Controller". For Dutch DevOps engineers, this is a great chance to see the project in action and speak directly with maintainers.
Conclusion
The Node Readiness Controller closes a gap that was patched for years with ad hoc scripts and custom controllers. The declarative NodeReadinessRule CRD approach replaces fragile, self-built taint management with a standardized system that composes with existing tools such as NPD and NFD. The two enforcement modes, bootstrap-only for provisioning and continuous for runtime health, cover the two most common scenarios. The project is alpha, but the timing is good: early adoption gives you influence on API direction and prepares your team for a future where node readiness is a first-class Kubernetes concept. If you want to start, begin with the CNI readiness example on a Kind cluster and build from there.