Karpenter on EKS: faster node autoscaling with NodePool and EC2NodeClass

Karpenter provisions nodes in 45–60 seconds on EKS by calling EC2 Fleet directly instead of waiting for Auto Scaling Groups. Where Cluster Autoscaler picks from predefined node groups, Karpenter evaluates all available instance types per pending pod batch and launches the tightest fit. This guide covers installing Karpenter v1.x on EKS, writing NodePool and EC2NodeClass manifests, configuring disruption and consolidation, migrating from Cluster Autoscaler with zero downtime, and monitoring everything through Prometheus.

What you will have at the end

A running Karpenter installation on EKS that provisions nodes based on actual pod requirements, consolidates underutilized capacity automatically, handles Spot interruptions, and exposes metrics to your Prometheus monitoring stack.

Prerequisites

  • An EKS cluster running Kubernetes 1.28+
  • kubectl, helm, and aws CLI installed locally
  • IAM permissions to create roles, policies, and EC2 tags
  • Subnets and security groups tagged for Karpenter discovery (covered in the install steps)
  • Prometheus and Grafana installed if you want the monitoring section to work immediately

Why Karpenter instead of Cluster Autoscaler

Cluster Autoscaler (CA) scans for pending pods on a timer, then asks an Auto Scaling Group to add a node from a predefined set of instance types. That round-trip takes 3–5 minutes on a good day.

Karpenter skips the ASG entirely. It watches for unschedulable pods event-by-event, computes which instance type fits the batch best from up to 60 candidates, and calls the EC2 Fleet API directly. The result: nodes ready in 45–60 seconds.

| Dimension | Cluster Autoscaler | Karpenter |
|---|---|---|
| Trigger | Periodic scan (every 10 s or more) | Event per pending pod |
| Node selection | Predefined node groups | All instance types matching NodePool |
| Provisioning time | 3–5 min (ASG delay) | 45–60 s (EC2 Fleet direct) |
| Consolidation | Removes idle nodes only | Empty, multi-node, and single-node consolidation |
| Instance flexibility | Limited to node group types | Any instance satisfying requirements |

Teams switching from CA to Karpenter commonly report 20–40% cost reduction from better bin-packing and automated consolidation alone.

One clarification that trips people up: Karpenter replaces Cluster Autoscaler, not HPA or VPA. HPA scales pods, VPA right-sizes resource requests, Karpenter provisions the nodes those pods land on. They are complementary layers, not alternatives.
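To make the layering concrete, here is a minimal HPA sketch (the `web` Deployment name and thresholds are placeholders): HPA raises the replica count, and when the new replicas go Pending because no node has room, Karpenter provisions capacity for them.

```yaml
# HPA scales the Deployment's replica count; Karpenter reacts when the
# extra replicas become unschedulable and launches nodes to fit them.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web               # placeholder workload
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```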

Install Karpenter on EKS

Step 1: set environment variables

export KARPENTER_NAMESPACE="kube-system"
export KARPENTER_VERSION="1.11.1"       # latest stable as of April 2026
export K8S_VERSION="1.31"               # match your EKS cluster version
export CLUSTER_NAME="production-main"
export AWS_DEFAULT_REGION="eu-west-1"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"

Step 2: create IAM roles

Karpenter needs two IAM roles:

  1. KarpenterNodeRole for the EC2 instances it launches. Attach AmazonEKSWorkerNodePolicy, AmazonEKS_CNI_Policy, AmazonEC2ContainerRegistryReadOnly, and AmazonSSMManagedInstanceCore.
  2. KarpenterControllerRole for the controller pod itself, using IRSA. This role needs scoped EC2 permissions: ec2:RunInstances, ec2:CreateFleet, ec2:TerminateInstances, ec2:DescribeInstances, ec2:DescribeSubnets, ec2:DescribeSecurityGroups, ec2:DescribePlacementGroups (required since v1.11), ec2:CreateTags, ec2:DeleteTags, iam:PassRole, iam:ListInstanceProfiles, ssm:GetParameter, sqs:ReceiveMessage, sqs:DeleteMessage, among others.

Security note: any principal that can create or delete the tags karpenter.sh/managed-by, karpenter.sh/nodepool, and kubernetes.io/cluster/${CLUSTER_NAME} can indirectly influence what Karpenter provisions. Restrict tag CRUD in your IAM policies.
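For reference, KarpenterNodeRole needs a trust policy that lets EC2 assume it (the controller role instead trusts your cluster's OIDC provider through IRSA). A minimal sketch:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```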

Step 3: tag subnets and security groups

# Karpenter discovers subnets and security groups by tag
aws ec2 create-tags \
  --resources subnet-0abc1234 subnet-0def5678 sg-0aabb1122 \
  --tags Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}

Karpenter picks the subnet with the most available IPs per availability zone.

Step 4: install with Helm

helm registry logout public.ecr.aws   # clear stale tokens

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "${KARPENTER_VERSION}" \
  --namespace "${KARPENTER_NAMESPACE}" \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --wait

Step 5: verify the controller is running

kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter --tail=20

Expected output includes lines like controller started and watching for pending pods. No ERROR lines about IAM or STS.

Create a NodePool and EC2NodeClass

A NodePool defines what kind of nodes Karpenter may provision (instance families, capacity types, architectures, limits). An EC2NodeClass defines how to provision them on AWS (AMI, subnets, security groups, IAM, disk).

# ec2nodeclass.yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  role: "KarpenterNodeRole-production-main"

  amiSelectorTerms:
    - alias: "al2023@v20250301"  # pin in production; never use @latest

  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "production-main"

  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "production-main"

  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
        encrypted: true

  metadataOptions:
    httpEndpoint: enabled
    httpTokens: required          # IMDSv2 only
    httpPutResponseHopLimit: 1

# nodepool.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        team: platform
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]   # broad categories for bin-packing flexibility
      expireAfter: 720h             # 30 days; forces node refresh
      terminationGracePeriod: 48h   # hard deadline on draining
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
    budgets:
      - nodes: "10%"
  limits:
    cpu: "1000"
    memory: "1000Gi"
  weight: 50

Apply both:

kubectl apply -f ec2nodeclass.yaml -f nodepool.yaml

Verify Karpenter sees the NodePool:

kubectl get nodepools

Expected output:

NAME      NODECLASS   WEIGHT   AGE
default   default     50       12s
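To confirm provisioning end to end, a common smoke test (names are illustrative) is a Deployment whose resource requests exceed current spare capacity. Scale it up, watch pods go Pending, and Karpenter should create a NodeClaim within seconds (`kubectl get nodeclaims -w`):

```yaml
# Each replica requests a full CPU, so enough replicas will exhaust
# spare capacity and force Karpenter to launch a new node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 5
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: pause
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.9
          resources:
            requests:
              cpu: "1"    # real capacity demand, near-zero actual usage
```

Delete the Deployment afterwards and consolidation should remove the node it triggered.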

Spot and on-demand with weighted NodePools

For workloads that tolerate interruption, split into two NodePools with weight-based priority:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot
spec:
  weight: 100                    # higher weight = tried first
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]   # broad categories for price-capacity-optimized
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "500"
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand-fallback
spec:
  weight: 10
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "500"

Karpenter tries the weight-100 Spot pool first. If Spot capacity is unavailable, it falls back to the weight-10 on-demand pool. For Spot, Karpenter uses the price-capacity-optimized allocation strategy, which balances price and interruption probability rather than blindly picking the cheapest pool. Keep instance families broad: restricting to fewer than 15 instance types blocks single-node Spot consolidation.

For GPU workloads, use a separate NodePool with taints to isolate expensive GPU nodes from general workloads.
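A sketch of such an isolated pool — the taint keeps general pods off GPU nodes, so only workloads with a matching toleration (and typically an `nvidia.com/gpu` request) land there. The label key and the `default` EC2NodeClass reference follow the earlier manifests; adjust limits to your budget:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule       # only tolerating pods may schedule here
      requirements:
        - key: karpenter.k8s.aws/instance-gpu-manufacturer
          operator: In
          values: ["nvidia"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default              # reuses the EC2NodeClass from above
  limits:
    nvidia.com/gpu: "8"            # hard cap on provisioned GPUs
```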

Disruption, consolidation, and drift

Karpenter's disruption model has two categories:

Voluntary (rate-limited by disruption budgets):

  • Consolidation runs in three tiers: delete empty nodes first, then try multi-node consolidation (merge workloads from several nodes onto fewer), then single-node consolidation (replace a node with a smaller one). WhenEmptyOrUnderutilized enables all three. WhenEmpty enables only empty-node deletion.
  • Drift detects when a running node no longer matches the desired NodePool or EC2NodeClass spec (changed AMI, updated requirements, modified security groups). Karpenter replaces drifted nodes gracefully.

Forceful (not rate-limited):

  • Expiration drains and terminates nodes when expireAfter elapses (default 720h / 30 days).
  • Interruption handles EC2 lifecycle events: Spot 2-minute warnings, scheduled maintenance, instance stop signals. Karpenter pre-provisions a replacement during the warning window.

Disruption budgets by reason

Since v1.0 you can scope budgets per disruption reason:

disruption:
  budgets:
    - nodes: "20%"
      reasons: ["Drifted"]
    - nodes: "10%"
      reasons: ["Underutilized"]
    - nodes: "0"
      reasons: ["Empty"]
      schedule: "0 9 * * mon-fri"   # freeze empty-node removal during business hours
      duration: 8h

Protecting specific pods

Add karpenter.sh/do-not-disrupt: "true" as a pod annotation to block voluntary disruption (consolidation, drift) on that pod's node. This does not block expiration or Spot interruption. Pair it with PodDisruptionBudgets for broader availability guarantees.
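Both layers together, as a sketch (workload names and replica counts are illustrative):

```yaml
# The annotation blocks voluntary disruption of nodes running these pods.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payments
  template:
    metadata:
      labels:
        app: payments
      annotations:
        karpenter.sh/do-not-disrupt: "true"
    spec:
      containers:
        - name: app
          image: payments:v1        # placeholder image
---
# The PDB caps how many replicas any drain may evict at once,
# covering forceful disruption paths the annotation does not.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: payments
```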

Migrate from Cluster Autoscaler

Karpenter and Cluster Autoscaler can run simultaneously. Zero-downtime migration follows this sequence:

Step 1: prepare workloads

Add PodDisruptionBudgets to every production Deployment. Without PDBs, scaling down old node groups causes immediate eviction of all replicas. Set accurate resource requests so Karpenter can bin-pack effectively.

Step 2: deploy Karpenter alongside CA

Install Karpenter as described above. Use nodeAffinity on the Karpenter controller Deployment to pin it to nodes in your existing managed node group. Karpenter must not run on nodes it manages (circular dependency if it evicts its own controller).
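That pinning can be expressed through the chart's affinity values — a sketch assuming your existing managed node group is named `system` (adjust the label value to your setup; recent chart versions already ship a `karpenter.sh/nodepool DoesNotExist` default):

```yaml
# values.yaml fragment for the Karpenter Helm chart
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            # never land on nodes Karpenter itself created
            - key: karpenter.sh/nodepool
              operator: DoesNotExist
            # pin to the pre-existing managed node group
            - key: eks.amazonaws.com/nodegroup
              operator: In
              values: ["system"]
```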

Step 3: create NodePool and EC2NodeClass

Apply the manifests from the previous sections. Karpenter starts watching for unschedulable pods immediately but does not touch existing CA-managed nodes.

Step 4: scale Cluster Autoscaler to zero

kubectl scale deployment cluster-autoscaler -n kube-system --replicas=0

Step 5: gradually reduce node group capacity

Lower the minSize and desiredCapacity of your ASGs incrementally. As workloads naturally churn (deployments, scaling events, pod restarts), pods land on Karpenter-provisioned nodes. Old nodes drain through normal turnover.

aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name eks-managed-ng-1 \
  --min-size 2 \
  --desired-capacity 2

Maintain at least 2 nodes per AZ in the initial node group until you have confirmed Karpenter handles all workloads.

Step 6: verify and clean up

kubectl get nodes -L karpenter.sh/nodepool

Nodes with a karpenter.sh/nodepool label are Karpenter-managed. Once no workloads remain on unmanaged nodes, delete the old ASGs.

Salesforce migrated over 1,000 EKS clusters to Karpenter using this exact phased approach.

Monitor Karpenter with Prometheus and Grafana

Karpenter exposes Prometheus metrics at karpenter.kube-system.svc.cluster.local:8080/metrics. If you run kube-prometheus-stack, add a ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: karpenter
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: karpenter
  endpoints:
    - port: http-metrics
      path: /metrics

Key metrics to track

| Metric | What it tells you |
|---|---|
| karpenter_nodes_created_total | How many nodes Karpenter has provisioned |
| karpenter_nodes_terminated_total | How many nodes were removed (consolidation, expiry, interruption) |
| karpenter_pods_startup_duration_seconds | Time from pod creation to running state |
| karpenter_scheduler_queue_depth | Pending pod batches waiting for nodes |
| karpenter_voluntary_disruption_decisions_total | Consolidation and drift decisions |
| karpenter_nodes_termination_duration_seconds | Drain time; high p95 signals stuck PDBs |
| karpenter_voluntary_disruption_eligible_nodes | Nodes eligible for consolidation that are not being acted on |

Full reference: Karpenter metrics documentation.

Grafana dashboards

Import these from Grafana Labs:

  • Karpenter Overview (ID 21699) for NodePool, node, and pod counts
  • Karpenter Performance (ID 22173) for cloud provider errors and pod startup latency
  • Karpenter Activity (ID 18862) for scale-up/down event timelines

Alerts worth configuring

  • Sustained high queue depth: karpenter_scheduler_queue_depth > 5 for more than 2 minutes means Karpenter cannot find capacity. Check NodePool limits, instance availability, and IAM permissions.
  • Slow termination: histogram_quantile(0.95, rate(karpenter_nodes_termination_duration_seconds_bucket[5m])) > 600 means p95 draining takes longer than 10 minutes. Look for blocking PDBs or do-not-disrupt annotations.
  • Provisioning surge: a sudden spike in rate(karpenter_nodeclaims_created_total[5m]) may indicate a broken HPA loop or a rogue Deployment.
  • Consolidation blocked: karpenter_voluntary_disruption_eligible_nodes stays high while karpenter_voluntary_disruption_decisions_total stays flat. Disruption budgets or PDBs are preventing cleanup.
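With kube-prometheus-stack, the first alert could be encoded as a PrometheusRule — a sketch where the threshold and duration are starting points to tune for your cluster:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: karpenter-alerts
  namespace: kube-system
spec:
  groups:
    - name: karpenter
      rules:
        - alert: KarpenterQueueDepthHigh
          expr: karpenter_scheduler_queue_depth > 5
          for: 2m                    # sustained, not a transient spike
          labels:
            severity: warning
          annotations:
            summary: "Karpenter cannot place pending pod batches"
            description: "Check NodePool limits, instance availability, and IAM permissions."
```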

Production hardening checklist

  • Pin AMI versions. Use al2023@v20250301, not @latest. Test AMI updates in staging before rolling them via drift.
  • Run Karpenter on Fargate or a dedicated managed node group. Never on Karpenter-managed nodes. A circular dependency means Karpenter evicts its own controller.
  • Require IMDSv2. Set httpTokens: required in EC2NodeClass metadataOptions to block SSRF-based credential theft.
  • Set NodePool resource limits. Always define limits.cpu and limits.memory to cap spending per NodePool.
  • Use IRSA for the controller role. Never attach IAM permissions via EC2 instance metadata.
  • Keep instance families broad for Spot. Fewer than 15 instance type options blocks single-node Spot consolidation.
  • Set terminationGracePeriod when using expireAfter. Without it, a pod annotated with do-not-disrupt blocks node drain indefinitely.

Common gotchas

Pods stuck in Pending despite available NodePools. The pod's requirements (resource requests, node selectors, tolerations) do not fit within any NodePool's requirements. Run kubectl describe pod <name> and check the Events section for scheduling failure reasons. Karpenter can only provision nodes that satisfy the intersection of NodePool constraints and pod constraints.

Nodes created then immediately terminated. The EC2 instance launches but fails to join the cluster. Common causes: missing VPC endpoints for STS or SSM in private clusters, incorrect security group rules blocking kubelet communication, or wrong IAM instance profile.

Consolidation not happening. Check karpenter_voluntary_disruption_eligible_nodes. If it is high but decisions are zero, disruption budgets or PDBs are blocking. Also verify consolidationPolicy is set to WhenEmptyOrUnderutilized, not WhenEmpty.

Windows node slowness. Windows nodes take ~6 minutes to join the cluster plus 15–20 minutes to pull the base image. This is an inherent platform limitation, not a Karpenter issue. Do not expect sub-minute provisioning for Windows workloads.

v1.0 migration failures. If you are upgrading from Karpenter v0.x, the Provisioner and AWSNodeTemplate CRDs are removed in v1.0. Run karpenter-convert -f provisioner.yaml > nodepool.yaml before upgrading. v1.1 drops v1beta1 entirely.

When to escalate

Collect this information before asking for help:

  • Karpenter controller logs: kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter --tail=100
  • NodePool and EC2NodeClass specs: kubectl get nodepools -o yaml and kubectl get ec2nodeclasses -o yaml
  • Pending pods and their events: kubectl get pods --field-selector=status.phase=Pending -A
  • NodeClaim status: kubectl get nodeclaims -o wide
  • Karpenter version: helm list -n kube-system | grep karpenter
  • EKS cluster version and platform version
  • IAM role ARNs for both controller and node roles

Recurring server or deployment issues?

I help teams make production reliable with CI/CD, Kubernetes, and cloud—so fixes stick and deploys stop being stressful.

Explore DevOps consultancy
