Kubernetes Cluster Autoscaler: automatic node scaling for managed clusters

Cluster Autoscaler watches for pods stuck in Pending because no node has room, then adds a node from a matching node group. When a node's requested resources stay below 50% of its allocatable capacity for long enough and its pods fit elsewhere, it removes the node. This guide covers configuring Cluster Autoscaler on EKS, GKE, and AKS, tuning scale-down timing, diagnosing common blockers, and knowing when Karpenter is a better fit.

What you will have at the end

A working Cluster Autoscaler (CA) deployment that automatically adds nodes when pods cannot be scheduled and removes underutilized nodes after a configurable cooldown. You will know how to tune the key timing parameters, unblock stuck scale-down operations, and read CA's status output to understand what it is doing and why.

Prerequisites

  • kubectl connected to a Kubernetes 1.28+ cluster on EKS, GKE, or AKS
  • Cluster admin permissions (RBAC) and cloud IAM permissions to manage node groups/ASGs/VMSS
  • Familiarity with resource requests and limits. CA uses requests, not actual utilization, for every scheduling decision
  • PodDisruptionBudgets configured on production workloads (CA respects PDBs during scale-down)

How Cluster Autoscaler decides to scale

Scale-up

CA polls the API server every scan-interval (default 10 seconds). When it finds pods in Pending state with reason: Unschedulable, it simulates adding a node from each available node group and picks one using the configured expander strategy.

The decision is based entirely on resource requests. A pod requesting 2 CPU and 4Gi memory on a node with 4 CPU allocatable will register as 50% utilized, even if actual CPU usage is 3%.
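
To make that concrete, this is the part of a pod spec CA reads during its simulation (image and values are illustrative):

containers:
  - name: api
    image: registry.example.com/api:1.0
    resources:
      requests:
        cpu: "2"        # counted against node allocatable by CA
        memory: 4Gi
      # limits and live usage play no part in CA's decision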

Scale-down

A node becomes a scale-down candidate when three conditions are true simultaneously:

  1. The sum of all pod requests on the node is below scale-down-utilization-threshold (default 0.5, i.e. 50%)
  2. Every pod on the node can be rescheduled onto other existing nodes
  3. The node has been in this state for at least scale-down-unneeded-time (default 10 minutes)

On top of that, scale-down-delay-after-add (default 10 minutes) blocks all scale-down after any scale-up event. This is deliberate thrash prevention. Under default settings, the effective minimum time from "node becomes underutilized" to "node is removed" is 10-20 minutes.

Version matching

CA's minor version must match the Kubernetes minor version. A cluster running Kubernetes 1.32.x needs CA 1.32.x. Patch version does not need to match. Update CA immediately after upgrading the control plane.
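
On EKS (where you deploy CA yourself), a quick way to confirm the two versions line up:

# Control plane version
kubectl version

# CA image tag
kubectl get pods -n kube-system -l app.kubernetes.io/name=cluster-autoscaler \
  -o jsonpath='{.items[0].spec.containers[0].image}'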

Configure on EKS

On EKS you deploy the open-source Cluster Autoscaler yourself. It needs IAM permissions and ASG discovery tags.

Step 1: create an IAM policy

CA needs read access to describe ASGs and EC2 instances, and write access (scoped to tagged ASGs) to set desired capacity and terminate instances.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:DescribeTags",
        "ec2:DescribeImages",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:GetInstanceTypesFromInstanceRequirements",
        "eks:DescribeNodegroup"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "autoscaling:ResourceTag/k8s.io/cluster-autoscaler/enabled": "true",
          "autoscaling:ResourceTag/k8s.io/cluster-autoscaler/production-main": "owned"
        }
      }
    }
  ]
}

Replace production-main with your cluster name. The Condition block ensures CA can only modify ASGs tagged for your cluster.
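
Assuming the JSON above is saved as cluster-autoscaler-policy.json, one way to create the policy:

aws iam create-policy \
  --policy-name ClusterAutoscalerPolicy \
  --policy-document file://cluster-autoscaler-policy.json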

Step 2: associate IAM with the CA service account

Option A: EKS Pod Identity (preferred for new clusters)

eksctl create podidentityassociation \
  --cluster production-main \
  --namespace kube-system \
  --service-account-name cluster-autoscaler \
  --role-arn arn:aws:iam::123456789012:role/ClusterAutoscalerRole

The role must already have the Step 1 policy attached; note that this flag takes a role ARN, not a policy ARN.

Option B: IRSA (older clusters)

eksctl create iamserviceaccount \
  --cluster=production-main \
  --namespace=kube-system \
  --name=cluster-autoscaler \
  --attach-policy-arn=arn:aws:iam::123456789012:policy/ClusterAutoscalerPolicy \
  --approve

Step 3: tag your ASGs

Every ASG (or EKS Managed Node Group) that CA should manage needs two tags:

Tag key                                     Value
k8s.io/cluster-autoscaler/enabled           true
k8s.io/cluster-autoscaler/production-main   owned

eksctl adds these automatically. Terraform and CloudFormation require manual addition.
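
If you manage node groups outside eksctl, the discovery tags can be added with the AWS CLI, for example (the ASG name is a placeholder):

aws autoscaling create-or-update-tags --tags \
  "ResourceId=my-node-group-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true" \
  "ResourceId=my-node-group-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/production-main,Value=owned,PropagateAtLaunch=true"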

For scaling from zero (node group min=0), add template tags so CA can infer node properties without a live node:

k8s.io/cluster-autoscaler/node-template/label/kubernetes.io/os = linux
k8s.io/cluster-autoscaler/node-template/label/workload-type = batch

Step 4: deploy Cluster Autoscaler
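
If the Helm repository is not configured yet, add it first:

helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update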

Pin the image tag to match your cluster's Kubernetes minor version:

# For Kubernetes 1.32.x
helm upgrade --install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=production-main \
  --set awsRegion=eu-west-1 \
  --set image.tag=v1.32.0 \
  --set extraArgs.balance-similar-node-groups=true \
  --set extraArgs.skip-nodes-with-local-storage=false \
  --set podAnnotations."cluster-autoscaler\.kubernetes\.io/safe-to-evict"='"false"' \
  --set priorityClassName=system-cluster-critical

The safe-to-evict: "false" annotation prevents CA from evicting its own pod. The system-cluster-critical priority class ensures CA is not preempted under node pressure.

Step 5: verify

kubectl logs -n kube-system -l app.kubernetes.io/name=cluster-autoscaler --tail=20

Expected output includes lines like Cluster Autoscaler version v1.32.0 and periodic Calculating unneeded nodes. No ERROR lines about IAM or STS.

Mixed instance policies on EKS

All instance types in a MixedInstancePolicy ASG must have the same vCPU count and RAM. CA uses the first instance type listed for scheduling simulation. Larger types waste resources (CA schedules fewer pods than the node can fit). Smaller types cause scheduling failures (CA promises more capacity than the node delivers).

Separate On-Demand and Spot into distinct ASGs. Do not use a base capacity + spot overflow strategy within a single ASG.

Configure on GKE

GKE's Cluster Autoscaler is a fully managed component running in the control plane. You do not deploy the open-source CA.

Enable on a new cluster

gcloud container clusters create production-cluster \
  --num-nodes=2 \
  --location=europe-west4-a \
  --node-locations=europe-west4-a,europe-west4-b \
  --enable-autoscaling \
  --min-nodes=1 \
  --max-nodes=8

Enable on an existing node pool

gcloud container clusters update production-cluster \
  --enable-autoscaling \
  --min-nodes=1 \
  --max-nodes=8 \
  --node-pool=default-pool

For multi-zone node pools on GKE 1.24+, use --total-min-nodes and --total-max-nodes to control the count across all zones rather than per zone.
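
For example (the totals are illustrative):

gcloud container clusters update production-cluster \
  --enable-autoscaling \
  --node-pool=default-pool \
  --total-min-nodes=2 \
  --total-max-nodes=12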

Choose an autoscaling profile

GKE exposes two autoscaling profiles:

Profile                 Behavior
balanced (default)      Moderate scale-down, keeps some buffer for incoming workloads
optimize-utilization    Aggressive scale-down for cost savings, may increase scheduling latency

gcloud container clusters update production-cluster \
  --autoscaling-profile=optimize-utilization

GKE limitations

  • The whole cluster cannot scale down to zero; individual node pools can scale to zero, but at least one node must remain to run system pods
  • Maximum cluster size: 15,000 nodes
  • Node auto-provisioning (NAP) is a separate feature that creates new node pools dynamically; it is distinct from the autoscaler managing existing pools (see the example below)
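
If you want GKE to create entirely new node pools on demand rather than only resize existing ones, NAP is enabled separately. A minimal sketch (the resource ceilings are illustrative; GKE requires at least the CPU and memory maximums):

gcloud container clusters update production-cluster \
  --enable-autoprovisioning \
  --min-cpu=1 --max-cpu=64 \
  --min-memory=1 --max-memory=256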

Configure on AKS

AKS manages CA internally. Configure it through az CLI, never by editing VMSS autoscaling settings in the Azure portal (they conflict).

Enable on a new cluster

az aks create \
  --resource-group production-rg \
  --name production-aks \
  --node-count 2 \
  --vm-set-type VirtualMachineScaleSets \
  --load-balancer-sku standard \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 8 \
  --generate-ssh-keys

Update autoscaler on an existing node pool

az aks nodepool update \
  --resource-group production-rg \
  --cluster-name production-aks \
  --name nodepool1 \
  --update-cluster-autoscaler \
  --min-count 1 \
  --max-count 10

Configure the autoscaler profile

AKS exposes a cluster-wide autoscaler profile that applies to all node pools:

az aks update \
  --resource-group production-rg \
  --name production-aks \
  --cluster-autoscaler-profile \
    scan-interval=30s \
    scale-down-delay-after-add=5m \
    scale-down-unneeded-time=5m \
    scale-down-utilization-threshold=0.5 \
    max-graceful-termination-sec=300 \
    balance-similar-node-groups=true

The profile cannot be set per node pool.

Choose an expander strategy

When multiple node groups can satisfy a pending pod, the expander selects which group to expand:

Expander           Behavior                                               Best for
random (default)   Picks a random eligible group                          Equivalent groups, simple setups
least-waste        Picks the group leaving the least unused CPU/memory    General-purpose cost efficiency
most-pods          Picks the group that fits the most pods                Burst workloads
priority           User-defined ranking via ConfigMap                     Spot before on-demand, preferred instance types

Expanders can be chained: --expander=priority,least-waste applies priority rules first, then breaks ties by least waste.

Priority ConfigMap example (prefer Spot over on-demand):

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    10:
      - .*ondemand.*
    50:
      - .*spot.*

Higher numbers are tried first. Regex patterns match node group names.
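
The ConfigMap only takes effect when the priority expander is enabled. On the self-managed EKS deployment above, one way to do that (a sketch reusing the same Helm release) is:

helm upgrade --install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --reuse-values \
  --set extraArgs.expander=priority

CA looks for a ConfigMap with exactly the name cluster-autoscaler-priority-expander in its own namespace.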

Tune scale-down timing

The most impactful flags for cost optimization:

Flag                                 Default          What it controls
--scale-down-unneeded-time           10m              How long a node must be underutilized before removal
--scale-down-delay-after-add         10m              Cooldown after any scale-up event
--scale-down-delay-after-delete      scan-interval    Cooldown after a node deletion
--scale-down-delay-after-failure     3m               Cooldown after a failed scale-down attempt
--scale-down-utilization-threshold   0.5              Utilization below which a node is "underutilized"

For development clusters where cost matters more than stability:

--scale-down-unneeded-time=3m \
--scale-down-delay-after-add=2m \
--scale-down-utilization-threshold=0.4

For production clusters, keep the defaults or increase them. The 10-minute buffers exist because premature scale-down followed by immediate scale-up is more expensive (node boot time, pod migration, potential service disruption) than keeping an idle node for a few extra minutes.

Diagnose why a node will not scale down

CA evaluates nodes for removal every scan-interval. If a node stays despite low utilization, one of these is blocking it:

PodDisruptionBudgets at zero

kubectl get pdb --all-namespaces -o wide
# Look for ALLOWED DISRUPTIONS = 0

A PDB with maxUnavailable: 0 or minAvailable equal to the total replica count blocks eviction on that node. Fix by using percentage-based PDBs (minAvailable: 75%) or by increasing replica counts.
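
For example, a percentage-based PDB that always leaves at least one pod evictable on a four-replica Deployment (name and selector are illustrative):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-api-pdb
  namespace: production
spec:
  minAvailable: 75%      # with 4 replicas, 3 must stay up, so 1 can be evicted
  selector:
    matchLabels:
      app: web-api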

Pods with local storage

By default, pods using emptyDir or hostPath volumes block eviction because data would be lost. Set --skip-nodes-with-local-storage=false globally, or annotate specific pods:

metadata:
  annotations:
    # CA 1.26+ — mark specific volumes as safe to evict
    cluster-autoscaler.kubernetes.io/safe-to-evict-local-volumes: "cache-volume,tmp-dir"

System pods in kube-system

--skip-nodes-with-system-pods=true (default on most providers) prevents removing nodes running kube-system pods other than DaemonSets. Add-ons like metrics-server or coredns can trap nodes. Either set the flag to false or annotate individual system pods:

metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"

The safe-to-evict annotation

Any pod annotated with cluster-autoscaler.kubernetes.io/safe-to-evict: "false" blocks scale-down of its node. This is appropriate for the CA pod itself and for long-running batch jobs with expensive restart costs.

Node affinity constraints

If pods on a node have requiredDuringSchedulingIgnoredDuringExecution affinity rules that no other node satisfies, the node cannot be drained. Prefer preferredDuringSchedulingIgnoredDuringExecution where possible.
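
As a sketch, the same zone pinning expressed as a soft preference instead of a hard requirement (label values are illustrative):

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - eu-west-1a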

Timing delays

Both scale-down-delay-after-add and scale-down-unneeded-time must have elapsed. After a scale-up event, CA waits the full delay-after-add period before evaluating any node for removal, regardless of how underutilized it is.

Diagnose why a Pending pod does not trigger scale-up

A pod stuck in Pending does not guarantee CA will act. CA only helps if adding a node from an existing node group would allow the pod to schedule. Check these causes in order:

  1. No node group matches the pod's constraints. If the pod has a nodeSelector, nodeAffinity, or taint toleration that no node group provides, CA cannot help. Check CA events:

kubectl get events --field-selector source=cluster-autoscaler,reason=NotTriggerScaleUp

  2. Node group maximum reached. The ASG or VMSS max count is a hard ceiling. CA logs this as max node group size reached.

  3. Scale-from-zero without template tags. A node group at zero nodes needs ASG tags describing its labels, taints, and resources. Without them, CA cannot evaluate whether scaling the group would help.

  4. Too many unready nodes. If more than max-total-unready-percentage (default 45%) of nodes are NotReady, CA halts all operations.

  5. new-pod-scale-up-delay is set. Pods younger than this value are ignored. Useful for bursty workloads where the scheduler handles spikes before CA needs to intervene.
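
Several of these leave a recognizable trace in the logs. As an illustrative first pass, grep for the phrases quoted above and in the log-pattern table later in this guide:

kubectl logs -n kube-system deployment/cluster-autoscaler --tail=300 \
  | grep -Ei 'max node group size reached|no suitable node group'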

Balance similar node groups across zones

--balance-similar-node-groups=true makes CA distribute nodes evenly across node groups that have matching instance types, labels, and taints. This is the recommended setting for multi-AZ clusters.

Without it, CA picks a single group based on the expander strategy, which can concentrate all new nodes in one availability zone. During a zone failure, that concentration becomes a single point of failure.

On AKS, set it in the autoscaler profile:

az aks update --resource-group production-rg --name production-aks \
  --cluster-autoscaler-profile balance-similar-node-groups=true

On GKE, this behavior is managed internally by the autoscaler.

Observe and debug Cluster Autoscaler

Status ConfigMap

kubectl get configmap -n kube-system cluster-autoscaler-status -o yaml

Reports the last scale-up and scale-down times, node group states, and overall health.

Logs

kubectl logs -n kube-system deployment/cluster-autoscaler --tail=50

Key log patterns:

Pattern                                   Meaning
Scale-up: no suitable node group found    Pod constraints unsatisfiable by any group
pod has local storage                     Local storage blocking scale-down
pod has PDB                               PDB blocking scale-down
Node X was unneeded for X min             Countdown to scale-down
Scale-down: removing node X               Active node removal

Events

# Scale-up decisions
kubectl get events --field-selector source=cluster-autoscaler,reason=ScaleUp

# Scale-up not triggered (constraint mismatches)
kubectl get events --field-selector source=cluster-autoscaler,reason=NotTriggerScaleUp

# Warnings
kubectl get events --field-selector source=cluster-autoscaler,type=Warning

EKS: verify API access

The cluster-autoscaler image ships without a shell or the AWS CLI, so you cannot exec into its pod to test credentials. Check the CA logs for AccessDenied or credential errors, or run a one-off pod with the AWS CLI under the same service account, for example (the pod name is arbitrary):

kubectl run ca-iam-check -n kube-system --rm -it --restart=Never --image=amazon/aws-cli \
  --overrides='{"spec":{"serviceAccountName":"cluster-autoscaler"}}' \
  -- autoscaling describe-auto-scaling-groups --region eu-west-1 --query 'length(AutoScalingGroups)'

If this fails with an authorization error, IRSA or Pod Identity is misconfigured.

AKS: enable control plane logs

Enable the cluster-autoscaler category under AKS control plane resource logs and query them in Log Analytics.

When to use Karpenter instead

Karpenter is a fundamentally different approach to node autoscaling. Where CA works with predefined node groups and cloud-provider scaling APIs, Karpenter calls the EC2 Fleet API directly, picking the best instance type per pending pod batch.

Dimension            Cluster Autoscaler           Karpenter
Provisioning speed   3-5 min (ASG spin-up)        45-60 s (EC2 Fleet direct)
Instance selection   Predefined per node group    Any instance matching NodePool requirements
Consolidation        Removes idle nodes only      Empty, multi-node, and single-node consolidation
Cloud support        24+ cloud providers          EKS only (as of early 2026)

Choose CA when: you run GKE or AKS (where Karpenter is not available), you need strict AMI control per node group, you operate in regulated environments with predefined instance pools, or you run a multi-cloud setup.

Choose Karpenter when: you run EKS and want faster provisioning, better bin-packing, automated consolidation, and Spot diversification without managing ASGs. AWS recommends Karpenter for new EKS clusters as of 2025.

Both can run simultaneously during migration. Deploy Karpenter alongside CA, then gradually scale CA down as workloads shift to Karpenter-provisioned nodes.

When to escalate

Collect this information before asking for help:

  • CA logs: kubectl logs -n kube-system deployment/cluster-autoscaler --tail=100
  • Status ConfigMap: kubectl get configmap -n kube-system cluster-autoscaler-status -o yaml
  • PDB status: kubectl get pdb --all-namespaces -o wide
  • Pending pods and their events: kubectl get pods --field-selector=status.phase=Pending -A and kubectl describe pod <name>
  • CA events: kubectl get events --field-selector source=cluster-autoscaler
  • Node group configuration (ASG tags, VMSS settings, GKE node pool config)
  • CA version and Kubernetes version (kubectl version, plus the CA image tag)
  • Cloud provider and region
