Kubernetes spot and preemptible instances: cost savings with interruption safety

Spot instances on AWS and preemptible VMs on GCP cost 60–80% less than on-demand, but the cloud provider can reclaim them with as little as 30 seconds' notice. Running Kubernetes workloads on spot safely requires interruption handlers, PodDisruptionBudgets, proper taints, and diversified instance pools. This guide walks through each layer of the setup on both EKS and GKE.

What you will have at the end

A Kubernetes cluster where non-critical workloads run on spot or preemptible nodes at 60–80% cost reduction, with interruption handlers that drain pods gracefully before reclamation, PodDisruptionBudgets that protect availability, and instance type diversification that keeps interruption rates below 5%.

Prerequisites

  • kubectl connected to an EKS (1.28+) or GKE (1.25+) cluster
  • helm installed locally for handler deployments
  • Familiarity with Karpenter NodePool configuration (for the Karpenter path) or Cluster Autoscaler (for the managed node group path)
  • PodDisruptionBudgets configured on production workloads
  • IAM permissions to create SQS queues and EventBridge rules (AWS) or node pool management permissions (GCP)

How interruption notices work

The cloud provider decides to reclaim your spot capacity. What happens next depends on the provider.

AWS: 2-minute warning via IMDS

AWS emits a Spot Instance interruption notice 2 minutes before terminating or stopping the instance. The notice is available at the IMDS endpoint:

http://169.254.169.254/latest/meta-data/spot/instance-action

When an interruption is scheduled, the endpoint returns:

{"action": "terminate", "time": "2026-04-09T14:22:00Z"}

When nothing is pending, it returns HTTP 404. That 404 is the normal state.
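You can see this from inside a node. Below is a minimal sketch (assuming IMDSv2 and the standard metadata address) of the probe, plus a small helper that maps the HTTP status to what a handler should do next — which is essentially the core loop of any IMDS-mode interruption handler:

```shell
# Fetch an IMDSv2 token, then probe the interruption-notice endpoint.
# Only meaningful on an EC2 instance; 404 is the healthy, nothing-pending state.
probe_spot_notice() {
  TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
  curl -s -o /dev/null -w "%{http_code}" \
    -H "X-aws-ec2-metadata-token: $TOKEN" \
    "http://169.254.169.254/latest/meta-data/spot/instance-action"
}

# Map the status code to an action.
decide_action() {
  case "$1" in
    200) echo "drain" ;;   # notice issued: start draining now
    404) echo "wait"  ;;   # normal state: keep polling
    *)   echo "retry" ;;   # transient IMDS error
  esac
}
```

In practice you never wire this up yourself — the handlers in Step 2 do exactly this — but it is useful for debugging a node you suspect is about to be reclaimed.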

AWS also sends a separate rebalance recommendation when a spot instance is at elevated risk. This arrives before the 2-minute notice (sometimes significantly earlier) and gives you time to proactively move workloads before actual reclamation.

GCP: 30-second ACPI signal

GCP sends an ACPI G2 Soft Off signal giving the VM up to 30 seconds to shut down gracefully. If the instance has not stopped within that window, GCP sends a hard kill signal.

That 30-second window is tight. Plan for it in your terminationGracePeriodSeconds.

GCP terminology note: preemptible VMs have a 24-hour maximum runtime and Google recommends migrating to Spot VMs, which have no inherent runtime cap. Both share the same 30-second interruption window.
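GCE also exposes the state through its metadata server: the instance's `preempted` key flips from FALSE to TRUE when preemption begins. A sketch of reading and interpreting it (endpoint and header are the standard GCE metadata conventions):

```shell
# Read the preemption flag from the GCE metadata server (on-instance only).
read_preempted_flag() {
  curl -s -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted"
}

# Interpret the flag: TRUE means the 30-second countdown has started.
preemption_state() {
  if [ "$1" = "TRUE" ]; then
    echo "checkpoint-and-exit"   # flush state now; hard kill in under 30s
  else
    echo "running"
  fi
}
```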

Step 1: taint spot nodes and configure tolerations

Spot nodes need a taint to prevent non-spot-tolerant workloads from landing on them. Without the taint, your database primary could end up on a spot node.

On EKS with Karpenter, add the taint in the NodePool spec (covered in Step 3).

On GKE, apply the taint when creating the spot node pool:

gcloud container node-pools create spot-pool \
  --cluster=production-main \
  --spot \
  --node-taints=cloud.google.com/gke-spot="true":NoSchedule

GKE automatically labels spot nodes with cloud.google.com/gke-spot: "true" and cloud.google.com/gke-provisioning: "spot" (GKE 1.25.5+).

Workloads that should run on spot need a matching toleration:

# For GKE spot nodes
tolerations:
  - key: cloud.google.com/gke-spot
    operator: Equal
    value: "true"
    effect: NoSchedule
# For Karpenter-managed spot nodes (custom taint)
tolerations:
  - key: "spot"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
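Note that a toleration only permits scheduling on spot; it does not require it. To pin a workload to spot nodes, pair the toleration with a nodeSelector on the auto-applied label (GKE shown; on EKS the Karpenter equivalent is the well-known karpenter.sh/capacity-type: spot label):

```yaml
# Require spot placement, not just allow it (GKE label shown)
nodeSelector:
  cloud.google.com/gke-spot: "true"
```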

Step 2: install an interruption handler

Something needs to watch for the cloud provider's reclamation signal and drain pods before the node disappears. The right tool depends on your setup.

Option A: Karpenter native handling (EKS)

If you already run Karpenter, you do not need a separate handler. Karpenter has handled Spot interruption notifications natively since v0.19.0 via an SQS queue and EventBridge.

When an interruption arrives:

  1. EventBridge forwards the event to the SQS queue
  2. Karpenter's interruption controller taints the node NoSchedule and begins draining pods
  3. In parallel, Karpenter provisions a replacement node from the NodePool
  4. Replacement is typically ready before the 2-minute window expires

Enable it by passing the queue name to Karpenter:

karpenter --interruption-queue=karpenter-interruption-queue

One limitation: Karpenter publishes events for Spot rebalance recommendations but does not act on them proactively. If you want rebalance-triggered replacement, you still need the aws-node-termination-handler (NTH) in Queue mode alongside Karpenter.

Option B: aws-node-termination-handler (EKS, without Karpenter)

The aws-node-termination-handler (NTH) runs in two mutually exclusive modes:

IMDS mode (DaemonSet): polls the IMDS endpoint every 5 seconds. No extra AWS infrastructure required. Best for simple spot setups.

helm install aws-node-termination-handler \
  eks/aws-node-termination-handler \
  --set enableSpotInterruptionDraining=true \
  --set enableRebalanceMonitoring=true \
  --set enableScheduledEventDraining=true

Queue mode (Deployment): monitors an SQS queue fed by EventBridge. Supports ASG lifecycle hooks with grace periods up to 48 hours via RecordLifecycleActionHeartbeat. Required for long-running batch jobs.

helm install aws-node-termination-handler \
  eks/aws-node-termination-handler \
  --set enableSqsTerminationDraining=true \
  --set queueURL=https://sqs.eu-west-1.amazonaws.com/123456789012/nth-queue

Factor                    IMDS mode     Queue mode
Infrastructure needed     None          SQS + EventBridge + IAM
ASG lifecycle hooks       No            Yes (up to 48h)
Long-running batch jobs   No            Yes
Simple spot nodes         Recommended   Overkill

Option C: GKE kubelet graceful shutdown

GKE 1.20+ handles preemption automatically. The kubelet intercepts the ACPI signal and gracefully terminates pods. No separate handler needed.

For GKE clusters older than 1.20, deploy the k8s-node-termination-handler from GoogleCloudPlatform.

Step 3: configure Karpenter for spot (EKS)

This NodePool tells Karpenter to prefer spot, fall back to on-demand when capacity is unavailable, and diversify across instance types to reduce interruption risk:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-workloads
spec:
  template:
    spec:
      taints:
        - key: "spot"
          value: "true"
          effect: NoSchedule
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # spot preferred, on-demand fallback
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m5.xlarge        # diversify across families and generations
            - m5a.xlarge
            - m6i.xlarge
            - m6a.xlarge
            - m5d.xlarge
            - m5n.xlarge
            - m4.xlarge
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
    budgets:
      - nodes: "20%"
        reasons: ["Empty", "Drifted"]
      - nodes: "5"            # max 5 nodes disrupted at once for other reasons

Karpenter prioritizes capacity types in order: reserved > spot > on-demand. When spot capacity is unavailable, Karpenter caches the InsufficientCapacityError for 3 minutes and falls back to on-demand.

SpotToSpotConsolidation

Karpenter can replace a running spot node with a cheaper spot node. Enable it via Helm:

helm upgrade karpenter oci://public.ecr.aws/karpenter/karpenter \
  --set settings.featureGates.spotToSpotConsolidation=true

This triggers only when 15+ instance type options with lower pricing exist, preventing convergence on a single high-interruption type. It uses the price-capacity-optimized strategy. I would recommend validating consolidation patterns in a staging environment first.

Step 4: set PDBs and graceful termination periods

Spot interruptions trigger the Eviction API, which respects PodDisruptionBudgets. A well-configured PDB prevents all replicas from being evicted simultaneously.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: web
  unhealthyPodEvictionPolicy: AlwaysAllow  # prevents stalled drains

Set unhealthyPodEvictionPolicy: AlwaysAllow so unhealthy pods do not block eviction when the PDB budget is already consumed.

Termination grace periods

The terminationGracePeriodSeconds covers both the preStop hook and the container's SIGTERM handler. Match it to your cloud provider's interruption window:

  • AWS (2-minute window): set terminationGracePeriodSeconds: 90 for most workloads. Batch jobs using NTH in Queue mode with lifecycle hooks can go higher.
  • GCP (30-second window): set terminationGracePeriodSeconds: 25 to stay within the hard cutoff.

A minimal preStop hook that allows load balancer deregistration:

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 5"]   # wait for endpoint removal

Production services behind load balancers should verify deregistration is complete instead of relying on a fixed sleep.
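Putting the pieces together, a spot-ready Deployment on EKS might look like the sketch below (names and image are illustrative; the taint key matches the custom taint from the Karpenter NodePool in Step 3):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                  # illustrative name
spec:
  replicas: 2                # never a single replica on spot
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      terminationGracePeriodSeconds: 90   # fits inside AWS's 2-minute notice
      tolerations:
        - key: "spot"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      containers:
        - name: web
          image: nginx:1.27             # placeholder image
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5"]  # wait for LB deregistration
```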

Minimum replicas

Run at least 2 replicas of any spot-tolerant Deployment. A single replica on a spot node means a full outage on interruption. No PDB saves you from that.
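Two replicas only help if they do not share a node. A topology spread constraint (sketch, assuming an `app: web` label) keeps them on separate hosts so one interruption cannot take both:

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: web
```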

Step 5: select instance types for low interruption rates

Interruption is not random. The AWS Spot Instance Advisor shows per-type interruption frequency in bands: <5%, 5–10%, 10–15%, 15–20%, >20%. Across all types, the historical average is below 5%.

Three rules for instance type selection:

  1. Diversify across 5–10 types of the same vCPU and memory class (e.g., m5.xlarge, m5a.xlarge, m6i.xlarge, m6a.xlarge). The more pools Karpenter or the ASG can draw from, the lower the chance all pools are exhausted simultaneously.
  2. Span multiple Availability Zones. The same instance type can have very different interruption rates across AZs.
  3. Use capacity-optimized or price-capacity-optimized allocation. These strategies select from the deepest capacity pools rather than the cheapest price alone.
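With Karpenter, rule 1 can be expressed declaratively instead of enumerating types: constrain instance category, generation, and size, and let the scheduler diversify across every type that matches. A sketch using Karpenter's well-known AWS labels:

```yaml
# Equivalent to a hand-picked list, but automatically includes new matching types
requirements:
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values: ["m", "c", "r"]          # general, compute, memory families
  - key: karpenter.k8s.aws/instance-generation
    operator: Gt
    values: ["4"]                    # generation 5 and newer
  - key: karpenter.k8s.aws/instance-cpu
    operator: In
    values: ["4"]                    # keep nodes the same size (4 vCPU)
```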

The spotinfo CLI lets you browse interruption rates and savings percentages per instance type and region from the terminal.

Verify the setup

After deploying, confirm each layer works:

# Check spot nodes are running and tainted
kubectl get nodes -l karpenter.sh/capacity-type=spot -o wide
kubectl describe node <spot-node> | grep -A2 Taints

# Check NTH or Karpenter interruption controller is running
kubectl get pods -n kube-system | grep -E 'node-termination|karpenter'

# Check PDBs are in place
kubectl get pdb --all-namespaces

Expected state: spot nodes carry the correct taint, the interruption handler pod is Running, and PDBs show disruptionsAllowed > 0 for workloads with multiple replicas.

Workload suitability checklist

Not every workload belongs on spot. Use this checklist:

  • Good fit: stateless web servers, API workers, CI/CD runners, batch data processing pipelines, non-production environments, ML training with checkpointing
  • Possible with care: StatefulSets with persistent volumes when the application supports graceful checkpoint/restore (Kafka, Redis with RDB snapshots, distributed ML training)
  • Not a fit: single-node databases (PostgreSQL, MySQL in primary mode), workloads that cannot tolerate a 2-minute (AWS) or 30-second (GCP) restart window, latency-sensitive services with no on-demand fallback

The rule: if your application can survive a process restart with zero data loss, it can run on spot.

When to escalate

Collect the following before reaching out:

  • kubectl get events --field-selector reason=Eviction -A output
  • kubectl describe node <affected-node> (look for conditions and taints)
  • NTH or Karpenter controller logs: kubectl logs -n kube-system deploy/aws-node-termination-handler or kubectl logs -n kube-system deploy/karpenter
  • The PDB status: kubectl get pdb -A -o wide
  • Instance type and AZ of the interrupted node
  • Whether workloads restarted successfully on replacement nodes
