Table of contents
- Learning goal
- Prerequisites
- Why HPA alone falls short
- How KEDA works
- Install KEDA with Helm
- Create your first ScaledObject
- Common scalers
- Scale to zero
- How KEDA and HPA interact
- Gotchas that bite in production
- What you learned
- Where to go next
Learning goal
By the end of this tutorial you will have KEDA running on a Kubernetes cluster, a working ScaledObject that scales a Deployment based on an external event source, and a clear mental model of how KEDA's two-phase activation fits together with the native HPA. You will also know how to configure the Kafka, Prometheus, and Cron scalers, and how the scale-to-zero mechanism works under the hood.
Prerequisites
- kubectl connected to a Kubernetes 1.30+ cluster
- Helm 3.x installed
- Cluster-admin permissions (KEDA installs CRDs and admission webhooks)
- Familiarity with the Horizontal Pod Autoscaler. You do not need a running HPA yet; KEDA creates one for you. But understanding the HPA scaling algorithm (the ratio formula, stabilization windows, behaviour policies) will make every section here click faster
- A Deployment to scale. The examples use a fictional order-processor Deployment, but any workload works
- For the Prometheus scaler section: Prometheus running in-cluster with a reachable query endpoint
Why HPA alone falls short
The native HPA covers CPU and memory well. For stateless HTTP services that correlate neatly with CPU utilization, it is often enough. The gap shows up when your scaling signal lives outside the pod.
A Kafka consumer sitting idle at 2% CPU while 50,000 messages pile up in a topic will never trigger an HPA scale-out, because CPU says everything is fine. The same applies to RabbitMQ queue depth, Prometheus business metrics, SQS message counts, or any external signal. Building a custom metrics adapter to bridge these signals into the HPA is possible, but it is a significant operational burden that you have to build, deploy, and maintain yourself.
HPA also cannot scale to zero replicas. Its minReplicas floor is 1 because it needs a running pod to generate utilization metrics. For event-driven workloads that sit idle most of the day, that one always-on pod is wasted cost.
| Capability | HPA alone | KEDA + HPA |
|---|---|---|
| Scale to zero | No | Yes |
| CPU/memory metrics | Yes | Yes |
| External metrics (queues, streams) | Requires custom adapter | 70+ built-in scalers |
| Cron-based scheduling | No | Yes |
| Event-driven job creation | No | Yes (ScaledJob) |
KEDA fills these gaps without replacing HPA. It builds on top of it.
How KEDA works
KEDA is a CNCF Graduated project, co-created by Microsoft and Red Hat. It installs three pods in the keda namespace:
KEDA Operator. A controller that watches ScaledObject and ScaledJob custom resources. When you create a ScaledObject, the operator creates a corresponding HPA (keda-hpa-{name}) and manages its lifecycle. The operator also handles the 0-to-1 activation and 1-to-0 deactivation phases directly, because HPA cannot operate in that range.
Metrics Server. Implements the Kubernetes External Metrics API (external.metrics.k8s.io). It translates raw scaler output into metric values that the HPA controller can consume. Without this component, HPA has no way to read Kafka lag or RabbitMQ queue depth.
Admission Webhooks. Validate ScaledObject and ScaledJob configurations at creation time. They prevent conflicts like two ScaledObjects targeting the same Deployment.
Each scaler implements two methods: IsActive() (used for the 0-to-1 decision) and GetMetrics() (used for 1-to-N scaling via HPA). As of KEDA v2.19, there are 70+ built-in scalers covering Kafka, RabbitMQ, Prometheus, SQS, Pub/Sub, Redis Streams, PostgreSQL, Datadog, Cron, and more.
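The two-method split can be sketched in a few lines of Python. This is a minimal model of the decision logic, not KEDA's actual Go interface; the class and method names here are illustrative only.

```python
import math

class QueueScaler:
    """Illustrative model of a KEDA scaler's two decisions."""

    def __init__(self, target_per_replica: float, activation_value: float = 0.0):
        self.target = target_per_replica        # maps to e.g. "value" / "threshold"
        self.activation = activation_value      # maps to "activationValue"

    def is_active(self, metric: float) -> bool:
        # The 0-to-1 decision, made by the KEDA operator directly.
        return metric > self.activation

    def desired_replicas(self, metric: float, max_replicas: int) -> int:
        # The 1-to-N decision, made by the HPA via KEDA's metrics server.
        return min(max_replicas, math.ceil(metric / self.target))

scaler = QueueScaler(target_per_replica=10, activation_value=1)
print(scaler.is_active(0))               # False: workload stays at zero
print(scaler.desired_replicas(55, 30))   # 6 replicas for 55 queued messages
```

The point of the split is that the two decisions have different owners: the operator answers "should anything run at all?", while the HPA answers "how many?".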
Install KEDA with Helm
Step 1: add the KEDA Helm repository
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
Step 2: install KEDA
Pin the chart version. Unpinned installs in production are a recipe for surprise breaking changes.
helm install keda kedacore/keda \
--version 2.19.0 \
--namespace keda \
--create-namespace
Step 3: verify the installation
kubectl get pods -n keda
You should see three pods in Running state:
NAME READY STATUS RESTARTS AGE
keda-admission-webhooks-... 1/1 Running 0 45s
keda-operator-... 1/1 Running 0 45s
keda-operator-metrics-apiserver-... 1/1 Running 0 45s
Confirm the CRDs are registered:
kubectl get crd | grep keda
Expected output includes scaledobjects.keda.sh, scaledjobs.keda.sh, triggerauthentications.keda.sh, and clustertriggerauthentications.keda.sh.
Checkpoint: if any pod is not Running, check its logs with kubectl logs -n keda -l app.kubernetes.io/instance=keda. The most common cause is an admission webhook timeout in clusters with strict network policies.
Create your first ScaledObject
A ScaledObject links a Deployment to one or more event sources. Here is a minimal example that scales an order-processor Deployment based on a RabbitMQ queue:
# scaledobject-orders.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: order-processor-scaler
namespace: default
spec:
scaleTargetRef:
name: order-processor # must match the Deployment name
pollingInterval: 15 # check the queue every 15 seconds
cooldownPeriod: 120 # wait 2 min of idle before scaling to zero
minReplicaCount: 0 # enable scale-to-zero
maxReplicaCount: 30
triggers:
- type: rabbitmq
metadata:
queueName: incoming-orders
mode: QueueLength
value: "10" # target: 1 replica per 10 messages
activationValue: "1" # activate from zero when >= 1 message
authenticationRef:
name: rabbitmq-trigger-auth
Apply and verify:
kubectl apply -f scaledobject-orders.yaml
# Check that KEDA created an HPA
kubectl get hpa
# NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS
# keda-hpa-order-processor-scaler Deployment/order-processor 0/10 (avg) 1 30 0
The TARGETS column shows the current metric value vs. the target. When messages arrive in the queue, KEDA activates the Deployment from zero and the HPA scales from 1 to N based on queue depth.
Checkpoint: run kubectl describe scaledobject order-processor-scaler and look for Conditions. Ready: True and Active: True (or False if the queue is empty) confirm KEDA is polling successfully.
Common scalers
Kafka: scale on consumer lag
The Kafka scaler reads consumer group lag (the difference between latest offset and committed offset). Scaling math: if total lag is 500 and lagThreshold is 50, KEDA targets 500 / 50 = 10 replicas. By default, replicas are capped at the partition count unless allowIdleConsumers: "true".
triggers:
- type: kafka
metadata:
bootstrapServers: kafka-broker-1.kafka:9092,kafka-broker-2.kafka:9092
consumerGroup: order-consumer-group
topic: orders
lagThreshold: "50" # 1 replica per 50 lag
activationLagThreshold: "1" # activate from zero on any lag
offsetResetPolicy: latest
authenticationRef:
name: kafka-trigger-auth # TriggerAuthentication with SASL credentials
Why activationLagThreshold matters: without it (default 0), KEDA activates from zero on the very first message. If your topic receives sporadic single-event bursts that do not justify cold-starting a consumer, set this higher.
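To make the lag math concrete, here is a small sketch (the helper name is hypothetical; the partition cap follows the default behaviour described above):

```python
import math

def kafka_desired_replicas(total_lag: float, lag_threshold: float,
                           partition_count: int,
                           allow_idle_consumers: bool = False) -> int:
    """Replica count from consumer lag, capped at partition count by default."""
    desired = math.ceil(total_lag / lag_threshold)
    if not allow_idle_consumers:
        # Extra consumers beyond the partition count would sit idle anyway.
        desired = min(desired, partition_count)
    return desired

print(kafka_desired_replicas(500, 50, partition_count=12))  # 10
print(kafka_desired_replicas(900, 50, partition_count=12))  # 12: capped at partitions
```

The cap is why adding partitions is often the real scaling lever for Kafka consumers: below the cap, KEDA can only spread the same partitions across fewer or more pods.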
Prometheus: scale on any PromQL query
The Prometheus scaler lets you scale on anything Prometheus can measure: HTTP request rate, p95 latency, custom business metrics, queue depth exposed via exporters.
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus-operated.monitoring.svc:9090
query: sum(rate(http_requests_total{deployment="api-gateway"}[2m]))
threshold: "100" # 1 replica per 100 req/s
activationThreshold: "5" # activate from zero above 5 req/s
ignoreNullValues: "true" # don't error on empty Prometheus response
The query must return a single scalar or a single-element vector. Multi-element results cause the scaler to error. Test your PromQL in the Prometheus UI first.
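Besides the UI, you can hit Prometheus's HTTP API at /api/v1/query directly. The sketch below only builds the request URL; the server address is the in-cluster address from the trigger above and is an assumption about your setup.

```python
from urllib.parse import urlencode

def prometheus_query_url(server: str, query: str) -> str:
    """Build a Prometheus instant-query URL with proper percent-encoding."""
    return f"{server}/api/v1/query?" + urlencode({"query": query})

url = prometheus_query_url(
    "http://prometheus-operated.monitoring.svc:9090",
    'sum(rate(http_requests_total{deployment="api-gateway"}[2m]))',
)
print(url)
```

Curl the resulting URL from a debug pod and check that `data.result` in the JSON response contains exactly one element before wiring the query into a trigger.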
Cron: guaranteed capacity during scheduled windows
The Cron scaler sets a replica floor during a time window. It pairs well with event scalers: Cron guarantees a warm baseline during business hours while Kafka or Prometheus handle burst scaling on top.
triggers:
- type: cron
metadata:
timezone: Europe/Amsterdam
start: "0 8 * * 1-5" # weekdays 08:00
end: "0 18 * * 1-5" # weekdays 18:00
desiredReplicas: "5" # floor of 5 during business hours
- type: kafka
metadata:
bootstrapServers: kafka:9092
consumerGroup: order-consumer-group
topic: orders
lagThreshold: "50"
During business hours, KEDA ensures at least 5 replicas. Kafka lag can push beyond 5. Outside the Cron window, the Kafka scaler governs alone, including scaling to zero when the topic is idle.
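When multiple triggers are present, the resulting HPA takes the highest replica count any trigger asks for. A minimal sketch of that combination (function and argument names are illustrative, not KEDA internals):

```python
import math

def combined_replicas(in_cron_window: bool, cron_floor: int,
                      total_lag: float, lag_threshold: float) -> int:
    """Highest demand across a Cron trigger and a Kafka trigger wins."""
    from_cron = cron_floor if in_cron_window else 0
    from_kafka = math.ceil(total_lag / lag_threshold)
    return max(from_cron, from_kafka)

print(combined_replicas(True, 5, total_lag=120, lag_threshold=50))   # 5: floor wins
print(combined_replicas(True, 5, total_lag=600, lag_threshold=50))   # 12: lag wins
print(combined_replicas(False, 5, total_lag=0, lag_threshold=50))    # 0: idle off-hours
```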
Scale to zero
This is the feature that most clearly separates KEDA from plain HPA. The mechanism works in two phases.
Activation (0 to 1). When minReplicaCount: 0 and current replicas are 0, the KEDA operator (not the HPA) polls the event source every pollingInterval seconds and calls IsActive(). When the metric value strictly exceeds activationThreshold, the operator sets the Deployment to 1 replica. The HPA takes over from there.
Deactivation (1 to 0). When all triggers report inactive (metric at or below threshold), the operator starts the cooldownPeriod countdown. After cooldownPeriod seconds of sustained inactivity, KEDA sets replicas to 0 directly, bypassing HPA (which cannot go below minReplicas: 1).
A critical gotcha: activationThreshold has priority over threshold. If you set activationThreshold: 50 and threshold: 10, and 40 messages are in the queue, the scaler stays inactive. The workload will not activate even though the HPA math would call for 4 replicas. This is intentional. It lets you prevent cold-starts on transient spikes. But it bites hard if you set it without understanding this precedence.
Cold-start cost. Pods activated from zero must start up before they can process events. For workloads where cold-start latency is unacceptable, set minReplicaCount: 1 to always keep one warm replica. You lose the cost savings of true zero, but you avoid the startup penalty.
How KEDA and HPA interact
KEDA does not replace HPA. It creates and feeds one.
When you create a ScaledObject, KEDA's operator creates an HPA named keda-hpa-{scaledobject-name}. The HPA's metrics section references external metrics served by KEDA's metrics server. Every 15 seconds (the default --horizontal-pod-autoscaler-sync-period), the HPA controller queries the metrics server and applies its standard formula: desiredReplicas = ceil(currentReplicas * (currentValue / targetValue)).
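Applied to the order-processor example, the formula looks like this (a simplified sketch: the real HPA controller also applies a tolerance band and stabilization windows before acting):

```python
import math

def hpa_desired(current_replicas: int, current_value: float,
                target_value: float) -> int:
    """The standard HPA ratio formula."""
    return math.ceil(current_replicas * (current_value / target_value))

# 4 replicas, average of 25 messages per replica, target of 10 per replica:
print(hpa_desired(4, current_value=25, target_value=10))   # 10
print(hpa_desired(10, current_value=10, target_value=10))  # 10: at target, no change
```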
Do not manually edit the HPA that KEDA creates. Changes are overwritten on the next reconciliation cycle. To configure HPA behaviour policies (stabilization windows, scale-down rate limits), use the horizontalPodAutoscalerConfig.behavior field in the ScaledObject:
horizontalPodAutoscalerConfig:
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 30 # remove max 30% per period
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0 # react immediately
One important distinction: cooldownPeriod governs only the 1-to-0 transition. For N-to-1 scale-down behavior, the HPA's scaleDown.stabilizationWindowSeconds is in control. These are separate mechanisms with separate timers.
Gotchas that bite in production
Scale-down kills pods mid-processing. KEDA monitors queue depth. If a message is dequeued but not yet acknowledged, the queue appears shorter. KEDA may trigger scale-down and terminate the pod before it finishes. Mitigations: for RabbitMQ, use excludeUnacknowledged: "true" (HTTP protocol mode). For Kafka, use excludePersistentLag: "true". Always configure terminationGracePeriodSeconds and preStop lifecycle hooks. For long-running batch tasks, use ScaledJob instead of ScaledObject; it creates a Kubernetes Job per event with its own completion lifecycle.
KEDA polling + HPA polling = double latency. KEDA polls every pollingInterval seconds. The HPA controller evaluates every 15 seconds. In the worst case, there is pollingInterval + 15 seconds between an event arriving and scaling starting. Set pollingInterval: 10 or lower for latency-sensitive workloads, but watch the load on your event source.
Admission webhook blocks ScaledObject creation. In clusters with strict network policies, the webhook pod may be unreachable. If kubectl apply hangs for 30 seconds and then fails with a webhook timeout, check that the keda-admission-webhooks pod is running and reachable from the API server.
Default RBAC is broad. KEDA's Helm chart installs permissive RBAC by default (get/list/watch/scale on all resources). For production, restrict using the rbac.scaledRefKinds value in the Helm chart to limit which resource kinds KEDA can target.
What you learned
You installed KEDA from the Helm chart and verified its three components (operator, metrics server, admission webhooks). You created a ScaledObject that links a Deployment to an event source and saw KEDA automatically create an HPA behind the scenes. You configured three scaler types: Kafka for consumer lag, Prometheus for arbitrary PromQL queries, and Cron for scheduled capacity floors. And you understand the two-phase scaling model: KEDA handles the 0-to-1 activation and 1-to-0 deactivation; the HPA handles everything in between.
Where to go next
- Horizontal Pod Autoscaler deep-dive covers the HPA layer that KEDA creates under the hood: the scaling algorithm, stabilization windows, and combining CPU with custom metrics
- Kubernetes Jobs and CronJobs explains the Job resource that KEDA's ScaledJob builds on for batch processing
- Resource requests and limits covers right-sizing the pods that KEDA scales, so you do not hit OOMKills or scheduling failures when replicas surge