Table of contents
- Learning goal
- Prerequisites
- Why HPA alone falls short
- How KEDA works
- Install KEDA with Helm
- Create your first ScaledObject
- Common scalers
- Scale to zero
- How KEDA and HPA interact
- Gotchas that bite in production
- What you learned
- Where to go next
Learning goal
By the end of this tutorial you will have KEDA running on a Kubernetes cluster, a working ScaledObject that scales a Deployment based on an external event source, and a clear mental model of how KEDA's two-phase activation fits together with the native HPA. You will also know how to configure the Kafka, Prometheus, and Cron scalers, and how the scale-to-zero mechanism works under the hood.
Prerequisites
- kubectl connected to a Kubernetes 1.30+ cluster
- Helm 3.x installed
- Cluster-admin permissions (KEDA installs CRDs and admission webhooks)
- Familiarity with the Horizontal Pod Autoscaler. You do not need a running HPA yet; KEDA creates one for you. But understanding the HPA scaling algorithm (the ratio formula, stabilization windows, behaviour policies) will make every section here click faster
- A Deployment to scale. The examples use a fictional order-processor Deployment, but any workload works
- For the Prometheus scaler section: Prometheus running in-cluster with a reachable query endpoint
Why HPA alone falls short
The native HPA covers CPU and memory well. For stateless HTTP services that correlate neatly with CPU utilization, it is often enough. The gap shows up when your scaling signal lives outside the pod.
A Kafka consumer sitting idle at 2% CPU while 50,000 messages pile up in a topic will never trigger an HPA scale-out, because CPU says everything is fine. The same applies to RabbitMQ queue depth, Prometheus business metrics, SQS message counts, or any external signal. Building a custom metrics adapter to bridge these signals into the HPA is possible, but it is a significant operational burden that you have to build, deploy, and maintain yourself.
HPA also cannot scale to zero replicas. Its minReplicas floor is 1 because it needs a running pod to generate utilization metrics. For event-driven workloads that sit idle most of the day, that one always-on pod is wasted cost.
| Capability | HPA alone | KEDA + HPA |
|---|---|---|
| Scale to zero | No | Yes |
| CPU/memory metrics | Yes | Yes |
| External metrics (queues, streams) | Requires custom adapter | 70+ built-in scalers |
| Cron-based scheduling | No | Yes |
| Event-driven job creation | No | Yes (ScaledJob) |
KEDA fills these gaps without replacing HPA. It builds on top of it.
How KEDA works
KEDA is a CNCF Graduated project, co-created by Microsoft and Red Hat. It installs three pods in the keda namespace:
KEDA Operator. A controller that watches ScaledObject and ScaledJob custom resources. When you create a ScaledObject, the operator creates a corresponding HPA (keda-hpa-{name}) and manages its lifecycle. The operator also handles the 0-to-1 activation and 1-to-0 deactivation phases directly, because HPA cannot operate in that range.
Metrics Server. Implements the Kubernetes External Metrics API (external.metrics.k8s.io). It translates raw scaler output into metric values that the HPA controller can consume. Without this component, HPA has no way to read Kafka lag or RabbitMQ queue depth.
Admission Webhooks. Validate ScaledObject and ScaledJob configurations at creation time. They prevent conflicts like two ScaledObjects targeting the same Deployment.
Each scaler implements two methods: IsActive() (used for the 0-to-1 decision) and GetMetrics() (used for 1-to-N scaling via HPA). As of KEDA v2.19, there are 70+ built-in scalers covering Kafka, RabbitMQ, Prometheus, SQS, Pub/Sub, Redis Streams, PostgreSQL, Datadog, Cron, and more.
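The two-method split can be sketched in a few lines of Python. This is a minimal model of the decision logic, not KEDA's actual Go interface; the class and method names here are illustrative only.

```python
import math

class QueueScaler:
    """Illustrative model of a KEDA scaler's two decisions."""

    def __init__(self, target_per_replica: float, activation_value: float = 0.0):
        self.target = target_per_replica        # maps to e.g. "value" / "threshold"
        self.activation = activation_value      # maps to "activationValue"

    def is_active(self, metric: float) -> bool:
        # The 0-to-1 decision, made by the KEDA operator directly.
        return metric > self.activation

    def desired_replicas(self, metric: float, max_replicas: int) -> int:
        # The 1-to-N decision, made by the HPA via KEDA's metrics server.
        return min(max_replicas, math.ceil(metric / self.target))

scaler = QueueScaler(target_per_replica=10, activation_value=1)
print(scaler.is_active(0))               # False: workload stays at zero
print(scaler.desired_replicas(55, 30))   # 6 replicas for 55 queued messages
```

The point of the split is that the two decisions have different owners: the operator answers "should anything run at all?", while the HPA answers "how many?".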
Install KEDA with Helm
Step 1: add the KEDA Helm repository
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
Step 2: install KEDA
Pin the chart version. Unpinned installs in production are a recipe for surprise breaking changes.
helm install keda kedacore/keda \
--version 2.19.0 \
--namespace keda \
--create-namespace
Step 3: verify the installation
kubectl get pods -n keda
You should see three pods in Running state:
NAME READY STATUS RESTARTS AGE
keda-admission-webhooks-... 1/1 Running 0 45s
keda-operator-... 1/1 Running 0 45s
keda-operator-metrics-apiserver-... 1/1 Running 0 45s
Confirm the CRDs are registered:
kubectl get crd | grep keda
Expected output includes scaledobjects.keda.sh, scaledjobs.keda.sh, triggerauthentications.keda.sh, and clustertriggerauthentications.keda.sh.
Checkpoint: if any pod is not Running, check its logs with kubectl logs -n keda -l app.kubernetes.io/instance=keda. The most common cause is an admission webhook timeout in clusters with strict network policies.
Create your first ScaledObject
A ScaledObject links a Deployment to one or more event sources. Here is a minimal example that scales an order-processor Deployment based on a RabbitMQ queue:
# scaledobject-orders.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: order-processor-scaler
namespace: default
spec:
scaleTargetRef:
name: order-processor # must match the Deployment name
pollingInterval: 15 # check the queue every 15 seconds
cooldownPeriod: 120 # wait 2 min of idle before scaling to zero
minReplicaCount: 0 # enable scale-to-zero
maxReplicaCount: 30
triggers:
- type: rabbitmq
metadata:
queueName: incoming-orders
mode: QueueLength
value: "10" # target: 1 replica per 10 messages
activationValue: "1" # activate from zero when >= 1 message
authenticationRef:
name: rabbitmq-trigger-auth
Apply and verify:
kubectl apply -f scaledobject-orders.yaml
# Check that KEDA created an HPA
kubectl get hpa
# NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS
# keda-hpa-order-processor-scaler Deployment/order-processor 0/10 (avg) 1 30 0
The TARGETS column shows the current metric value vs. the target. When messages arrive in the queue, KEDA activates the Deployment from zero and the HPA scales from 1 to N based on queue depth.
Checkpoint: run kubectl describe scaledobject order-processor-scaler and look for Conditions. Ready: True and Active: True (or False if the queue is empty) confirm KEDA is polling successfully.
Common scalers
Kafka: scale on consumer lag
The Kafka scaler reads consumer group lag (the difference between latest offset and committed offset). Scaling math: if total lag is 500 and lagThreshold is 50, KEDA targets 500 / 50 = 10 replicas. By default, replicas are capped at the partition count unless allowIdleConsumers: "true".
triggers:
- type: kafka
metadata:
bootstrapServers: kafka-broker-1.kafka:9092,kafka-broker-2.kafka:9092
consumerGroup: order-consumer-group
topic: orders
lagThreshold: "50" # 1 replica per 50 lag
activationLagThreshold: "1" # activate from zero on any lag
offsetResetPolicy: latest
authenticationRef:
name: kafka-trigger-auth # TriggerAuthentication with SASL credentials
Why activationLagThreshold matters: without it (default 0), KEDA activates from zero on the very first message. If your topic receives sporadic single-event bursts that do not justify cold-starting a consumer, set this higher.
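To make the lag math concrete, here is a small sketch (the helper name is hypothetical; the partition cap follows the default behaviour described above):

```python
import math

def kafka_desired_replicas(total_lag: float, lag_threshold: float,
                           partition_count: int,
                           allow_idle_consumers: bool = False) -> int:
    """Replica count from consumer lag, capped at partition count by default."""
    desired = math.ceil(total_lag / lag_threshold)
    if not allow_idle_consumers:
        # Extra consumers beyond the partition count would sit idle anyway.
        desired = min(desired, partition_count)
    return desired

print(kafka_desired_replicas(500, 50, partition_count=12))  # 10
print(kafka_desired_replicas(900, 50, partition_count=12))  # 12: capped at partitions
```

The cap is why adding partitions is often the real scaling lever for Kafka consumers: below the cap, KEDA can only spread the same partitions across fewer or more pods.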
Prometheus: scale on any PromQL query
The Prometheus scaler lets you scale on anything Prometheus can measure: HTTP request rate, p95 latency, custom business metrics, queue depth exposed via exporters.
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus-operated.monitoring.svc:9090
query: sum(rate(http_requests_total{deployment="api-gateway"}[2m]))
threshold: "100" # 1 replica per 100 req/s
activationThreshold: "5" # activate from zero above 5 req/s
ignoreNullValues: "true" # don't error on empty Prometheus response
The query must return a single scalar or a single-element vector. Multi-element results cause the scaler to error. Test your PromQL in the Prometheus UI first.
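Besides the UI, you can hit Prometheus's HTTP API at /api/v1/query directly. The sketch below only builds the request URL; the server address is the in-cluster address from the trigger above and is an assumption about your setup.

```python
from urllib.parse import urlencode

def prometheus_query_url(server: str, query: str) -> str:
    """Build a Prometheus instant-query URL with proper percent-encoding."""
    return f"{server}/api/v1/query?" + urlencode({"query": query})

url = prometheus_query_url(
    "http://prometheus-operated.monitoring.svc:9090",
    'sum(rate(http_requests_total{deployment="api-gateway"}[2m]))',
)
print(url)
```

Curl the resulting URL from a debug pod and check that `data.result` in the JSON response contains exactly one element before wiring the query into a trigger.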
Cron: guaranteed capacity during scheduled windows
The Cron scaler sets a replica floor during a time window. It pairs well with event scalers: Cron guarantees a warm baseline during business hours while Kafka or Prometheus handle burst scaling on top.
triggers:
- type: cron
metadata:
timezone: Europe/Amsterdam
start: "0 8 * * 1-5" # weekdays 08:00
end: "0 18 * * 1-5" # weekdays 18:00
desiredReplicas: "5" # floor of 5 during business hours
- type: kafka
metadata:
bootstrapServers: kafka:9092
consumerGroup: order-consumer-group
topic: orders
lagThreshold: "50"
During business hours, KEDA ensures at least 5 replicas. Kafka lag can push beyond 5. Outside the Cron window, the Kafka scaler governs alone, including scaling to zero when the topic is idle.
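When multiple triggers are present, the resulting HPA takes the highest replica count any trigger asks for. A minimal sketch of that combination (function and argument names are illustrative, not KEDA internals):

```python
import math

def combined_replicas(in_cron_window: bool, cron_floor: int,
                      total_lag: float, lag_threshold: float) -> int:
    """Highest demand across a Cron trigger and a Kafka trigger wins."""
    from_cron = cron_floor if in_cron_window else 0
    from_kafka = math.ceil(total_lag / lag_threshold)
    return max(from_cron, from_kafka)

print(combined_replicas(True, 5, total_lag=120, lag_threshold=50))   # 5: floor wins
print(combined_replicas(True, 5, total_lag=600, lag_threshold=50))   # 12: lag wins
print(combined_replicas(False, 5, total_lag=0, lag_threshold=50))    # 0: idle off-hours
```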
Scale to zero
This is the feature that most clearly separates KEDA from plain HPA. The mechanism works in two phases.
Activation (0 to 1). When minReplicaCount: 0 and current replicas are 0, the KEDA operator (not the HPA) polls the event source every pollingInterval seconds and calls IsActive(). When the metric value strictly exceeds activationThreshold, the operator sets the Deployment to 1 replica. The HPA takes over from there.
Deactivation (1 to 0). When all triggers report inactive (metric at or below threshold), the operator starts the cooldownPeriod countdown. After cooldownPeriod seconds of sustained inactivity, KEDA sets replicas to 0 directly, bypassing HPA (which cannot go below minReplicas: 1).
A critical gotcha: activationThreshold has priority over threshold. If you set activationThreshold: 50 and threshold: 10, and 40 messages are in the queue, the scaler stays inactive. The workload will not activate even though the HPA math would call for 4 replicas. This is intentional. It lets you prevent cold-starts on transient spikes. But it bites hard if you set it without understanding this precedence.
Cold-start cost. Pods activated from zero must start up before they can process events. For workloads where cold-start latency is unacceptable, set minReplicaCount: 1 to always keep one warm replica. You lose the cost savings of true zero, but you avoid the startup penalty.
How KEDA and HPA interact
KEDA does not replace HPA. It creates and feeds one.
When you create a ScaledObject, KEDA's operator creates an HPA named keda-hpa-{scaledobject-name}. The HPA's metrics section references external metrics served by KEDA's metrics server. Every 15 seconds (the default --horizontal-pod-autoscaler-sync-period), the HPA controller queries the metrics server and applies its standard formula: desiredReplicas = ceil(currentReplicas * (currentValue / targetValue)).
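Applied to the order-processor example, the formula looks like this (a simplified sketch: the real HPA controller also applies a tolerance band and stabilization windows before acting):

```python
import math

def hpa_desired(current_replicas: int, current_value: float,
                target_value: float) -> int:
    """The standard HPA ratio formula."""
    return math.ceil(current_replicas * (current_value / target_value))

# 4 replicas, average of 25 messages per replica, target of 10 per replica:
print(hpa_desired(4, current_value=25, target_value=10))   # 10
print(hpa_desired(10, current_value=10, target_value=10))  # 10: at target, no change
```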
Do not manually edit the HPA that KEDA creates. Changes are overwritten on the next reconciliation cycle. To configure HPA behaviour policies (stabilization windows, scale-down rate limits), use the horizontalPodAutoscalerConfig.behavior field in the ScaledObject:
horizontalPodAutoscalerConfig:
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 30 # remove max 30% per period
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0 # react immediately
One important distinction: cooldownPeriod governs only the 1-to-0 transition. For N-to-1 scale-down behavior, the HPA's scaleDown.stabilizationWindowSeconds is in control. These are separate mechanisms with separate timers.
Gotchas that bite in production
Scale-down kills pods mid-processing. KEDA monitors queue depth. If a message is dequeued but not yet acknowledged, the queue appears shorter. KEDA may trigger scale-down and terminate the pod before it finishes. Mitigations: for RabbitMQ, use excludeUnacknowledged: "true" (HTTP protocol mode). For Kafka, use excludePersistentLag: "true". Always configure terminationGracePeriodSeconds and preStop lifecycle hooks. For long-running batch tasks, use ScaledJob instead of ScaledObject; it creates a Kubernetes Job per event with its own completion lifecycle.
KEDA polling + HPA polling = double latency. KEDA polls every pollingInterval seconds. The HPA controller evaluates every 15 seconds. In the worst case, there is pollingInterval + 15 seconds between an event arriving and scaling starting. Set pollingInterval: 10 or lower for latency-sensitive workloads, but watch the load on your event source.
Admission webhook blocks ScaledObject creation. In clusters with strict network policies, the webhook pod may be unreachable. If kubectl apply hangs for 30 seconds and then fails with a webhook timeout, check that the keda-admission-webhooks pod is running and reachable from the API server.
Default RBAC is broad. KEDA's Helm chart installs permissive RBAC by default (get/list/watch/scale on all resources). For production, restrict using the rbac.scaledRefKinds value in the Helm chart to limit which resource kinds KEDA can target.
What you learned
You installed KEDA from the Helm chart and verified its three components (operator, metrics server, admission webhooks). You created a ScaledObject that links a Deployment to an event source and saw KEDA automatically create an HPA behind the scenes. You configured three scaler types: Kafka for consumer lag, Prometheus for arbitrary PromQL queries, and Cron for scheduled capacity floors. And you understand the two-phase scaling model: KEDA handles the 0-to-1 activation and 1-to-0 deactivation; the HPA handles everything in between.
Where to go next
- Horizontal Pod Autoscaler deep-dive covers the HPA layer that KEDA creates under the hood: the scaling algorithm, stabilization windows, and combining CPU with custom metrics
- Kubernetes Jobs and CronJobs explains the Job resource that KEDA's ScaledJob builds on for batch processing
- Resource requests and limits covers right-sizing the pods that KEDA scales, so you do not hit OOMKills or scheduling failures when replicas surge