What you will have at the end
A VPA object that produces right-sizing recommendations for a Deployment's containers. You will know how to read those recommendations, constrain them with minAllowed/maxAllowed bounds, and progress safely from recommendation-only mode to automatic enforcement. You will also understand the VPA + HPA coexistence rules so the two autoscalers do not fight each other.
Prerequisites
- kubectl connected to a Kubernetes 1.28+ cluster
- metrics-server installed and returning data (kubectl top pods shows values, not error: Metrics API not available)
- A Deployment with resource requests defined. VPA reads current requests to calculate the gap between what is allocated and what is actually consumed
- For in-place resize (step 5): Kubernetes 1.33+ with the InPlacePodVerticalScaling feature gate enabled (it is enabled by default since 1.33)
Step 1: install VPA
VPA is not part of core Kubernetes. It ships as a CRD plus three controllers maintained in the kubernetes/autoscaler monorepo. You need all three components:
- Recommender: queries metrics-server, builds per-container usage histograms, writes recommendations to .status.recommendation
- Updater: compares running pods against recommendations, evicts or resizes pods when the gap is large enough
- Admission controller: mutating webhook that injects recommended resources into newly created pods
Option A: official install script
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh # deploys CRD + all three controllers to kube-system
Option B: Helm (Fairwinds chart)
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install vpa fairwinds-stable/vpa \
--namespace vpa --create-namespace
Verify the installation
kubectl get pods -n kube-system -l app=vpa # official script
kubectl get pods -n vpa # Helm
Expected output: three pods running (recommender, updater, admission-controller). If any pod is in CrashLoopBackOff, check that metrics-server is reachable.
Step 2: create a VPA in Off mode
Start in Off mode, the only safe starting point. Note that updateMode defaults to Auto when omitted, so set Off explicitly. In Off mode, VPA generates recommendations but touches nothing.
# vpa-off.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: web-app-vpa
namespace: production
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: web-app
updatePolicy:
updateMode: "Off"
kubectl apply -f vpa-off.yaml
The recommender needs real traffic data before its output is meaningful. It uses decaying weighted histograms over an 8-day window. At 24 hours the recommendations carry roughly a 2x safety multiplier; by day 7 that drops to about 1.14x. For workloads with weekly traffic cycles, wait a full week before acting on recommendations.
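The decaying safety margin behaves like a confidence multiplier that shrinks as history accumulates. A minimal sketch of that relationship (the real recommender internals are more involved; this just reproduces the 2x-at-day-1 and ~1.14x-at-day-7 figures above):

```python
def safety_multiplier(history_days: float) -> float:
    """Scale factor applied on top of observed usage: large with
    little history, approaching 1.0 as more days of data accumulate."""
    return 1.0 + 1.0 / history_days

print(safety_multiplier(1))            # day 1: 2.0x
print(round(safety_multiplier(7), 2))  # day 7: ~1.14x
```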
Step 3: read and interpret recommendations
After at least 24-48 hours of data collection:
kubectl describe vpa web-app-vpa -n production
The status.recommendation.containerRecommendations section shows three tiers per container:
containerRecommendations:
- containerName: web-app
lowerBound: # 50th percentile, minimum viable
cpu: 50m
memory: 128Mi
target: # 90th percentile, what VPA applies in enforcement modes
cpu: 200m
memory: 300Mi
upperBound: # 95th percentile, safety ceiling
cpu: 400m
memory: 500Mi
target is the value VPA applies when you switch to an enforcement mode. Compare it against your current requests: if your Deployment spec says cpu: 1000m and VPA recommends cpu: 200m, you are over-provisioned by 5x. If VPA recommends more than you allocate, your pods are likely getting CPU throttled or OOMKilled.
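To make the comparison concrete, here is a small sketch (using the hypothetical values from the example above) that parses CPU quantities and computes the over-provisioning factor:

```python
def parse_cpu(q: str) -> float:
    """Parse a Kubernetes CPU quantity ('1000m' or '1') into cores."""
    return float(q[:-1]) / 1000 if q.endswith("m") else float(q)

def overprovision_factor(current_request: str, vpa_target: str) -> float:
    """How many times larger the current request is than the VPA target."""
    return parse_cpu(current_request) / parse_cpu(vpa_target)

print(overprovision_factor("1000m", "200m"))  # 5.0 -> over-provisioned 5x
```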
For structured output you can pipe through jq:
kubectl get vpa web-app-vpa -n production \
-o jsonpath='{.status.recommendation.containerRecommendations}' | jq .
Step 4: add resource policy bounds
Unbounded recommendations can exceed node capacity and leave pods stuck in Pending. The resourcePolicy section constrains what VPA can recommend or apply.
# vpa-bounded.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: web-app-vpa
namespace: production
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: web-app
updatePolicy:
updateMode: "Off"
resourcePolicy:
containerPolicies:
- containerName: web-app
minAllowed:
cpu: 100m
memory: 128Mi
maxAllowed:
cpu: "2" # never recommend more than 2 cores
memory: 4Gi # never recommend more than 4 Gi
controlledValues: RequestsAndLimits # adjusts both, preserving the ratio
- containerName: istio-proxy
mode: "Off" # exclude sidecar from VPA management
Two controlledValues options:
- RequestsAndLimits (default): VPA adjusts both requests and limits, preserving the original ratio between them. If your Deployment has a 1:2 request-to-limit ratio, VPA maintains it.
- RequestsOnly: VPA adjusts requests; limits stay at whatever the Deployment spec says. Safer when you treat limits as a hard safety net.
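The ratio-preservation rule for RequestsAndLimits works out to: new limit = new request x (original limit / original request). A quick sketch with hypothetical millicore values:

```python
def scaled_limit(orig_request_m: int, orig_limit_m: int, new_request_m: int) -> int:
    """RequestsAndLimits keeps the original request:limit ratio,
    so the new limit is the new request scaled by the same factor."""
    return new_request_m * orig_limit_m // orig_request_m

# 1:2 ratio: request 1000m, limit 2000m; VPA targets 200m
print(scaled_limit(1000, 2000, 200))  # 400 -> new limit 400m, still 1:2
```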
Step 5: switch to an enforcement mode
Once recommendations have stabilized (7+ days of data, target values look reasonable, bounds are in place), choose an enforcement mode.
Initial
VPA applies recommendations only at pod creation. Running pods are never evicted. Good for StatefulSets or workloads where mid-flight restarts are unacceptable.
updatePolicy:
updateMode: "Initial"
Recreate
VPA's Updater evicts pods whose requests deviate significantly from the recommendation. The workload controller recreates them, and the admission webhook injects updated resources. Causes a brief restart per pod. The Updater respects PodDisruptionBudgets and defaults to --min-replicas=2: it will not evict a Deployment's last remaining pod.
updatePolicy:
updateMode: "Recreate"
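A simplified sketch of the Updater's eviction decision, assuming the commonly documented rule that a pod is a candidate when its current request falls outside the recommendation's lower/upper bounds (the real Updater applies additional checks, including PodDisruptionBudgets and --min-replicas):

```python
def needs_eviction(current_request: float, lower_bound: float, upper_bound: float) -> bool:
    """Candidate for eviction when the running request is below the
    lower bound (under-provisioned) or above the upper bound (over-provisioned)."""
    return current_request < lower_bound or current_request > upper_bound

print(needs_eviction(50, 100, 400))   # True  -> below lowerBound, resize needed
print(needs_eviction(200, 100, 400))  # False -> within bounds, left running
```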
InPlaceOrRecreate (Kubernetes 1.33+)
The Updater first attempts an in-place resize using the pod's resize subresource. No restart, no rescheduling. If in-place resize fails (node lacks capacity, memory limit decrease on pre-1.35 cluster), it falls back to eviction.
On VPA 1.3.x you need to pass --feature-gates=InPlaceOrRecreate=true to both the Updater and admission controller. In VPA 1.4.0+ the gate is not required.
updatePolicy:
updateMode: "InPlaceOrRecreate"
In-place resize reached GA/stable in Kubernetes 1.35. Memory limit decreases are now supported on 1.35+, though the Kubelet's OOM prevention check for decreases is best-effort only.
Do not use Auto
As of VPA 1.4.0, Auto is deprecated. It currently behaves identically to Recreate. Use Recreate or InPlaceOrRecreate explicitly.
Verify enforcement is working
After switching to a non-Off mode, check that the admission webhook is registered:
kubectl get mutatingwebhookconfiguration | grep vpa
If the webhook is missing, VPA will evict and recreate pods without applying updated resources. The pods restart for nothing.
VPA + HPA: avoiding the feedback loop
Running VPA and HPA on the same resource metric creates a destructive feedback loop. VPA lowers a pod's CPU request based on per-container histograms. HPA's percentage-based utilization math shifts (same absolute CPU / lower request = higher percentage). HPA scales out more replicas. VPA sees lower per-pod usage and recommends even lower requests. The official VPA documentation explicitly warns against this.
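The arithmetic behind the loop follows from HPA's published formula, desiredReplicas = ceil(currentReplicas x currentUtilization / targetUtilization). With hypothetical numbers, the same absolute usage against a lowered request triggers a scale-out:

```python
import math

def hpa_desired_replicas(current_replicas: int, usage_m: float,
                         request_m: float, target_pct: float) -> int:
    """HPA scaling formula: utilization is usage relative to the *request*,
    so lowering the request inflates the utilization percentage."""
    utilization_pct = 100 * usage_m / request_m
    return math.ceil(current_replicas * utilization_pct / target_pct)

# Same absolute CPU usage (300m per pod), target utilization 75%
print(hpa_desired_replicas(4, 300, 500, 75))  # 4 -> stable before VPA acts
print(hpa_desired_replicas(4, 300, 300, 75))  # 6 -> VPA lowered request: scale-out
```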
The safe coexistence pattern separates the dimensions:
# VPA: manage memory only
resourcePolicy:
containerPolicies:
- containerName: web-app
controlledResources: ["memory"]
controlledValues: RequestsOnly
minAllowed:
memory: 256Mi
maxAllowed:
memory: 8Gi
HPA keeps managing replica count on CPU (or custom metrics like request rate). VPA handles memory sizing. No overlap, no oscillation.
Goldilocks: a dashboard for recommendation-only workflows
Fairwinds Goldilocks installs only the VPA Recommender (no Updater, no admission controller) and adds a web dashboard. It creates VPA objects in Off mode for every Deployment in labeled namespaces and displays the recommendations as copy-pasteable resources: YAML snippets.
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks \
-n goldilocks --create-namespace
kubectl label namespace production goldilocks.fairwinds.com/enabled=true
kubectl -n goldilocks port-forward svc/goldilocks-dashboard 8080:80
Goldilocks is a good fit when you want cluster-wide right-sizing visibility without any enforcement risk. For the full workflow around turning rightsizing recommendations into cluster cost savings — including namespace quotas, LimitRange defaults, and spot instance integration — see the Kubernetes cost optimization guide.
Known limitations
- Single-replica Deployments: the Updater's --min-replicas=2 default means a Deployment with replicas: 1 will never be evicted in Recreate mode. Use InPlaceOrRecreate on 1.33+ clusters, set --min-replicas=1 on the Updater (accepts downtime risk), or increase replicas to 2.
- One VPA per workload: multiple VPA objects targeting the same pod produce undefined behavior. Use one VPA per Deployment.
- Recommendations can exceed node size: VPA recommends based on observed usage, not node capacity. Pair with Cluster Autoscaler or Karpenter, or set maxAllowed to cap recommendations below your largest node's allocatable resources.
- Reactive, not predictive: VPA has a 60-90 second observability lag from the metrics pipeline and no seasonality awareness. It cannot prepare for a predictable daily traffic spike.
- No pod-level resources support: VPA cannot work with workloads that define pod-level resources stanzas (a newer Kubernetes feature).
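To catch the exceeds-node-size case before enforcement, a quick sketch that parses memory quantities and checks a recommendation against a node's allocatable memory (parsing is simplified to the common binary suffixes; a real implementation would handle the full Kubernetes quantity grammar):

```python
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}

def parse_mem(q: str) -> int:
    """Parse memory quantities like '512Mi' or '4Gi' into bytes."""
    for suffix, factor in UNITS.items():
        if q.endswith(suffix):
            return int(q[:-len(suffix)]) * factor
    return int(q)  # plain bytes

def fits_on_node(recommended: str, node_allocatable: str) -> bool:
    return parse_mem(recommended) <= parse_mem(node_allocatable)

print(fits_on_node("6Gi", "7Gi"))   # True
print(fits_on_node("10Gi", "7Gi"))  # False -> cap with maxAllowed
```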
Verify the final result
After completing the setup, confirm everything works end to end:
# 1. VPA object exists and has a recommendation
kubectl get vpa web-app-vpa -n production -o wide
# 2. Recommendations are populated (not empty)
kubectl describe vpa web-app-vpa -n production | grep -A 10 "Container Recommendations"
# 3. If using an enforcement mode: check that pods have VPA-applied resources
kubectl get pods -n production -l app=web-app -o jsonpath='{.items[0].spec.containers[0].resources}'
If Container Recommendations is empty, the recommender has not collected enough data yet. Wait 24-48 hours before evaluating.
When to escalate
If VPA is not producing recommendations, or pods are being evicted but resources do not change, collect the following before asking for help:
- kubectl describe vpa <name> -n <namespace> output
- kubectl logs -n kube-system deploy/vpa-recommender --tail=50
- kubectl logs -n kube-system deploy/vpa-admission-controller --tail=50
- kubectl get mutatingwebhookconfiguration | grep vpa output
- Kubernetes version (kubectl version; the --short flag was removed in recent kubectl releases)
- VPA version (kubectl get deploy vpa-recommender -n kube-system -o jsonpath='{.spec.template.spec.containers[0].image}')