Kubernetes Vertical Pod Autoscaler (VPA): right-sizing resource requests

The Vertical Pod Autoscaler watches actual CPU and memory consumption per container and adjusts resource requests to match. In Off mode it gives you right-sizing recommendations without touching running pods. In enforcement modes it applies those recommendations automatically, either by restarting pods or (on Kubernetes 1.33+) by resizing them in place. This guide walks through installing VPA, reading its recommendations, bounding them with resource policies, safely progressing to auto-apply, and avoiding the conflict with HPA.

What you will have at the end

A VPA object that produces right-sizing recommendations for a Deployment's containers. You will know how to read those recommendations, constrain them with minAllowed/maxAllowed bounds, and progress safely from recommendation-only mode to automatic enforcement. You will also understand the VPA + HPA coexistence rules so the two autoscalers do not fight each other.

Prerequisites

  • kubectl connected to a Kubernetes 1.28+ cluster
  • metrics-server installed and returning data (kubectl top pods prints usage numbers rather than error: Metrics API not available)
  • A Deployment with resource requests defined. VPA reads current requests to calculate the gap between what is allocated and what is actually consumed
  • For in-place resize (step 5): Kubernetes 1.33+ with the InPlacePodVerticalScaling feature gate enabled (it is enabled by default since 1.33)
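If your Deployment does not yet define requests, add a baseline before creating the VPA. A minimal sketch of a qualifying Deployment (names, image, and values are illustrative placeholders):

```yaml
# deployment.yaml — baseline requests for VPA to measure against (placeholder values)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: registry.example.com/web-app:1.0   # placeholder image
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: "1"
            memory: 1Gi
```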

Step 1: install VPA

VPA is not part of core Kubernetes. It ships as a CRD plus three controllers maintained in the kubernetes/autoscaler monorepo. You need all three components:

  • Recommender: queries metrics-server, builds per-container usage histograms, writes recommendations to .status.recommendation
  • Updater: compares running pods against recommendations, evicts or resizes pods when the gap is large enough
  • Admission controller: mutating webhook that injects recommended resources into newly created pods

Option A: official install script

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh   # deploys CRD + all three controllers to kube-system

Option B: Helm (Fairwinds chart)

helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install vpa fairwinds-stable/vpa \
  --namespace vpa --create-namespace

Verify the installation

kubectl get pods -n kube-system | grep vpa    # official script (components are labeled per component, so filter by name)
kubectl get pods -n vpa                       # Helm

Expected output: three pods running (recommender, updater, admission-controller). If any pod is in CrashLoopBackOff, check that metrics-server is reachable.
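If the recommender is crash-looping, confirm the Metrics API is actually being served before digging further. These checks assume the standard metrics-server deployment:

```shell
# Is the Metrics API registered and Available?
kubectl get apiservice v1beta1.metrics.k8s.io

# Does it return live data?
kubectl top pods -n kube-system
```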

Step 2: create a VPA in Off mode

Start in Off mode. It is the only safe starting point: VPA generates recommendations but touches nothing. Note that Off is not the default — a VPA without an explicit updateMode defaults to Auto and will start evicting pods, so always set it.

# vpa-off.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"

kubectl apply -f vpa-off.yaml

The recommender needs real traffic data before its output is meaningful. It uses decaying weighted histograms over an 8-day window. At 24 hours the recommendations carry roughly a 2x safety multiplier; by day 7 that drops to about 1.14x. For workloads with weekly traffic cycles, wait a full week before acting on recommendations.

Step 3: read and interpret recommendations

After at least 24-48 hours of data collection:

kubectl describe vpa web-app-vpa -n production

The status.recommendation.containerRecommendations section shows three tiers per container:

containerRecommendations:
- containerName: web-app
  lowerBound:       # 50th percentile, minimum viable
    cpu: 50m
    memory: 128Mi
  target:           # 90th percentile, what VPA applies in enforcement modes
    cpu: 200m
    memory: 300Mi
  upperBound:       # 95th percentile, safety ceiling
    cpu: 400m
    memory: 500Mi

target is the value VPA applies when you switch to an enforcement mode. Compare it against your current requests: if your Deployment spec says cpu: 1000m and VPA recommends cpu: 200m, you are over-provisioned by 5x. If VPA recommends more than you allocate, your pods are likely getting CPU throttled or OOMKilled.

For structured output you can pipe through jq:

kubectl get vpa web-app-vpa -n production \
  -o jsonpath='{.status.recommendation.containerRecommendations}' | jq .
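To see the gap at a glance, pull the live requests alongside the recommendation (resource names follow the earlier examples):

```shell
# Current requests as set on the Deployment
kubectl get deploy web-app -n production \
  -o jsonpath='{.spec.template.spec.containers[0].resources.requests}'

# VPA's target for the same container
kubectl get vpa web-app-vpa -n production \
  -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'
```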

Step 4: add resource policy bounds

Unbounded recommendations can exceed node capacity and leave pods stuck in Pending. The resourcePolicy section constrains what VPA can recommend or apply.

# vpa-bounded.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: web-app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: "2"          # never recommend more than 2 cores
        memory: 4Gi       # never recommend more than 4 Gi
      controlledValues: RequestsAndLimits  # adjusts both, preserving the ratio
    - containerName: istio-proxy
      mode: "Off"         # exclude sidecar from VPA management

Two controlledValues options:

  • RequestsAndLimits (default): VPA adjusts both requests and limits, preserving the original ratio between them. If your Deployment has a 1:2 request-to-limit ratio, VPA maintains it.
  • RequestsOnly: VPA adjusts requests; limits stay at whatever the Deployment spec says. Safer when you treat limits as a hard safety net.
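To make the ratio behavior concrete, here is a worked example (numbers are illustrative):

```yaml
# Deployment spec before VPA:
#   requests: cpu 500m    limits: cpu 1000m   (1:2 ratio)
# VPA target: cpu 200m
#
# Result under RequestsAndLimits:
#   requests: cpu 200m    limits: cpu 400m    (1:2 ratio preserved)
# Result under RequestsOnly:
#   requests: cpu 200m    limits: cpu 1000m   (limits untouched)
resourcePolicy:
  containerPolicies:
  - containerName: web-app
    controlledValues: RequestsOnly
```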

Step 5: switch to an enforcement mode

Once recommendations have stabilized (7+ days of data, target values look reasonable, bounds are in place), choose an enforcement mode.

Initial

VPA applies recommendations only at pod creation. Running pods are never evicted. Good for StatefulSets or workloads where mid-flight restarts are unacceptable.

updatePolicy:
  updateMode: "Initial"

Recreate

VPA's Updater evicts pods whose requests deviate significantly from the recommendation. The workload controller recreates them, and the admission webhook injects updated resources. Causes a brief restart per pod. The Updater respects PodDisruptionBudgets and defaults to --min-replicas=2: it will not evict a Deployment's last remaining pod.

updatePolicy:
  updateMode: "Recreate"
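After switching to Recreate, you can watch the Updater work. Recent VPA versions record an event with reason EvictedByVPA on each pod they evict (older releases may word this differently):

```shell
# Pods evicted by the VPA Updater in this namespace
kubectl get events -n production --field-selector reason=EvictedByVPA

# Confirm recreated pods received the recommended requests
kubectl get pods -n production -l app=web-app \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[0].resources.requests}{"\n"}{end}'
```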

InPlaceOrRecreate (Kubernetes 1.33+)

The Updater first attempts an in-place resize using the pod's resize subresource. No restart, no rescheduling. If in-place resize fails (node lacks capacity, memory limit decrease on pre-1.35 cluster), it falls back to eviction.

On VPA 1.3.x you need to pass --feature-gates=InPlaceOrRecreate=true to both the Updater and admission controller. In VPA 1.4.0+ the gate is not required.

updatePolicy:
  updateMode: "InPlaceOrRecreate"

In-place resize reached GA/stable in Kubernetes 1.35. Memory limit decreases are now supported on 1.35+, though the Kubelet's OOM prevention check for decreases is best-effort only.
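You can exercise the same resize subresource manually to confirm the cluster supports in-place resize before trusting VPA with it. A sketch using the pod and container names from the earlier examples (requires a kubectl recent enough to support --subresource resize):

```shell
# Attempt an in-place CPU resize on one pod; no restart should occur
POD=$(kubectl get pods -n production -l app=web-app -o jsonpath='{.items[0].metadata.name}')
kubectl patch pod "$POD" -n production --subresource resize --patch \
  '{"spec":{"containers":[{"name":"web-app","resources":{"requests":{"cpu":"300m"}}}]}}'

# Inspect the pod's conditions for resize progress/pending states
kubectl get pod "$POD" -n production -o jsonpath='{.status.conditions}'
```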

Do not use Auto

As of VPA 1.4.0, Auto is deprecated. It currently behaves identically to Recreate. Use Recreate or InPlaceOrRecreate explicitly.

Verify enforcement is working

After switching to a non-Off mode, check that the admission webhook is registered:

kubectl get mutatingwebhookconfiguration | grep vpa

If the webhook is missing, VPA will evict and recreate pods without applying updated resources. The pods restart for nothing.

VPA + HPA: avoiding the feedback loop

Running VPA and HPA on the same resource metric creates a destructive feedback loop. VPA lowers a pod's CPU request based on per-container histograms. HPA's percentage-based utilization math shifts (same absolute CPU / lower request = higher percentage). HPA scales out more replicas. VPA sees lower per-pod usage and recommends even lower requests. The official VPA documentation explicitly warns against this.

The safe coexistence pattern separates the dimensions:

# VPA: manage memory only
resourcePolicy:
  containerPolicies:
  - containerName: web-app
    controlledResources: ["memory"]
    controlledValues: RequestsOnly
    minAllowed:
      memory: 256Mi
    maxAllowed:
      memory: 8Gi

HPA keeps managing replica count on CPU (or custom metrics like request rate). VPA handles memory sizing. No overlap, no oscillation.
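For completeness, the HPA side of that pattern as an autoscaling/v2 manifest scaling on CPU utilization (the 70% threshold and replica bounds are illustrative):

```yaml
# hpa-cpu.yaml — HPA owns replica count on CPU; VPA owns memory requests
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```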

Goldilocks: a dashboard for recommendation-only workflows

Fairwinds Goldilocks installs only the VPA Recommender (no Updater, no admission controller) and adds a web dashboard. It creates VPA objects in Off mode for every Deployment in labeled namespaces and displays the recommendations as copy-pasteable resources: stanzas for your manifests.

helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks \
  -n goldilocks --create-namespace
kubectl label namespace production goldilocks.fairwinds.com/enabled=true
kubectl -n goldilocks port-forward svc/goldilocks-dashboard 8080:80

Goldilocks is a good fit when you want cluster-wide right-sizing visibility without any enforcement risk. For the full workflow around turning rightsizing recommendations into cluster cost savings — including namespace quotas, LimitRange defaults, and spot instance integration — see the Kubernetes cost optimization guide.

Known limitations

  • Single-replica Deployments: the Updater's --min-replicas=2 default means a Deployment with replicas: 1 will never be evicted in Recreate mode. Use InPlaceOrRecreate on 1.33+ clusters, set --min-replicas=1 on the Updater (accepts downtime risk), or increase replicas to 2.
  • One VPA per workload: multiple VPA objects targeting the same pod produce undefined behavior. Use one VPA per Deployment.
  • Recommendations can exceed node size: VPA recommends based on observed usage, not node capacity. Pair with Cluster Autoscaler or Karpenter, or set maxAllowed to cap recommendations below your largest node's allocatable resources.
  • Reactive, not predictive: VPA has a 60-90 second observability lag from the metrics pipeline and no seasonality awareness. It cannot prepare for a predictable daily traffic spike.
  • No pod-level resources support: VPA cannot work with workloads that define pod-level resources stanzas (a newer Kubernetes feature).
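To size maxAllowed against real node capacity, list allocatable resources per node and cap recommendations below the largest value:

```shell
# Allocatable CPU and memory per node; set maxAllowed below your largest node
kubectl get nodes -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
```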

Verify the final result

After completing the setup, confirm everything works end to end:

# 1. VPA object exists and has a recommendation
kubectl get vpa web-app-vpa -n production -o wide

# 2. Recommendations are populated (not empty)
kubectl describe vpa web-app-vpa -n production | grep -A 10 "Container Recommendations"

# 3. If using an enforcement mode: check that pods have VPA-applied resources
kubectl get pods -n production -l app=web-app -o jsonpath='{.items[0].spec.containers[0].resources}'

If Container Recommendations is empty, the recommender has not collected enough data yet. Wait 24-48 hours before evaluating.

When to escalate

If VPA is not producing recommendations, or pods are being evicted but resources do not change, collect the following before asking for help:

  • kubectl describe vpa <name> -n <namespace> output
  • kubectl logs -n kube-system deploy/vpa-recommender --tail=50
  • kubectl logs -n kube-system deploy/vpa-admission-controller --tail=50
  • kubectl get mutatingwebhookconfiguration | grep vpa output
  • Kubernetes version (kubectl version; the --short flag was removed in kubectl 1.28)
  • VPA version (kubectl get deploy vpa-recommender -n kube-system -o jsonpath='{.spec.template.spec.containers[0].image}')

