Kubernetes Vertical Pod Autoscaler (VPA): right-sizing resource requests

The Vertical Pod Autoscaler watches actual CPU and memory consumption per container and adjusts resource requests to match. In Off mode it gives you right-sizing recommendations without touching running pods. In enforcement modes it applies those recommendations automatically, either by restarting pods or (on Kubernetes 1.33+) by resizing them in place. This guide walks through installing VPA, reading its recommendations, bounding them with resource policies, safely progressing to auto-apply, and avoiding the conflict with HPA.

What you will have at the end

A VPA object that produces right-sizing recommendations for a Deployment's containers. You will know how to read those recommendations, constrain them with minAllowed/maxAllowed bounds, and progress safely from recommendation-only mode to automatic enforcement. You will also understand the VPA + HPA coexistence rules so the two autoscalers do not fight each other.

Prerequisites

  • kubectl connected to a Kubernetes 1.28+ cluster
  • metrics-server installed and returning data (kubectl top pods prints usage numbers rather than error: Metrics API not available)
  • A Deployment with resource requests defined. VPA reads current requests to calculate the gap between what is allocated and what is actually consumed
  • For in-place resize (step 5): Kubernetes 1.33+ with the InPlacePodVerticalScaling feature gate enabled (it is enabled by default since 1.33)
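If your Deployment does not yet define requests, add a baseline before creating the VPA. A minimal sketch of a qualifying Deployment (names, image, and values are illustrative placeholders):

```yaml
# deployment.yaml — baseline requests for VPA to measure against (placeholder values)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: registry.example.com/web-app:1.0   # placeholder image
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: "1"
            memory: 1Gi
```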

Step 1: install VPA

VPA is not part of core Kubernetes. It ships as a CRD plus three controllers maintained in the kubernetes/autoscaler monorepo. You need all three components:

  • Recommender: queries metrics-server, builds per-container usage histograms, writes recommendations to .status.recommendation
  • Updater: compares running pods against recommendations, evicts or resizes pods when the gap is large enough
  • Admission controller: mutating webhook that injects recommended resources into newly created pods

Option A: official install script

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh   # deploys CRD + all three controllers to kube-system

Option B: Helm (Fairwinds chart)

helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install vpa fairwinds-stable/vpa \
  --namespace vpa --create-namespace

Verify the installation

kubectl get pods -n kube-system | grep vpa    # official script (components are labeled per component, so filter by name)
kubectl get pods -n vpa                       # Helm

Expected output: three pods running (recommender, updater, admission-controller). If any pod is in CrashLoopBackOff, check that metrics-server is reachable.
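If the recommender is crash-looping, confirm the Metrics API is actually being served before digging further. These checks assume the standard metrics-server deployment:

```shell
# Is the Metrics API registered and Available?
kubectl get apiservice v1beta1.metrics.k8s.io

# Does it return live data?
kubectl top pods -n kube-system
```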

Step 2: create a VPA in Off mode

Start in Off mode. It is the only safe starting point: VPA generates recommendations but touches nothing. Note that Off is not the default — a VPA without an explicit updateMode defaults to Auto and will start evicting pods, so always set it.

# vpa-off.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"

kubectl apply -f vpa-off.yaml

The recommender needs real traffic data before its output is meaningful. It uses decaying weighted histograms over an 8-day window. At 24 hours the recommendations carry roughly a 2x safety multiplier; by day 7 that drops to about 1.14x. For workloads with weekly traffic cycles, wait a full week before acting on recommendations.

Step 3: read and interpret recommendations

After at least 24-48 hours of data collection:

kubectl describe vpa web-app-vpa -n production

The status.recommendation.containerRecommendations section shows three tiers per container:

containerRecommendations:
- containerName: web-app
  lowerBound:       # 50th percentile, minimum viable
    cpu: 50m
    memory: 128Mi
  target:           # 90th percentile, what VPA applies in enforcement modes
    cpu: 200m
    memory: 300Mi
  upperBound:       # 95th percentile, safety ceiling
    cpu: 400m
    memory: 500Mi

target is the value VPA applies when you switch to an enforcement mode. Compare it against your current requests: if your Deployment spec says cpu: 1000m and VPA recommends cpu: 200m, you are over-provisioned by 5x. If VPA recommends more than you allocate, your pods are likely getting CPU throttled or OOMKilled.

For structured output you can pipe through jq:

kubectl get vpa web-app-vpa -n production \
  -o jsonpath='{.status.recommendation.containerRecommendations}' | jq .
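To see the gap at a glance, pull the live requests alongside the recommendation (resource names follow the earlier examples):

```shell
# Current requests as set on the Deployment
kubectl get deploy web-app -n production \
  -o jsonpath='{.spec.template.spec.containers[0].resources.requests}'

# VPA's target for the same container
kubectl get vpa web-app-vpa -n production \
  -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'
```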

Step 4: add resource policy bounds

Unbounded recommendations can exceed node capacity and leave pods stuck in Pending. The resourcePolicy section constrains what VPA can recommend or apply.

# vpa-bounded.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: web-app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: "2"          # never recommend more than 2 cores
        memory: 4Gi       # never recommend more than 4 Gi
      controlledValues: RequestsAndLimits  # adjusts both, preserving the ratio
    - containerName: istio-proxy
      mode: "Off"         # exclude sidecar from VPA management

Two controlledValues options:

  • RequestsAndLimits (default): VPA adjusts both requests and limits, preserving the original ratio between them. If your Deployment has a 1:2 request-to-limit ratio, VPA maintains it.
  • RequestsOnly: VPA adjusts requests; limits stay at whatever the Deployment spec says. Safer when you treat limits as a hard safety net.
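To make the ratio behavior concrete, here is a worked example (numbers are illustrative):

```yaml
# Deployment spec before VPA:
#   requests: cpu 500m    limits: cpu 1000m   (1:2 ratio)
# VPA target: cpu 200m
#
# Result under RequestsAndLimits:
#   requests: cpu 200m    limits: cpu 400m    (1:2 ratio preserved)
# Result under RequestsOnly:
#   requests: cpu 200m    limits: cpu 1000m   (limits untouched)
resourcePolicy:
  containerPolicies:
  - containerName: web-app
    controlledValues: RequestsOnly
```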

Step 5: switch to an enforcement mode

Once recommendations have stabilized (7+ days of data, target values look reasonable, bounds are in place), choose an enforcement mode.

Initial

VPA applies recommendations only at pod creation. Running pods are never evicted. Good for StatefulSets or workloads where mid-flight restarts are unacceptable.

updatePolicy:
  updateMode: "Initial"

Recreate

VPA's Updater evicts pods whose requests deviate significantly from the recommendation. The workload controller recreates them, and the admission webhook injects updated resources. Causes a brief restart per pod. The Updater respects PodDisruptionBudgets and defaults to --min-replicas=2: it will not evict a Deployment's last remaining pod.

updatePolicy:
  updateMode: "Recreate"
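After switching to Recreate, you can watch the Updater work. Recent VPA versions record an event with reason EvictedByVPA on each pod they evict (older releases may word this differently):

```shell
# Pods evicted by the VPA Updater in this namespace
kubectl get events -n production --field-selector reason=EvictedByVPA

# Confirm recreated pods received the recommended requests
kubectl get pods -n production -l app=web-app \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[0].resources.requests}{"\n"}{end}'
```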

InPlaceOrRecreate (Kubernetes 1.33+)

The Updater first attempts an in-place resize using the pod's resize subresource. No restart, no rescheduling. If in-place resize fails (node lacks capacity, memory limit decrease on pre-1.35 cluster), it falls back to eviction.

On VPA 1.3.x you need to pass --feature-gates=InPlaceOrRecreate=true to both the Updater and admission controller. In VPA 1.4.0+ the gate is not required.

updatePolicy:
  updateMode: "InPlaceOrRecreate"

In-place resize reached GA/stable in Kubernetes 1.35. Memory limit decreases are now supported on 1.35+, though the Kubelet's OOM prevention check for decreases is best-effort only.
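You can exercise the same resize subresource manually to confirm the cluster supports in-place resize before trusting VPA with it. A sketch using the pod and container names from the earlier examples (requires a kubectl recent enough to support --subresource resize):

```shell
# Attempt an in-place CPU resize on one pod; no restart should occur
POD=$(kubectl get pods -n production -l app=web-app -o jsonpath='{.items[0].metadata.name}')
kubectl patch pod "$POD" -n production --subresource resize --patch \
  '{"spec":{"containers":[{"name":"web-app","resources":{"requests":{"cpu":"300m"}}}]}}'

# Inspect the pod's conditions for resize progress/pending states
kubectl get pod "$POD" -n production -o jsonpath='{.status.conditions}'
```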

Do not use Auto

As of VPA 1.4.0, Auto is deprecated. It currently behaves identically to Recreate. Use Recreate or InPlaceOrRecreate explicitly.

Verify enforcement is working

After switching to a non-Off mode, check that the admission webhook is registered:

kubectl get mutatingwebhookconfiguration | grep vpa

If the webhook is missing, VPA will evict and recreate pods without applying updated resources. The pods restart for nothing.

VPA + HPA: avoiding the feedback loop

Running VPA and HPA on the same resource metric creates a destructive feedback loop. VPA lowers a pod's CPU request based on per-container histograms. HPA's percentage-based utilization math shifts (same absolute CPU / lower request = higher percentage). HPA scales out more replicas. VPA sees lower per-pod usage and recommends even lower requests. The official VPA documentation explicitly warns against this.

The safe coexistence pattern separates the dimensions:

# VPA: manage memory only
resourcePolicy:
  containerPolicies:
  - containerName: web-app
    controlledResources: ["memory"]
    controlledValues: RequestsOnly
    minAllowed:
      memory: 256Mi
    maxAllowed:
      memory: 8Gi

HPA keeps managing replica count on CPU (or custom metrics like request rate). VPA handles memory sizing. No overlap, no oscillation.
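For completeness, the HPA side of that pattern as an autoscaling/v2 manifest scaling on CPU utilization (the 70% threshold and replica bounds are illustrative):

```yaml
# hpa-cpu.yaml — HPA owns replica count on CPU; VPA owns memory requests
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```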

Goldilocks: a dashboard for recommendation-only workflows

Fairwinds Goldilocks installs only the VPA Recommender (no Updater, no admission controller) and adds a web dashboard. It creates VPA objects in Off mode for every Deployment in labeled namespaces and displays the recommendations as copy-pasteable resources: stanzas for your manifests.

helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks \
  -n goldilocks --create-namespace
kubectl label namespace production goldilocks.fairwinds.com/enabled=true
kubectl -n goldilocks port-forward svc/goldilocks-dashboard 8080:80

Goldilocks is a good fit when you want cluster-wide right-sizing visibility without any enforcement risk. For the full workflow around turning rightsizing recommendations into cluster cost savings — including namespace quotas, LimitRange defaults, and spot instance integration — see the Kubernetes cost optimization guide.

Known limitations

  • Single-replica Deployments: the Updater's --min-replicas=2 default means a Deployment with replicas: 1 will never be evicted in Recreate mode. Use InPlaceOrRecreate on 1.33+ clusters, set --min-replicas=1 on the Updater (accepts downtime risk), or increase replicas to 2.
  • One VPA per workload: multiple VPA objects targeting the same pod produce undefined behavior. Use one VPA per Deployment.
  • Recommendations can exceed node size: VPA recommends based on observed usage, not node capacity. Pair with Cluster Autoscaler or Karpenter, or set maxAllowed to cap recommendations below your largest node's allocatable resources.
  • Reactive, not predictive: VPA has a 60-90 second observability lag from the metrics pipeline and no seasonality awareness. It cannot prepare for a predictable daily traffic spike.
  • No pod-level resources support: VPA cannot work with workloads that define pod-level resources stanzas (a newer Kubernetes feature).
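To size maxAllowed against real node capacity, list allocatable resources per node and cap recommendations below the largest value:

```shell
# Allocatable CPU and memory per node; set maxAllowed below your largest node
kubectl get nodes -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
```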

Verify the final result

After completing the setup, confirm everything works end to end:

# 1. VPA object exists and has a recommendation
kubectl get vpa web-app-vpa -n production -o wide

# 2. Recommendations are populated (not empty)
kubectl describe vpa web-app-vpa -n production | grep -A 10 "Container Recommendations"

# 3. If using an enforcement mode: check that pods have VPA-applied resources
kubectl get pods -n production -l app=web-app -o jsonpath='{.items[0].spec.containers[0].resources}'

If Container Recommendations is empty, the recommender has not collected enough data yet. Wait 24-48 hours before evaluating.

When to escalate

If VPA is not producing recommendations, or pods are being evicted but resources do not change, collect the following before asking for help:

  • kubectl describe vpa <name> -n <namespace> output
  • kubectl logs -n kube-system deploy/vpa-recommender --tail=50
  • kubectl logs -n kube-system deploy/vpa-admission-controller --tail=50
  • kubectl get mutatingwebhookconfiguration | grep vpa output
  • Kubernetes version (kubectl version; the --short flag was removed in kubectl 1.28)
  • VPA version (kubectl get deploy vpa-recommender -n kube-system -o jsonpath='{.spec.template.spec.containers[0].image}')

