Kubernetes cost optimization: rightsizing workloads and reducing cluster spend

Most Kubernetes clusters waste 60–80% of requested resources because teams set requests high and never revisit them. This guide walks through getting cost visibility with Kubecost and kubectl top, generating rightsizing recommendations with Goldilocks and VPA, enforcing sane defaults with LimitRange and ResourceQuota, and combining rightsizing with spot instances and autoscaling to cut cluster spend without sacrificing reliability.

Kubernetes makes it easy to allocate resources and hard to know whether those allocations match reality. A Google Cloud study found that only about 13% of requested CPU in a typical cluster is actually used. The rest is allocated but idle — reserved by the scheduler, consuming node capacity and cloud spend, but doing no useful work.

The root cause is straightforward. Developers set resource requests high because the cost of under-provisioning (OOMKills, throttling, failed deploys) is visible and immediate, while the cost of over-provisioning (wasted cloud spend) is invisible until the monthly bill arrives. This guide works through the practical steps to close that gap.

Step 1: get cost visibility with Kubecost or kubectl top

You cannot rightsize what you cannot see. Before touching any resource specs, establish a baseline of what the cluster actually uses versus what it allocates.

kubectl top: the zero-install starting point

If metrics-server is running (it ships by default on GKE and AKS; on EKS it must be installed separately), kubectl top shows real-time CPU and memory consumption:

# Per-pod usage in a namespace
kubectl top pods -n production

# Per-node usage across the cluster
kubectl top nodes

Sort the usage output to find the heaviest consumers:

# Sort containers by CPU usage, highest first (kubectl top reports
# CPU in millicores, e.g. 250m, so a plain numeric sort works)
kubectl top pods -n production --containers --no-headers \
  | sort -k3 -nr
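kubectl top reports usage only. To see what each container requests, so the two can be compared side by side, a custom-columns query works without any extra tooling:

```shell
# Requested CPU and memory per container, for comparison with kubectl top
kubectl get pods -n production \
  -o custom-columns='POD:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'
```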

kubectl top gives you a snapshot. It does not show cost, historical trends, or idle resources at a glance. For that, you need a dedicated tool.

Kubecost: cost allocation and idle resource tracking

Kubecost is the most widely adopted open-source Kubernetes cost monitoring tool. The free tier (OpenCost) provides cost allocation by namespace, deployment, and label. Kubecost 3.1 introduced Resource Quota Rightsizing, which recommends namespace-level quota adjustments based on actual usage.

Install with Helm:

helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  -n kubecost --create-namespace \
  --set kubecostToken="YOUR_TOKEN"

After 24–48 hours of data collection, the dashboard shows:

  • Total cluster cost broken down by namespace, deployment, and pod
  • Idle cost: the difference between what is allocated and what is used, expressed in dollars
  • Efficiency score: actual usage / requested resources as a percentage
  • Rightsizing recommendations: suggested request/limit changes per container

The idle cost number is the single most important metric for a cost optimization conversation. It answers "how much are we paying for resources that do nothing?"
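The arithmetic behind that number is simple enough to sanity-check by hand. A sketch with illustrative figures (20 requested cores, 4 used, a made-up on-demand rate of $0.04 per core-hour):

```shell
# Idle cost = (requested - used) * price * hours in a month (~730)
awk 'BEGIN {
  requested = 20; used = 4; price = 0.04
  idle_monthly = (requested - used) * price * 730
  printf "idle cost: $%.0f/month (efficiency %.0f%%)\n", \
         idle_monthly, 100 * used / requested
}'
# → idle cost: $467/month (efficiency 20%)
```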

Cloud-provider native tools

Each cloud provider offers built-in cost visibility:

  • AWS: Cost Explorer, plus split cost allocation data for EKS
  • GCP: GKE cost allocation in Cloud Billing reports
  • Azure: Microsoft Cost Management cost analysis for AKS

These tools show node-level cost but not pod-level waste. Kubecost fills that gap by mapping cost to individual workloads.

Step 2: generate rightsizing recommendations with Goldilocks

Fairwinds Goldilocks installs the VPA Recommender in recommendation-only mode and adds a web dashboard. It creates a VPA object in Off mode for every Deployment in labeled namespaces and displays the rightsizing recommendations as copy-pasteable resources: YAML blocks.

helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks \
  -n goldilocks --create-namespace

# Enable Goldilocks for a namespace
kubectl label namespace production goldilocks.fairwinds.com/enabled=true

Access the dashboard:

kubectl -n goldilocks port-forward svc/goldilocks-dashboard 8080:80

After 7 days of data collection, the dashboard shows per-Deployment recommendations at three percentile levels (lower bound, target, upper bound). The target value is what VPA would apply in enforcement mode — it represents the 90th percentile of observed usage.

The workflow is: install Goldilocks, wait a week, review the dashboard, manually apply the recommendations that make sense, then repeat monthly.

For workloads where you want automated rightsizing instead of a dashboard, configure VPA in enforcement mode.
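A minimal enforcement-mode VPA might look like this (the Deployment name and maxAllowed bounds are illustrative, and the VPA components must be installed in the cluster):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"      # VPA evicts pods and applies new requests
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      maxAllowed:           # cap runaway recommendations
        cpu: "2"
        memory: 4Gi
```

Auto mode evicts pods to apply changes, so use it only for workloads that tolerate restarts, and pair it with a PodDisruptionBudget.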

Step 3: implement namespace resource quotas

ResourceQuota sets a hard ceiling on total resource requests and limits within a namespace. It prevents any single team or environment from consuming an unbounded share of the cluster.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"

Set quotas based on the Kubecost or Goldilocks data from steps 1 and 2. If a namespace currently requests 20 CPU but uses 4, a quota of 8–10 CPU gives the team 2–2.5x their actual usage as headroom while reclaiming the rest for other workloads or for node scale-down.

Check current usage against quotas:

kubectl describe resourcequota production-quota -n production

Step 4: use LimitRange to enforce defaults

A ResourceQuota caps the namespace total but does not prevent a single pod from requesting all of it. LimitRange sets per-container defaults and bounds.

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:           # applied when no limit is specified
      cpu: "500m"
      memory: 512Mi
    defaultRequest:    # applied when no request is specified
      cpu: "100m"
      memory: 128Mi
    max:               # no container can exceed this
      cpu: "4"
      memory: 8Gi
    min:               # no container can go below this
      cpu: "50m"
      memory: 64Mi

LimitRange is especially valuable for development and staging namespaces where developers deploy without explicit resource specs. Without it, pods inherit no requests and no limits (BestEffort QoS), which makes scheduling unpredictable and cost tracking impossible.
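To verify the defaults are being injected, launch a throwaway pod with no resources block and inspect what admission added (the pod name here is arbitrary):

```shell
kubectl run limits-probe --image=nginx -n production --restart=Never

# Should print the LimitRange defaults (500m/512Mi limits,
# 100m/128Mi requests) even though the pod spec declared none
kubectl get pod limits-probe -n production \
  -o jsonpath='{.spec.containers[0].resources}'

kubectl delete pod limits-probe -n production
```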

Step 5: combine with spot instances and autoscaling

Rightsizing and governance reduce the per-pod cost. The next layer reduces the per-node cost.

Spot instances

Spot and preemptible instances cost 60–80% less than on-demand. After rightsizing, workloads are smaller and more likely to fit on spot instance types. Run stateless, fault-tolerant workloads on spot nodes. Keep stateful workloads (databases, persistent queues) on on-demand nodes.
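One common pattern for steering stateless Deployments onto spot capacity is a nodeSelector in the pod template. The label below is Karpenter's standard capacity-type label; EKS managed node groups use eks.amazonaws.com/capacityType instead, and the taint key is illustrative:

```yaml
# Pod-template fragment: schedule onto spot nodes only
spec:
  nodeSelector:
    karpenter.sh/capacity-type: spot
  # Only needed if your spot nodes carry a taint:
  tolerations:
  - key: "spot"            # illustrative taint key
    operator: Exists
    effect: NoSchedule
```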

Cluster Autoscaler and Karpenter

Cluster Autoscaler removes nodes that are underutilized. After rightsizing reduces per-pod allocations, previously full nodes may become half-empty. The autoscaler consolidates pods onto fewer nodes and terminates the extras.

Karpenter (EKS) goes further: it selects the cheapest instance type that fits the pending workload, mixing spot and on-demand, ARM and x86, across multiple instance families. Rightsized pods give Karpenter more flexibility because smaller resource blocks fit on a wider range of instance types.
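A sketch of a Karpenter NodePool that grants that flexibility, assuming Karpenter v1 on EKS with an EC2NodeClass named "default":

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]   # prefer spot, fall back to on-demand
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64", "arm64"]      # widen the instance pool further
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
```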

The compounding effect

These layers compound. Rightsizing a workload from 1 CPU / 2 Gi to 200m / 300 Mi means:

  1. The pod uses 80% less scheduled capacity
  2. The scheduler fits more pods per node
  3. The autoscaler removes underutilized nodes
  4. Karpenter picks smaller, cheaper instance types
  5. Spot eligibility increases because the workload tolerates interruption

A cluster that rightsizes aggressively and runs stateless workloads on spot can reduce its compute bill by 70–85% compared to the same workloads on over-provisioned on-demand nodes.
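A rough back-of-the-envelope check of that range, using illustrative numbers: ten over-provisioned on-demand nodes at $140/month each, rightsizing packs the same pods onto four nodes, and three of those four move to spot at 70% off:

```shell
awk 'BEGIN {
  before = 10 * 140                    # $1400/mo, over-provisioned on-demand
  after  = 1 * 140 + 3 * 140 * 0.30   # 1 on-demand node + 3 spot nodes
  printf "before=$%d after=$%d savings=%.0f%%\n", \
         before, after, 100 * (1 - after / before)
}'
# → before=$1400 after=$266 savings=81%
```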

Establishing a cost review cadence

Cost optimization is not a one-time project. Workloads change, teams deploy new services, and recommendations drift.

Weekly: review Kubecost's idle cost metric. If idle spend rises above a threshold (for example, 40% of total), investigate which namespaces grew.

Monthly: review Goldilocks recommendations. Apply any that have been stable for 4+ weeks. Check that namespace quotas still reflect reality.

Quarterly: audit spot instance coverage. Review Karpenter or Cluster Autoscaler logs for node consolidation opportunities. Revisit instance family diversification.

On every new service deploy: require resource requests in the Deployment spec (enforce via admission webhook or CI policy). Use Goldilocks recommendations from staging as the initial values for production.
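One way to enforce the requests requirement at admission time is a Kyverno policy; this sketch assumes Kyverno is installed in the cluster and blocks any pod whose containers omit CPU or memory requests:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests
spec:
  validationFailureAction: Enforce   # reject, don't just audit
  rules:
  - name: check-resource-requests
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "CPU and memory requests are required."
      pattern:
        spec:
          containers:
          - resources:
              requests:
                cpu: "?*"       # any non-empty value
                memory: "?*"
```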

What cost optimization is NOT

  • Not just setting high resource requests to be safe. Over-requesting is the primary driver of waste. The scheduler reserves what you ask for, even if the pod never uses it. Setting requests at 2x actual usage means half your cluster is paid for but idle.
  • Not something autoscaling alone solves. HPA and Cluster Autoscaler react to demand, but they cannot fix over-provisioned requests. If a pod requests 1 CPU and uses 100m, HPA never scales it down — it is already at 1 replica with 10% utilization. Rightsizing the request is the prerequisite.
  • Not cloud-provider-specific. Kubecost, Goldilocks, VPA, ResourceQuota, and LimitRange work on any Kubernetes cluster — EKS, GKE, AKS, on-premises, or bare metal. The specific spot instance mechanics vary by provider, but the rightsizing workflow is universal.

Recurring server or deployment issues?

I help teams make production reliable with CI/CD, Kubernetes, and cloud—so fixes stick and deploys stop being stressful.

Explore DevOps consultancy
