OOMKilled: Kubernetes out of memory errors explained

OOMKilled means the Linux kernel terminated your container because it exceeded its memory limit. The container exits with code 137 (SIGKILL), the kubelet restarts it, and without intervention it will keep dying in a loop. This article covers how OOMKilled works at the kernel level, how to distinguish it from node-level OOM and eviction, how to diagnose the root cause, and how to right-size memory limits for JVM, Go, Node.js and Python workloads.

What OOMKilled means

Exit code 137 is the signature. The math: 128 + 9. Signal 9 is SIGKILL, the unblockable, uncatchable termination signal. When a container's memory consumption crosses its cgroup limit, the Linux kernel's OOM killer fires SIGKILL at the process. No grace period, no chance to clean up.

You see it in kubectl describe pod under the terminated container state:

Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
  Started:      Mon, 07 Apr 2026 14:20:01 +0000
  Finished:     Mon, 07 Apr 2026 14:20:01 +0000

The kernel's sequence is straightforward. When memory usage hits memory.max in the container's cgroup, the kernel first tries to reclaim reclaimable memory (page cache, inactive pages). If reclaim cannot bring usage back under the limit, the OOM killer picks the process with the highest OOM score in the cgroup, typically the largest memory consumer, and sends it SIGKILL.

On clusters running Kubernetes 1.28+ with cgroup v2, memory.oom.group is set to 1. That means all processes in the container are killed together, preventing half-killed containers from running in an undefined state.

Container limit OOM vs node OOM vs eviction

Three different scenarios can cause a pod to disappear. They look similar from the outside but have different causes and different fixes.

Container limit exceeded (OOMKilled). A single container crosses its resources.limits.memory boundary. The cgroup OOM killer fires within that container's cgroup only. Other containers in the pod keep running. The kubelet restarts the killed container per the pod's restartPolicy. This is the common case.

Node-level OOM kill. The entire node runs out of memory. The global Linux OOM killer fires (not the cgroup OOM killer), selecting processes node-wide using oom_score_adj values. To tell the difference: on the node, grep -i "Memory cgroup out of memory" /var/log/syslog points to a container-level kill. A generic "Out of memory: Killed process" without "memcg" or "memory cgroup" points to a node-level kill.

Node-pressure eviction. The kubelet proactively terminates pods when the node approaches a memory eviction threshold. This is not the kernel acting; it is the kubelet. The pod status shows Failed with reason Evicted, not OOMKilled. The entire pod is removed, not just one container.

The distinction matters because the fix is different. Container-level OOMKill means your limit is too low or your application uses too much memory. Node-level OOM means the node is overcommitted. Eviction means the cluster is running too hot. For a deeper look at how requests and limits interact with these scenarios, see the resource requests and limits article.
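
To check which case you are dealing with from the node itself, the kernel log is the quickest signal. A sketch, assuming shell access to the node; log commands and paths vary by distro:

# Container-level (cgroup) kill vs node-level kill
sudo dmesg -T | grep -i "out of memory"
sudo journalctl -k | grep -iE "memory cgroup out of memory|out of memory: killed process"
# "Memory cgroup out of memory" => a container hit its own limit (OOMKilled)
# A bare "Out of memory: Killed process ..." => node-level OOM

# Eviction shows up in the API instead of the kernel log
kubectl get pod my-app-7d4f9b8c6-xvk2p -n production -o jsonpath='{.status.reason}'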

QoS classes and kill priority

Under node-level memory pressure, the kernel uses oom_score_adj to decide which pod to kill first. Kubernetes assigns QoS classes based on how requests and limits are configured:

  • Guaranteed: all containers set request == limit (non-zero); oom_score_adj -997; killed last (most protected)
  • Burstable: at least one container has a request or limit; oom_score_adj 2 to 999; killed after BestEffort
  • BestEffort: no requests or limits on any container; oom_score_adj 1000; killed first

A container-level OOM kill (hitting its own limit) fires regardless of QoS class. QoS only affects who gets killed first during node-level pressure.
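
For reference, a resource block that lands a pod in the Guaranteed class, assuming every container in the pod uses the same pattern (values illustrative). Drop the limits and it becomes Burstable; drop requests and limits entirely and it becomes BestEffort:

resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"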

Diagnosing OOMKilled

Start with kubectl describe pod. The output tells you whether the previous container termination was OOMKilled and gives you the exit code and timestamps.

kubectl describe pod my-app-7d4f9b8c6-xvk2p -n production

Look for the Last State: Terminated block with Reason: OOMKilled and Exit Code: 137.

Next, check events for the namespace:

kubectl get events --sort-by='.lastTimestamp' -n production

Events with reason OOMKilling or container termination messages confirm the kill and often include timestamps that help correlate with traffic spikes or deployments.
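
If the namespace is noisy, narrow the output to the OOM events themselves. The kubelet records OOMKilling events against the node object, so search across all namespaces:

kubectl get events -A --field-selector reason=OOMKilling --sort-by='.lastTimestamp'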

Check current memory usage if the pod has restarted and is still running (requires metrics-server):

kubectl top pod my-app-7d4f9b8c6-xvk2p -n production
# NAME                          CPU(cores)   MEMORY(bytes)
# my-app-7d4f9b8c6-xvk2p       245m         412Mi

For pre-kill application logs (the previous container's output):

kubectl logs my-app-7d4f9b8c6-xvk2p -n production --previous

For live inspection of the cgroup memory counter inside the container:

kubectl exec -it my-app-7d4f9b8c6-xvk2p -n production -- cat /sys/fs/cgroup/memory.current
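
On cgroup v2 nodes the same directory also exposes the limit and an OOM-kill counter; a quick check (file names differ under cgroup v1):

kubectl exec -it my-app-7d4f9b8c6-xvk2p -n production -- \
  sh -c 'cat /sys/fs/cgroup/memory.max; grep oom_kill /sys/fs/cgroup/memory.events'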

Prometheus metrics for memory monitoring

Two cAdvisor metrics matter for OOM diagnosis:

container_memory_working_set_bytes is active memory that cannot be easily reclaimed (usage minus inactive file-backed pages). This is what Kubernetes compares against the memory limit. It is what kubectl top reports. If this number approaches the limit, an OOM kill is imminent.

container_memory_rss is the Resident Set Size: anonymous memory plus swap cache, excluding page cache. High RSS with low page cache means most memory is anonymous allocations that the kernel cannot reclaim, a high-risk OOM profile.

Memory usage as a percentage of the limit:

sum(container_memory_working_set_bytes{container!=""}) by (pod, namespace)
/
sum(kube_pod_container_resource_limits{resource="memory"}) by (pod, namespace)

Pods consuming more than 80% of their limit (early warning):

sum(container_memory_working_set_bytes{container!=""}) by (pod)
/
sum(kube_pod_container_resource_limits{resource="memory"}) by (pod) > 0.8

Alert rule for OOMKilled events (requires kube-state-metrics):

- alert: PodOOMKilled
  expr: kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} > 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} was OOMKilled"

Right-sizing memory limits

The goal is a limit high enough that legitimate workloads survive, but low enough to provide isolation and catch runaway processes. Collect real data first.

  1. Run the workload under realistic load for 7 to 30 days. Record container_memory_working_set_bytes at p50, p95 and p99 using Prometheus.
  2. Set resources.requests.memory near the p95 of the working set. This gives the scheduler accurate placement data.
  3. Set resources.limits.memory to the p99 of the working set plus 20 to 30% headroom. This absorbs legitimate spikes while capping runaway usage.
  4. For critical production workloads, consider setting request equal to limit (Guaranteed QoS). This prevents eviction under node pressure, at the cost of OOMKill for any spike above the limit.
# Example: p95 = 420 MiB, p99 = 512 MiB
resources:
  requests:
    memory: "448Mi"    # just above p95 for scheduler accuracy
  limits:
    memory: "640Mi"    # p99 + ~25% headroom

Memory leak vs insufficient limit

Before increasing the limit, figure out whether the application actually needs more memory.

  • Insufficient limit: memory usage is stable but consistently near the limit. The fix is to increase the limit.
  • Memory leak: memory usage grows monotonically over time without plateauing. Increasing the limit delays the inevitable. Profile the application and fix the leak.
  • Traffic spike: memory usage spikes transiently during high load. Increase the limit with sufficient headroom, or scale replicas with HPA before memory pressure builds.

Look at Prometheus history. If container_memory_working_set_bytes shows an upward-only trend with no plateau, it is a leak. A sawtooth pattern (growth, GC drop, growth) at increasing amplitudes is also a leak indicator.
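
One way to put a number on the trend is predict_linear over the working set: if the projection a day out crosses the limit while current usage is nowhere near it, you are likely looking at a leak rather than a spike. A sketch; the label matchers are assumptions to adapt to your setup:

predict_linear(
  container_memory_working_set_bytes{namespace="production", pod=~"my-app-.*", container!=""}[6h],
  86400
)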

What happens without limits

A pod whose containers set no requests or limits at all gets BestEffort QoS and is first in line to be killed under node pressure. Any container without a memory limit, whatever its QoS class, can consume all node memory, take down co-located pods, and offers no OOM isolation. Always set memory limits in production.
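
One guardrail is a namespace-level LimitRange so containers that omit limits still get a default (values and namespace name illustrative):

apiVersion: v1
kind: LimitRange
metadata:
  name: default-memory-limits
  namespace: production
spec:
  limits:
  - type: Container
    defaultRequest:
      memory: "256Mi"
    default:
      memory: "512Mi"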

Language-specific memory behaviour

JVM (Java, Kotlin, Scala)

The most common cause of JVM OOMKilled: setting the container limit equal to -Xmx. The JVM uses far more memory than just the heap.

  • Heap: -Xmx; sized explicitly
  • Metaspace: -XX:MaxMetaspaceSize; typically 250 to 500 MB (grows with class loading)
  • Code cache: -XX:ReservedCodeCacheSize; 240 MB by default
  • Thread stacks: -Xss (per thread); ~1 MiB per thread
  • Direct buffers: -XX:MaxDirectMemorySize; varies
  • JVM overhead: no flag; ~100 to 300 MB

A rough formula for the container limit: heap + metaspace + code cache + (threads x stack size) + 300 MB overhead. For a 600 MB heap, 250 MB metaspace, 50 MB code cache, 100 threads at 1 MB each, and 300 MB overhead, that is roughly 1,300 MB for a 600 MB heap.

Instead of hardcoding -Xmx, use -XX:MaxRAMPercentage=75.0 (available since JDK 8u191 and JDK 10) to set the heap as a percentage of the container's visible memory. On those versions and later, the JVM reads cgroup limits automatically. 75% is a common starting point, leaving 25% for non-heap components.

Cap the non-heap regions explicitly to prevent surprises:

-XX:MaxMetaspaceSize=256m
-XX:ReservedCodeCacheSize=128m
-XX:MaxDirectMemorySize=128m
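
One way to wire these flags into the pod spec is the JAVA_TOOL_OPTIONS environment variable, which the JVM picks up at startup (values here are illustrative, not a recommendation for every workload):

env:
- name: JAVA_TOOL_OPTIONS
  value: >-
    -XX:MaxRAMPercentage=75.0
    -XX:MaxMetaspaceSize=256m
    -XX:ReservedCodeCacheSize=128m
    -XX:MaxDirectMemorySize=128m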

One important distinction: OOMKilled (exit code 137) is the Linux kernel killing the container because total memory exceeded the cgroup limit. A Java OutOfMemoryError is the JVM throwing an exception because the heap GC cannot reclaim enough space. The container may survive the Java OOM exception if the application catches it. They are different failures with different diagnostics.

Go

Before Go 1.19, Go applications frequently got OOMKilled in containers because the garbage collector had no awareness of the container memory ceiling. The GC uses a ratio-based trigger (GOGC=100): it runs when the heap doubles from the last collection. With a 500 MB live heap after the last GC, the next collection fires at 1 GB, which might already be past the container limit.

GOMEMLIMIT (Go 1.19+) is a soft memory limit for the Go runtime. When total Go memory usage approaches this value, the GC becomes more aggressive. It is "soft" because Go does not guarantee staying below it, but it prevents the scenario where GC fires too late.

env:
- name: GOMEMLIMIT
  value: "1843MiB"    # ~90% of a 2Gi container limit
- name: GOGC
  value: "off"        # disable ratio-based GC; let GOMEMLIMIT drive collection

Set GOMEMLIMIT to 90 to 95% of the container memory limit. The remaining 5 to 10% gives the kernel headroom for page cache and cgroup accounting. Benchmarks from Ardan Labs show that GOMEMLIMIT can actually improve throughput by keeping the GC on a tighter, more predictable cycle.
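
If you would rather not hardcode the number, the Downward API can derive it from the limit itself. The caveat: this injects 100% of the limit in bytes rather than the 90 to 95% recommended above, so some teams compute the value in an entrypoint script instead:

env:
- name: GOMEMLIMIT
  valueFrom:
    resourceFieldRef:
      resource: limits.memory    # injected as a plain byte count, which the Go runtime accepts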

Node.js

Node.js has been container-aware since v12. Without flags, V8 uses approximately 50% of the container's visible memory for the JavaScript heap, capped at around 2 GiB. For small containers (512 Mi or less), this default leaves very little headroom.

Override with --max-old-space-size (in MB):

env:
- name: NODE_OPTIONS
  value: "--max-old-space-size=512"    # for a 1 Gi container: ~50% for heap

Set --max-old-space-size to 50 to 70% of the container memory limit. Never set it equal to the limit. Node.js needs memory outside the V8 heap for native C++ add-ons, Buffer allocations (which live outside V8), and OS overhead.

The Buffer pitfall is worth calling out: applications that do heavy streaming or image processing allocate large Buffer objects that are tracked outside V8's heap accounting. You can have --max-old-space-size set correctly and still get OOMKilled because Buffer memory pushes total container usage over the limit.

Python

Python has no container awareness. CPython does not read cgroup limits and has no equivalent of MaxRAMPercentage or GOMEMLIMIT. It will grow its heap until the cgroup limit kills it.

Common causes of Python OOMKilled in containers:

  • Long-running async services (aiohttp, FastAPI with asyncio) accumulating memory from unclosed connections and event loop references.
  • Loading large datasets into memory (pandas DataFrames, numpy arrays) without chunking.
  • C extension libraries (numpy, scipy, Pillow) allocating memory via malloc outside Python's heap, invisible to Python's GC.

For profiling, tracemalloc (stdlib) tracks Python heap allocations. For production environments, memray (by Bloomberg) is more comprehensive because it tracks both Python heap and C-level allocations.
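
A minimal tracemalloc sketch for narrowing down where Python-heap growth comes from (it will not see C-extension allocations, which is where memray earns its keep):

import tracemalloc

tracemalloc.start()

# ... exercise the code path you suspect of leaking ...

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:10]:  # ten biggest allocation sites
    print(stat)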

VPA for automated right-sizing

The Vertical Pod Autoscaler (VPA) automatically adjusts resource requests and limits based on historical usage. It is a separate CRD-based installation that requires metrics-server.

VPA consists of three components. The Recommender continuously analyzes historical resource usage, including OOM events, and calculates target, lower-bound and upper-bound memory recommendations. The Updater compares current pod resources against recommendations and evicts pods when the difference exceeds a threshold (or uses in-place update on Kubernetes 1.27+). The Admission Controller intercepts pod creation and injects the recommended values so new pods start right-sized.

Start with updateMode: Off to see recommendations without any automatic changes:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        memory: "128Mi"
      maxAllowed:
        memory: "4Gi"
      controlledValues: RequestsAndLimits

Check what VPA recommends:

kubectl describe vpa my-app-vpa
# Recommendation:
#   Container Recommendations:
#     Container Name:  my-app
#     Lower Bound:
#       Memory:  312Mi
#     Target:
#       Memory:  420Mi
#     Upper Bound:
#       Memory:  1Gi

The VPA Recommender actively factors OOMKilled events into its calculations, typically bumping the upper bound after an OOM event. This makes VPA both a right-sizing tool and an automatic response mechanism.

  • Off: calculates recommendations only. Use for auditing and first deployment.
  • Initial: applies recommendations at pod creation only. A safe starting point.
  • Recreate: evicts pods when recommendations diverge significantly. Active right-sizing.
  • InPlaceOrRecreate: tries an in-place resize first and evicts as a fallback (requires the in-place pod resize feature, available from Kubernetes 1.27). Minimal disruption.

VPA limitations to be aware of: it cannot share the same resource (memory) with HPA on the same workload. Recreate mode causes pod disruption, so use PodDisruptionBudgets to control the blast radius. Recommendations need at least a few hours to days of historical data before they stabilize.
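
If you do run VPA in Recreate mode, a PodDisruptionBudget keeps its evictions from taking out too many replicas at once (names and threshold illustrative):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app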

When to escalate

If you have worked through the diagnosis steps and the OOMKilled events persist, collect this information before asking for help:

  • Output of kubectl describe pod <pod-name> -n <namespace> (full output, not just the status)
  • Output of kubectl top pod and kubectl top node at the time of the kill
  • Prometheus graph of container_memory_working_set_bytes for the pod over the last 24 hours
  • Application framework and version (JVM version, Go version, Node.js version, Python version)
  • Any memory-related flags in the container spec or entrypoint (-Xmx, GOMEMLIMIT, --max-old-space-size)
  • The pod's resource requests and limits YAML
  • Whether the pod runs a single process or spawns subprocesses

How to prevent recurrence

  • Set memory limits on every production container. BestEffort QoS is a recipe for unpredictable kills.
  • Use language-specific memory flags (MaxRAMPercentage, GOMEMLIMIT, --max-old-space-size) to keep application memory well below the container limit.
  • Deploy VPA in Off mode and review its recommendations as part of regular capacity reviews.
  • Set up a Prometheus alert on container_memory_working_set_bytes crossing 80% of the limit. Catching the trend before it hits 100% is the difference between a config change and an incident.
  • If a CrashLoopBackOff is caused by repeated OOMKills, the fix is here (memory limits), not in the restart policy.

