Kubernetes ResourceQuota and LimitRange: enforce namespace resource limits

ResourceQuota caps total resource consumption across an entire namespace. LimitRange constrains individual pods and containers and injects defaults when none are specified. Apply them in the wrong order and the first symptom is a pod rejected with "must specify limits.memory". This guide walks through applying both objects safely and fixing that error, then closes with a reference covering every spec field.

Why namespaces need resource limits

A Kubernetes namespace is a label, not a fence. Without explicit limits, any pod in any namespace can request as much CPU and memory as the cluster has free, claim every available IP, fill etcd with ConfigMaps, or push a node into MemoryPressure that evicts unrelated workloads. ResourceQuota and LimitRange are the two admission-time controls that turn a namespace from a label into a real resource boundary.

Both are admission controllers, both are enabled by default on every standards-compliant cluster (LimitRanger and ResourceQuota are part of the kube-apiserver's default admission plugins list), and both do nothing until you actually create the corresponding objects in a namespace. The presence of the admission plugin is the engine; a ResourceQuota or LimitRange object is the fuel.
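
On a self-managed cluster you can confirm how the API server is configured by reading its flags. This is a sketch assuming a kubeadm-style cluster where kube-apiserver runs as a static pod (managed clusters such as EKS, GKE, or AKS do not expose these flags); because both plugins are on by default, what you are really looking for is a --disable-admission-plugins entry that removes them:

# Print the kube-apiserver command line and filter for admission flags.
# No output means the defaults (which include LimitRanger and ResourceQuota) apply.
kubectl -n kube-system get pod -l component=kube-apiserver \
  -o jsonpath='{.items[0].spec.containers[0].command}' | tr ',' '\n' | grep -i admission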

This article covers both: how to apply them safely, the error every team hits the first time they roll out a quota, and a complete reference for every spec field.

ResourceQuota vs. LimitRange: what each controls

These two objects look superficially similar (both are namespace-scoped, both constrain resource usage) but they enforce different things at different points in the admission pipeline.

Aspect | ResourceQuota | LimitRange
Scope | Aggregate across all pods in the namespace | Per-pod or per-container
What it does | Caps total CPU, memory, storage, and object counts | Sets default requests/limits and per-container floors and ceilings
When it acts | Validates at admission; rejects requests that would exceed the cap | Mutates at admission (injects defaults), then validates against min/max
Admission plugin | ResourceQuota (validating) | LimitRanger (mutating + validating)
Failure mode | Pod creation rejected with HTTP 403 Forbidden | Pod creation rejected if outside min/max bounds
Typical units | requests.cpu: "10", pods: "100" | default.cpu: 500m, max.memory: 8Gi
Stops a single greedy pod | No, only the aggregate | Yes, via max
Stops the whole tenant | Yes, via hard totals | No, individual pods only

You almost always want both. ResourceQuota stops a tenant collectively from claiming the cluster. LimitRange stops one container in that tenant from claiming the entire quota in a single pod, and provides defaults so developers do not have to specify requests and limits on every workload.

Apply a LimitRange before a ResourceQuota

The order matters. As soon as a ResourceQuota tracks cpu or memory, the admission controller rejects any new pod that does not specify the corresponding requests or limits. LimitRange defaults are injected before that check runs, so applying the LimitRange first prevents existing workloads from breaking the moment the quota lands.

# limitrange.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-payments
spec:
  limits:
  - type: Container
    default:               # injected as limits when the container omits them
      cpu: 500m
      memory: 512Mi
    defaultRequest:        # injected as requests when the container omits them
      cpu: 100m
      memory: 128Mi
    max:                   # ceiling per container
      cpu: "4"
      memory: 8Gi
    min:                   # floor per container
      cpu: 50m
      memory: 64Mi
    maxLimitRequestRatio:  # cap the ratio of limit/request
      cpu: "10"            # limit can be at most 10x the request

kubectl apply -f limitrange.yaml

Two operational notes. First, keep one LimitRange per namespace. Multiple LimitRange objects produce non-deterministic default injection because the admission plugin iterates through them and the last match wins for a given resource. Second, if default and defaultRequest are identical, every pod that omits resource specs gets Guaranteed QoS, which affects eviction priority. For Burstable QoS as the default (the more common choice for general workloads), set defaultRequest lower than default. The mechanics of QoS classes are covered in Kubernetes resource requests and limits.
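
To see the injection and the resulting QoS class for yourself, create a pod with no resources block and read back what the admission plugin filled in; a quick check, assuming the nginx image is pullable in your cluster:

# Create a pod that omits requests/limits, then inspect the injected values and QoS class
kubectl run defaults-check --image=nginx -n team-payments --restart=Never
kubectl get pod defaults-check -n team-payments \
  -o jsonpath='{.spec.containers[0].resources}{"\n"}{.status.qosClass}{"\n"}'
kubectl delete pod defaults-check -n team-payments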

Create a ResourceQuota for compute, storage, and object counts

Once the LimitRange is in place, apply the ResourceQuota. A complete production-style quota covers four families: compute (cpu, memory), storage (PVC count and total size), object counts (pods, services, secrets), and optionally object counts for arbitrary API resources via the generic count/<resource>.<group> syntax.

# resourcequota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-payments-quota
  namespace: team-payments
spec:
  hard:
    # Compute
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    # Object counts (built-in shortcuts)
    pods: "100"
    services: "10"
    services.loadbalancers: "2"
    secrets: "30"
    configmaps: "30"
    persistentvolumeclaims: "5"
    # Storage
    requests.storage: 100Gi
    fast-ssd.storageclass.storage.k8s.io/requests.storage: 50Gi
    fast-ssd.storageclass.storage.k8s.io/persistentvolumeclaims: "3"
    # Generic count/* for any API resource
    count/deployments.apps: "20"
    count/jobs.batch: "10"
    count/ingresses.networking.k8s.io: "5"

kubectl apply -f resourcequota.yaml

The count/<resource>.<group> syntax has been available since Kubernetes 1.9 and works for any API resource, including third-party CRDs (you can quota count/applications.argoproj.io if Argo CD is installed). It is the right choice when you want to cap a resource type the built-in shortcuts do not cover.

Quota for extended resources (typically GPUs) was added in Kubernetes 1.10 and uses the form requests.nvidia.com/gpu: "4". Because extended resources cannot be overcommitted, only the requests. prefix is allowed; there is no separate limits. quota entry for them.
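
A minimal GPU quota sketch, assuming nodes advertise the nvidia.com/gpu extended resource through the NVIDIA device plugin:

# gpu-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: team-payments
spec:
  hard:
    requests.nvidia.com/gpu: "4"   # at most 4 GPUs requested across the namespace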

The "must specify limits.memory" error: cause and fix

This is the error every team hits the first time they apply a ResourceQuota:

Error from server (Forbidden): error when creating "deployment.yaml":
pods is forbidden: failed quota: team-payments-quota:
must specify limits.memory,requests.memory

What it means. The namespace has a ResourceQuota that tracks requests.memory or limits.memory. The pod the deployment is trying to create does not specify those fields, and there is no LimitRange in the namespace to inject defaults. The admission controller refuses to admit a pod whose memory consumption it cannot account against the quota.

Three fixes, in order of preference.

1. Add a LimitRange that injects defaults. This is the right fix for most teams. The LimitRange in the previous section defines default and defaultRequest, which the admission plugin injects into any pod that omits them. Pods authored without resource specs become valid against the quota automatically. Applying the LimitRange does not retroactively fix already-failed pods; you must re-roll the deployment after the LimitRange is in place.
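
For example, with an illustrative deployment name:

# Apply the LimitRange, then re-roll so new pods go back through admission
kubectl apply -f limitrange.yaml
kubectl rollout restart deployment payments-api -n team-payments
kubectl rollout status deployment payments-api -n team-payments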

2. Add explicit requests and limits to the pod spec. Update the deployment manifest:

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

This is the approach when you want every workload's resource profile committed in the manifest, with no implicit defaults from a LimitRange. It produces the most reviewable manifests, but every developer in the namespace has to know to do it.

3. Drop limits.memory from the quota. If you only want to cap aggregate request totals (not limits), remove limits.cpu and limits.memory from the ResourceQuota spec. The admission plugin only requires pods to specify the resource fields the quota actually tracks. This is appropriate for namespaces where limits intentionally vary widely and you do not want a hard ceiling on cumulative limits.
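
A requests-only sketch of the earlier quota (keep whichever object counts you still want to track):

# requests-only-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-payments-quota
  namespace: team-payments
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    pods: "100"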

You will know it worked when kubectl apply -f deployment.yaml returns successfully and kubectl get pods -n team-payments shows the new pods running.

A subtle behavior catches people off guard. Creating a Deployment that violates quota does not fail the Deployment object itself. The Deployment is created successfully. Only the Pod creation inside it fails. kubectl get deployment shows a healthy-looking resource with zero ready replicas; kubectl describe deployment <name> or kubectl get events -n <namespace> --field-selector reason=FailedCreate reveals the 403 error. Always check events when a deployment looks healthy but never produces ready pods.
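
Concretely, with an illustrative deployment name:

kubectl describe deployment payments-api -n team-payments
kubectl get events -n team-payments --field-selector reason=FailedCreate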

Inspect current quota usage with kubectl describe

kubectl describe resourcequota is the operational workhorse for quota debugging. It shows current usage against every tracked dimension:

kubectl describe resourcequota team-payments-quota -n team-payments

Expected output:

Name:                                      team-payments-quota
Namespace:                                 team-payments
Resource                                   Used   Hard
--------                                   ----   ----
configmaps                                 8      30
count/deployments.apps                     5      20
count/ingresses.networking.k8s.io          1      5
count/jobs.batch                           0      10
limits.cpu                                 4500m  20
limits.memory                              9Gi    40Gi
persistentvolumeclaims                     2      5
pods                                       12     100
requests.cpu                               2100m  10
requests.memory                            4Gi    20Gi
requests.storage                           20Gi   100Gi
secrets                                    14     30
services                                   3      10
services.loadbalancers                     1      2

Two things to read off this output. First, the Used column is what matters operationally. Anything above 80% of Hard is a leading indicator that the next deployment will fail. Second, quota is not retroactive: if pods existed before the quota was applied, their resource usage shows up under Used immediately, but they are not evicted to make room.

For monitoring at scale, install kube-state-metrics and alert on the kube_resourcequota{type="used"} / kube_resourcequota{type="hard"} ratio crossing 0.85. The Prometheus monitoring guide covers kube-state-metrics installation in detail.
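
As a sketch of what that alert can look like, assuming the Prometheus Operator's PrometheusRule CRD is available (adapt the expression to plain Prometheus rule files otherwise):

# resourcequota-alert.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: resourcequota-alerts
  namespace: monitoring
spec:
  groups:
  - name: resourcequota
    rules:
    - alert: NamespaceQuotaNearlyFull
      expr: |
        kube_resourcequota{type="used"}
          / ignoring(type) kube_resourcequota{type="hard"} > 0.85
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "{{ $labels.namespace }} is above 85% of quota for {{ $labels.resource }}"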

Quota templates: production vs staging

Two quota templates I keep in version control and adapt per tenant. These are starting points, not absolutes; observe usage for a sprint or two and tune.

Production tenant (small to mid-size service team)

apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: team-payments
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "200"
    persistentvolumeclaims: "10"
    requests.storage: 200Gi
    services: "20"
    services.loadbalancers: "3"
    secrets: "50"
    configmaps: "50"
    count/deployments.apps: "30"

Compute headroom is set at roughly 30% above expected steady-state to leave room for rolling updates without quota exhaustion. Service and LoadBalancer counts are tight to prevent accidental cost (each LoadBalancer maps to a cloud load balancer with its own monthly cost).

Staging tenant (smaller, throwaway-friendly)

apiVersion: v1
kind: ResourceQuota
metadata:
  name: staging-quota
  namespace: team-payments-staging
spec:
  hard:
    requests.cpu: "5"
    requests.memory: 10Gi
    limits.cpu: "10"
    limits.memory: 20Gi
    pods: "50"
    persistentvolumeclaims: "5"
    requests.storage: 50Gi
    services: "10"
    services.loadbalancers: "0"   # no cloud load balancers in staging
    secrets: "30"

Staging tightens compute, denies LoadBalancers entirely (use port-forwarding or an Ingress for staging access), and assumes ephemeral workloads. The lower pod cap also forces developers to clean up abandoned workloads, which is the point.

A multi-tenant cluster usually wants both templates parameterized through a Helm chart or Kustomize overlay so that tenant onboarding is a single PR.
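
A minimal Kustomize sketch of that pattern; the directory layout, tenant name, and values are illustrative:

# overlays/team-checkout/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: team-checkout        # retargets the base manifests at this tenant's namespace
resources:
- ../../base                    # base holds the ResourceQuota and LimitRange templates
patches:
- target:
    kind: ResourceQuota
    name: production-quota
  patch: |-
    - op: replace
      path: /spec/hard/requests.cpu
      value: "30"
    - op: replace
      path: /spec/hard/requests.memory
      value: 60Gi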

How HPA interacts with ResourceQuota

HPA does not bypass quota. The Horizontal Pod Autoscaler scales by updating the spec.replicas field on a Deployment or other workload controller. The controller then attempts to create new pods, and those pod-creation attempts go through the same admission pipeline as every other pod. If the new replicas would push usage above quota, pod creation fails with HTTP 403 and HPA effectively stops scaling, even though the metrics tell it to scale up.

The visible symptom: HPA reports desiredReplicas: 12 in kubectl describe hpa <name>, but kubectl get deployment <name> shows READY 8/12. Events on the deployment or replicaset reveal FailedCreate with a quota error.

Three approaches to handle this:

  1. Size the quota for HPA's max replicas. If HPA can scale to 20 replicas of 500m CPU each, the quota must accommodate at least 10 CPU on top of any other workloads. This is the correct approach for namespaces where HPA is the primary scaling mechanism.
  2. Set HPA's maxReplicas to fit within quota. Cap the autoscaler so it never tries to exceed the quota in the first place; a sized example follows this list. The trade-off: traffic above what the cap can serve gets dropped or queued.
  3. Decouple staging from production quotas. Keep tight quotas on staging where bursting is not needed, and generous quotas on production where HPA earns its keep. The prod and staging templates above implement this implicitly.
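
A sketch of option 2, sized against the 10-CPU requests quota from earlier (the deployment name is hypothetical): 16 replicas at 500m requests each is 8 CPU, which leaves 2 CPU of quota for everything else in the namespace.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payments-api
  namespace: team-payments
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  minReplicas: 2
  maxReplicas: 16                # 16 x 500m requests = 8 CPU, inside the 10-CPU quota
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70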

Whatever you choose, set up the alert on used/hard > 0.85 from the previous section. HPA-driven quota exhaustion is silent at the Deployment level until you read events.

ResourceQuota scopes

By default, a ResourceQuota tracks every pod in the namespace. Scopes let you create separate quotas that only count specific subsets of pods, which is the right tool when you want different ceilings for batch jobs versus services, or for high-priority versus low-priority workloads.

Six scopes are available, with different stability levels:

Scope | What it matches | Stability
BestEffort | Pods with no requests or limits set on any container | Core (always available)
NotBestEffort | Pods with at least one request or limit set | Core
Terminating | Pods with .spec.activeDeadlineSeconds >= 0 (typically Jobs) | Core
NotTerminating | Pods with .spec.activeDeadlineSeconds unset (services) | Core
PriorityClass | Pods that reference a specific PriorityClass | Stable since Kubernetes 1.17
CrossNamespacePodAffinity | Pods using cross-namespace pod affinity terms | Stable since Kubernetes 1.24

Scope examples:

# Cap batch jobs (Terminating) without touching service quotas
apiVersion: v1
kind: ResourceQuota
metadata:
  name: batch-jobs-quota
  namespace: team-payments
spec:
  hard:
    requests.cpu: "5"
    requests.memory: 10Gi
    pods: "20"
  scopes:
  - Terminating
---
# Reserve capacity for high-priority pods only
apiVersion: v1
kind: ResourceQuota
metadata:
  name: high-priority-quota
  namespace: team-payments
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values:
      - high-priority
---
# Block cross-namespace pod affinity in a tenant namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: deny-cross-ns-affinity
  namespace: team-payments
spec:
  hard:
    pods: "0"
  scopeSelector:
    matchExpressions:
    - operator: Exists
      scopeName: CrossNamespacePodAffinity

The last example is a deliberate denial pattern: setting pods: "0" on the CrossNamespacePodAffinity scope rejects any pod that tries to use a cross-namespace affinity term, which is the recommended way to prevent tenants from blocking failure domains for other tenants.

Scopes can be combined. A quota that applies to non-best-effort, non-terminating, high-priority pods only is a valid construction, though anything beyond two scopes tends to become hard to reason about.

What ResourceQuota and LimitRange do not do

This section exists because every common misconception about these objects produces an outage waiting to happen.

They are not the same thing. A ResourceQuota without a LimitRange is functional but breaks any pod that omits resource specs. A LimitRange without a ResourceQuota injects defaults but does not cap aggregate consumption. They serve different purposes and are designed to be used together.

No LimitRange does not mean pods without limits are fine. As soon as a ResourceQuota tracks cpu or memory, pods without the corresponding fields fail admission. The must specify limits.memory error is the visible result. LimitRange is the tool that makes pods without explicit specs valid; it is not optional once a ResourceQuota exists.

ResourceQuota does not protect nodes from overcommit. It accounts for requests and limits at admission time but enforces nothing at runtime. A namespace with limits.memory: 100Gi quota on a cluster with 50Gi total memory will admit pods up to the 100Gi limit. If those pods all start using memory simultaneously, the kernel runs out of memory before the kubelet can act. Node-level protection comes from eviction thresholds and the kubelet's reserved-memory configuration, not from quotas.

CPU throttling happens at limits, not requests. ResourceQuota's requests.cpu is a scheduling-time accounting field. Actual CPU throttling, when a container exceeds its allocation, is enforced by the Linux kernel based on the per-container limits.cpu value. A container with requests.cpu: 100m and no limit will burst freely. Quota does not prevent that; only the per-container limit does.

LimitRange max is not a quota. It caps individual containers (no single container in this namespace can request more than 4 CPUs) but says nothing about the sum across the namespace. A namespace with max.cpu: 4 and no quota can run a thousand pods each at 4 CPU.

Quota is not retroactive. Existing pods are not evicted when a quota is applied that they would violate. Their usage shows up under Used immediately, but they continue running. Only new pod creation goes through the quota admission check. To enforce a tighter quota on existing workloads, restart their controllers (kubectl rollout restart deployment <name>) and let the new pods either fit or fail.

Reference: every ResourceQuota field

This section catalogs every field a ResourceQuota object can carry. The schema covers Kubernetes API version v1, which has been stable since Kubernetes 1.0; new resource types and scopes are added inside the existing schema rather than through breaking API changes.

spec.hard

A map of named resources to quantities. Every resource the quota tracks goes here.

Compute resources:

Field | Description
requests.cpu (or cpu) | Sum of requests.cpu across all non-terminal pods cannot exceed this value.
requests.memory (or memory) | Sum of requests.memory across all non-terminal pods cannot exceed this value.
limits.cpu | Sum of limits.cpu across all non-terminal pods.
limits.memory | Sum of limits.memory across all non-terminal pods.
hugepages-<size> | Sum of huge page requests of the given size (e.g. hugepages-2Mi).
requests.ephemeral-storage | Sum of ephemeral-storage requests across all pods.
limits.ephemeral-storage | Sum of ephemeral-storage limits across all pods.
requests.<extended-resource> | Sum of requests for an extended resource (e.g. requests.nvidia.com/gpu). Available since Kubernetes 1.10.

Storage resources:

Field | Description
requests.storage | Sum of spec.resources.requests.storage across all PVCs in the namespace.
persistentvolumeclaims | Total number of PVCs that can exist in the namespace.
<storage-class>.storageclass.storage.k8s.io/requests.storage | Sum of requests.storage for PVCs that reference a specific StorageClass.
<storage-class>.storageclass.storage.k8s.io/persistentvolumeclaims | Number of PVCs allowed for a specific StorageClass.

Object counts (built-in shortcuts):

Field | Description
pods | Total non-terminal pods.
services | Total Services.
services.loadbalancers | Services of type LoadBalancer.
services.nodeports | Services of type NodePort.
secrets | Total Secrets.
configmaps | Total ConfigMaps.
persistentvolumeclaims | Total PVCs (also listed under storage).
replicationcontrollers | Total ReplicationControllers.
resourcequotas | Total ResourceQuotas (rarely tracked).

Object counts (generic syntax, available since Kubernetes 1.9):

Field | Description
count/<resource>.<group> | Total objects of the given type. Works for core API resources (count/pods), built-in groups (count/deployments.apps, count/jobs.batch, count/ingresses.networking.k8s.io), and CRDs (count/applications.argoproj.io for Argo CD).

spec.scopes

An optional list of scope names. The quota only counts objects matching all listed scopes. Valid values:

Scope | Description | Stability
BestEffort | Pods with no requests or limits on any container. | Core
NotBestEffort | Pods with at least one request or limit set. | Core
Terminating | Pods with spec.activeDeadlineSeconds >= 0. | Core
NotTerminating | Pods with spec.activeDeadlineSeconds unset. | Core
PriorityClass | Pods referencing a PriorityClass (use with scopeSelector). | Stable since 1.17
CrossNamespacePodAffinity | Pods with cross-namespace pod affinity terms. | Stable since 1.24

BestEffort and NotBestEffort are mutually exclusive, as are Terminating and NotTerminating. Combining incompatible scopes results in a quota that matches nothing.

spec.scopeSelector

A more expressive alternative to spec.scopes. Required when scoping by PriorityClass.

scopeSelector:
  matchExpressions:
  - operator: In | NotIn | Exists | DoesNotExist
    scopeName: PriorityClass | CrossNamespacePodAffinity | <other>
    values:
    - <value>

The operator must be In or NotIn when values is non-empty, and Exists or DoesNotExist when values is empty.

status.hard and status.used

Read-only fields populated by the ResourceQuota controller. hard mirrors spec.hard; used reports current consumption per tracked resource. These are what kubectl describe resourcequota reads.
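
If you want the raw numbers rather than the describe view, the status fields are directly readable:

kubectl get resourcequota team-payments-quota -n team-payments -o jsonpath='{.status.used}{"\n"}'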

Reference: every LimitRange field

LimitRange is also a stable v1 API. Each entry in spec.limits is one constraint block.

spec.limits[].type

The kind of object the constraint applies to. Three values are supported:

Type | What it constrains
Container | Individual containers (the most common). Accepts default, defaultRequest, min, max, maxLimitRequestRatio.
Pod | Aggregate across all containers in a pod. Accepts min, max, maxLimitRequestRatio. Does not accept default or defaultRequest.
PersistentVolumeClaim | Storage requested by PVCs in the namespace. Accepts min and max for the storage resource.

spec.limits[].default

A map of resource names to quantities. Injected as limits into any container that does not specify them. Applies only when type: Container.

default:
  cpu: 500m
  memory: 512Mi

spec.limits[].defaultRequest

Same shape as default, but injected as requests. Applies only when type: Container.

defaultRequest:
  cpu: 100m
  memory: 128Mi

If defaultRequest equals default, the resulting pod is Guaranteed QoS. To produce Burstable QoS pods by default, set defaultRequest lower than default.

spec.limits[].min

Per-resource floor. A container or PVC requesting less is rejected at admission with a clear error message naming the resource.

min:
  cpu: 50m
  memory: 64Mi
  storage: 1Gi   # only when type: PersistentVolumeClaim

spec.limits[].max

Per-resource ceiling. A container or PVC requesting more is rejected.

max:
  cpu: "4"
  memory: 8Gi
  storage: 100Gi   # only when type: PersistentVolumeClaim

spec.limits[].maxLimitRequestRatio

A cap on the ratio of limit to request for a given resource, expressed as a quantity. A value of "4" means a container's limit can be at most 4x its request.

maxLimitRequestRatio:
  cpu: "10"
  memory: "4"

This is the field that prevents a developer from setting requests.cpu: 100m and limits.cpu: 8000m, which would otherwise let a single container burst to 8 cores on a node where it was scheduled assuming only 100m. In multi-tenant clusters, capping the ratio at 4 to 10 is a defensible default.

How LimitRange interacts with admission

The LimitRanger admission plugin runs in two phases. First, it mutates pod specs by injecting default and defaultRequest values for containers that omit them. Second, it validates the resulting pod against min, max, and maxLimitRequestRatio. The mutation happens before ResourceQuota's validation, which is why LimitRange and ResourceQuota work together: the LimitRange fills in the gaps before the quota checks for missing fields.

Multiple LimitRange objects in a namespace are technically allowed, but produce non-deterministic injection because the plugin iterates through them and the last match wins per resource. Stick to one LimitRange per namespace.
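
Before adding a LimitRange to a namespace you inherited, check what is already there:

kubectl get limitrange -n team-payments
kubectl describe limitrange default-limits -n team-payments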

For the broader namespace-isolation pattern that uses ResourceQuota and LimitRange alongside NetworkPolicy, RBAC, and Pod Security Standards, see the multi-tenancy walkthrough.

