Kubernetes Resource Limits and Requests: Setting Them Correctly

What Requests and Limits Actually Do

A resource request is what Kubernetes uses for scheduling. The scheduler places the pod only on a node whose allocatable capacity, minus the requests of pods already there, covers the new pod's request. Requests do not cap runtime usage, although CPU requests also set a container's relative CPU weight when the node is under contention. The Kubernetes documentation on resource management covers the full behavior of both fields.
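A simplified sketch of that fit check, assuming each node is described only by its allocatable CPU (millicores) and memory (bytes) plus the requests of pods already bound to it. The real scheduler's NodeResourcesFit plugin also handles extended resources, pod overhead, and scoring; the names `Node`, `fits`, and `feasible_nodes` are hypothetical. The point it shows: only requests are summed, never live usage.

```python
from dataclasses import dataclass, field

GiB = 1024 ** 3


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int  # allocatable CPU, millicores
    allocatable_mem: int    # allocatable memory, bytes
    # (cpu_m, mem) requests of pods already bound to this node
    pod_requests: list[tuple[int, int]] = field(default_factory=list)


def fits(node: Node, cpu_m: int, mem: int) -> bool:
    """True if the node's unrequested allocatable capacity covers the pod's requests.

    Only requests are summed; live usage is never consulted, so a pod with
    small requests can land on a node that is already busy.
    """
    requested_cpu = sum(c for c, _ in node.pod_requests)
    requested_mem = sum(m for _, m in node.pod_requests)
    return (requested_cpu + cpu_m <= node.allocatable_cpu_m
            and requested_mem + mem <= node.allocatable_mem)


def feasible_nodes(nodes: list[Node], cpu_m: int, mem: int) -> list[str]:
    """Names of nodes that pass the fit check for a pod with these requests."""
    return [n.name for n in nodes if fits(n, cpu_m, mem)]


nodes = [
    Node("a", 4000, 16 * GiB, [(3500, 4 * GiB)]),   # CPU almost fully requested
    Node("b", 4000, 16 * GiB, [(1000, 14 * GiB)]),  # memory almost fully requested
    Node("c", 4000, 16 * GiB, [(1000, 4 * GiB)]),
]
print(feasible_nodes(nodes, cpu_m=1000, mem=4 * GiB))  # ['c']
```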

A limit is the runtime cap, enforced per container. A container that tries to use more CPU than its limit is throttled. A container that exceeds its memory limit is killed by the kernel (OOMKilled). These are very different behaviors with very different debugging signatures.

Memory: Always Set Limits, Set Them Equal to Requests

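Memory is incompressible: a container can't be slowed down to use less of it. When a node runs out of memory, the kernel kills processes. Without memory limits, a pod with a memory leak can exhaust the node's memory and get other pods OOMKilled or evicted along with it.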

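Best practice for production: set memory request equal to memory limit. The container can then never use more memory than it requested, which keeps it out of the first group the kubelet evicts under node memory pressure. This alone does not make the pod Guaranteed: the Guaranteed QoS class, the last to be evicted, also requires CPU requests equal to CPU limits in every container. Without that, the pod is Burstable, which is fine for less critical workloads but worth understanding.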

CPU: Set Requests, Be Careful With Limits

CPU is compressible. A pod that’s throttled doesn’t crash; it just runs slower. That’s why CPU limits are more contentious than memory limits.

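The argument against CPU limits: throttling has surprising performance impacts, especially on latency-sensitive services. The kernel's CFS scheduler enforces a CPU limit as a quota of CPU time per period (100ms by default). A multi-threaded service can exhaust its quota early in a period and then stall until the next period starts, producing p99 spikes even at low average utilization.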
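A back-of-the-envelope model of CFS quota throttling, assuming the default 100ms period and a workload whose busy threads all want a full core from the start of the period. A limit of L cores becomes L × 100ms of CPU time per period; N busy threads drain that in quota / N of wall-clock time and then stall. The function name `throttle_window` is hypothetical, and the model ignores how many cores the node actually has.

```python
PERIOD_MS = 100.0  # default CFS period (cpu.cfs_period_us = 100000)


def throttle_window(limit_cores: float, busy_threads: int,
                    period_ms: float = PERIOD_MS) -> tuple[float, float]:
    """Return (ms running, ms throttled) within one CFS period.

    Simplified model: each of `busy_threads` threads wants a full core for the
    whole period; the container's quota is limit_cores * period_ms of CPU time.
    """
    quota_ms = limit_cores * period_ms
    demand_ms = busy_threads * period_ms
    if demand_ms <= quota_ms:
        return period_ms, 0.0          # quota never runs out
    run_ms = quota_ms / busy_threads   # wall time until the quota is spent
    return run_ms, period_ms - run_ms


# A 1-core limit with 4 busy threads: quota spent after 25 ms, stalled for 75 ms.
print(throttle_window(1.0, 4))  # (25.0, 75.0)
```

If bursts like this are brief, utilization averaged over a minute still looks low, yet every request that arrives during the 75ms stall waits for the next period.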

The argument for CPU limits: without them, a single misbehaving pod can soak up all spare CPU on a node. Neighbors still get roughly their requested share under contention, but lose the headroom they would otherwise burst into. The practical compromise is to set CPU requests carefully and set limits only when you have specific multi-tenancy or fairness requirements.

How to Determine the Right Values

Profile in production-like load. Look at p95 and p99 CPU and memory consumption over a representative period. Set CPU request to roughly p95 of observed usage; set memory request and limit to p99 plus a safety margin.
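A sketch of that sizing rule, assuming per-pod usage samples (CPU in millicores, memory in bytes) have already been exported from your metrics system for the representative period. The nearest-rank percentile method and the 15% memory margin are illustrative choices, not values from this guide; `percentile` and `recommend` are hypothetical helpers.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (p in 0-100) of a non-empty sample list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def recommend(cpu_m: list[int], mem_bytes: list[int],
              mem_margin_pct: int = 15) -> dict:
    """CPU request at p95; memory request and limit at p99 plus a margin.

    No CPU limit is emitted, following the CPU guidance above.
    """
    cpu_request = math.ceil(percentile(cpu_m, 95))
    mem = math.ceil(percentile(mem_bytes, 99) * (100 + mem_margin_pct) / 100)
    return {
        "requests": {"cpu": f"{cpu_request}m", "memory": str(mem)},
        "limits": {"memory": str(mem)},  # memory limit equals memory request
    }


# Hypothetical samples: CPU 10m..1000m, memory 1 MB..100 MB.
cpu = [10 * i for i in range(1, 101)]
mem = [1_000_000 * i for i in range(1, 101)]
print(recommend(cpu, mem))
```

Memory is emitted as a plain byte count, which Kubernetes accepts as a quantity; round it up to a friendlier unit such as Mi if you prefer.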

Tools like Vertical Pod Autoscaler in recommendation mode can suggest values based on observed usage. They’re a good starting point but rarely a substitute for understanding the workload.

What Goes Wrong

Common failure modes: requests set too high, leading to wasted node capacity; requests set too low, leading to scheduling on overcommitted nodes and CPU starvation under load; memory limits set too low, leading to OOMKills under traffic spikes; no limits at all, leading to noisy-neighbor problems.

Vertical Pod Autoscaler in Practice

VPA in recommendation mode (updateMode: "Off") observes container resource usage over time and recommends requests. It's most useful as a continuously updated input to manual sizing decisions, not as an automatic adjuster.

The recommendation output (kubectl describe vpa) shows target, lower bound, and upper bound for both CPU and memory. Target is the recommended request; lower bound and upper bound bracket the range of request values VPA considers reasonable, and that range is wider when VPA has little usage history. Apply target as the request, set memory limit equal, and recheck monthly.
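A sketch of "apply target as the request, set memory limit equal", assuming the VPA object was fetched with `kubectl get vpa <name> -o json` and parsed into a dict. It reads `status.recommendation.containerRecommendations[].target` and writes no CPU limit. `resources_from_vpa` is a hypothetical helper and the sample values are made up.

```python
def resources_from_vpa(vpa: dict) -> dict[str, dict]:
    """Map container name -> resources block built from the VPA target."""
    recs = (vpa.get("status", {})
               .get("recommendation", {})
               .get("containerRecommendations", []))
    out = {}
    for rec in recs:
        target = rec["target"]
        out[rec["containerName"]] = {
            "requests": {"cpu": target["cpu"], "memory": target["memory"]},
            "limits": {"memory": target["memory"]},  # memory limit == request
        }
    return out


# Illustrative values only, not real VPA output.
sample = {"status": {"recommendation": {"containerRecommendations": [{
    "containerName": "api",
    "lowerBound": {"cpu": "150m", "memory": "200Mi"},
    "target": {"cpu": "250m", "memory": "300Mi"},
    "upperBound": {"cpu": "1", "memory": "1Gi"},
}]}}}
print(resources_from_vpa(sample))
```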

Multi-Tenancy and the Noisy Neighbor Problem

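Without limits, one misbehaving pod can use everything the node has. CPU starvation degrades neighbors invisibly; memory pressure triggers evictions and OOMKills. ResourceQuotas at the namespace level add a second layer of protection: they cap the aggregate requests and limits that pods in a namespace can declare (not their measured usage). Once a quota covers CPU or memory, pods must declare those values, or get defaults from a LimitRange, to be admitted.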
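A simplified model of quota admission, assuming a quota on `requests.cpu` and `requests.memory` only (real quotas can also cover limits, object counts, and more). Admission compares declared requests against the hard caps; measured usage plays no part. `RequestsQuota` and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

GiB = 1024 ** 3


@dataclass
class RequestsQuota:
    hard_cpu_m: int       # requests.cpu hard cap, millicores
    hard_mem: int         # requests.memory hard cap, bytes
    used_cpu_m: int = 0   # sum of requests of pods already admitted
    used_mem: int = 0

    def admit(self, cpu_m: Optional[int], mem: Optional[int]) -> bool:
        """Admit a pod if its declared requests fit in the remaining quota.

        A missing request is rejected here; in a real cluster a LimitRange
        default (or a limit, which the request defaults to) may fill it first.
        """
        if cpu_m is None or mem is None:
            return False
        if (self.used_cpu_m + cpu_m > self.hard_cpu_m
                or self.used_mem + mem > self.hard_mem):
            return False
        self.used_cpu_m += cpu_m
        self.used_mem += mem
        return True


quota = RequestsQuota(hard_cpu_m=2000, hard_mem=4 * GiB)
print(quota.admit(1000, 2 * GiB), quota.admit(1000, 2 * GiB), quota.admit(100, GiB))
# True True False: the third pod would exceed the quota
```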

For workloads with strict performance SLAs, dedicated node pools provide hard isolation: taints keep other workloads off the nodes, and tolerations plus node selectors or affinity keep the critical workload on them. The tradeoff is bin-packing efficiency, since dedicated pools run at lower utilization. For shared-tenancy clusters, the latency predictability is often worth it.

OOMKill Debugging

A container killed for exceeding its memory limit leaves a specific signature. kubectl describe pod shows it under Last State with the termination reason ‘OOMKilled’, and the container's exit code is 137 (128 + SIGKILL signal number 9). Exit code 137 alone is not proof, since any SIGKILL produces it, so check the reason as well.
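A sketch that scans pod JSON (the parsed output of `kubectl get pods -o json`) for containers whose last termination reason was `OOMKilled`. It keys on the reason rather than the exit code for the reason given above. `find_oomkills` is a hypothetical helper and the sample data is made up.

```python
SIGKILL = 9                    # POSIX signal number
OOM_EXIT_CODE = 128 + SIGKILL  # 137; any SIGKILL yields this, not only OOM


def find_oomkills(pod_list: dict) -> list[tuple[str, str, int]]:
    """(pod, container, restartCount) for containers last terminated by OOMKill."""
    hits = []
    for pod in pod_list.get("items", []):
        for cs in pod.get("status", {}).get("containerStatuses", []):
            terminated = cs.get("lastState", {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                hits.append((pod["metadata"]["name"], cs["name"], cs.get("restartCount", 0)))
    return hits


# Trimmed, made-up example of the JSON shape.
pods = {"items": [
    {"metadata": {"name": "api-7d9f"}, "status": {"containerStatuses": [
        {"name": "api", "restartCount": 3,
         "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
    ]}},
    {"metadata": {"name": "worker-5c2a"}, "status": {"containerStatuses": [
        {"name": "worker", "restartCount": 0, "lastState": {}},
    ]}},
]}
print(find_oomkills(pods))  # [('api-7d9f', 'api', 3)]
```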

Root-causing OOMKills requires understanding what the application was doing. The kernel's SIGKILL gives the process no chance to dump state at kill time, so gather evidence beforehand: periodic heap snapshots as usage approaches the limit (via in-app monitoring), memory profiling under load, and memory growth tracked over time all help. The fix is rarely ‘just raise the limit’: for a leak, that delays the problem instead of solving it.

Node Resource Pressure Eviction

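When a node runs low on resources, the kubelet evicts pods to recover. It ranks candidates by whether their usage exceeds their requests, then by pod priority, then by usage relative to requests. In practice, BestEffort pods and Burstable pods using more than they requested go first; Guaranteed pods and Burstable pods within their requests go last.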
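A sketch of that ranking for memory pressure, simplified from the kubelet's documented order: usage over request first, then lower priority, then the largest usage above request. QoS class is not an input; BestEffort pods land early because their requests are zero. `PodUsage` and `eviction_order` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PodUsage:
    name: str
    priority: int     # pod priority (PriorityClass value)
    mem_request: int  # bytes; 0 for BestEffort
    mem_usage: int    # current memory usage, bytes


def eviction_order(pods: list[PodUsage]) -> list[str]:
    """Order in which a kubelet under memory pressure would pick victims."""
    def key(p: PodUsage):
        exceeds = p.mem_usage > p.mem_request
        return (not exceeds,                     # over-request pods sort first
                p.priority,                      # then lowest priority
                -(p.mem_usage - p.mem_request))  # then largest usage above request
    return [p.name for p in sorted(pods, key=key)]


pods = [
    PodUsage("guaranteed", 0, 512, 400),
    PodUsage("besteffort", 0, 0, 100),
    PodUsage("burstable-over", 0, 200, 500),
    PodUsage("burstable-under", 0, 300, 100),
]
print(eviction_order(pods))
# ['burstable-over', 'besteffort', 'guaranteed', 'burstable-under']
```

Within the last group, the Guaranteed pod is not automatically behind the under-request Burstable pod; priority and usage decide.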

This is why QoS class matters. Production-critical workloads should be Guaranteed (requests equal to limits for both CPU and memory in every container), which means accepting CPU limits and the throttling tradeoff that comes with them. Less critical workloads can be Burstable. BestEffort (no requests or limits) is the eviction-first class: fine for batch work, dangerous for anything user-facing.
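A sketch of the QoS classification rules, assuming containers are given as dicts shaped like a pod spec's `resources` field, with quantities already normalized so equal amounts compare equal. Guaranteed requires every container to set CPU and memory limits with requests equal to them (a missing request defaults to the limit); BestEffort means no container sets any CPU or memory request or limit; everything else is Burstable. Init containers are left out for brevity, and `qos_class` is a hypothetical name.

```python
def qos_class(containers: list[dict]) -> str:
    """Classify a pod from its containers' `resources` dicts."""
    any_set = False
    guaranteed = True
    for res in containers:
        requests = res.get("requests", {})
        limits = res.get("limits", {})
        for r in ("cpu", "memory"):
            if r in requests or r in limits:
                any_set = True
            if r not in limits:
                guaranteed = False  # every container needs both limits
            elif requests.get(r, limits[r]) != limits[r]:  # request defaults to limit
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Memory request == limit but no CPU limit: still Burstable.
print(qos_class([{"requests": {"cpu": 500, "memory": 256}, "limits": {"memory": 256}}]))
```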

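Node pressure eviction can cascade: evicted pods reschedule, possibly onto nodes that then come under pressure. PodDisruptionBudgets don't help here, because the kubelet does not respect them during node-pressure eviction; they govern voluntary disruptions such as drains. Resource headroom on nodes and accurate requests are what prevent the cascade.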

Container Image Provenance

Knowing where your container images come from is foundational. Pinning by digest (not by tag) gives immutability. Signing with Sigstore or Notary provides authenticity.
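A small check for digest pinning, assuming the common `name@sha256:<64 hex chars>` form. A tag such as `app:1.4` can be repointed after you deploy it; a digest cannot. The regex and `is_digest_pinned` are illustrative, not a full image-reference parser.

```python
import re

# name (optionally with :tag) followed by @sha256: and 64 lowercase hex digits
_DIGEST_REF = re.compile(r"^[^@\s]+@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image: str) -> bool:
    """True if the image reference ends in a sha256 digest."""
    return bool(_DIGEST_REF.match(image))


print(is_digest_pinned("nginx:1.25"), is_digest_pinned("nginx@sha256:" + "a" * 64))
```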

Build provenance (recording how the image was built, from what source, by which CI system) adds another layer. SLSA attestations capture this in a standardized format.

For organizations subject to executive orders or regulatory frameworks requiring software supply chain controls, provenance becomes mandatory rather than optional. Building the practice into normal CI early is cheaper than retrofitting under audit pressure.

Observability for Kubernetes Workloads

Standard observability for Kubernetes includes: pod metrics (cAdvisor exposed via kubelet), node metrics (node-exporter), API server and controller metrics, and application metrics via service annotations or ServiceMonitor.

The kube-prometheus-stack Helm chart bundles all of this with pre-built dashboards and alerts. Most clusters that want quick observability install it and customize from there. For deeper observability — distributed tracing across pods, application-level instrumentation — OpenTelemetry layers on top.

Logs follow a similar pattern: Fluent Bit or Vector runs as the node-level agent, shipping to a centralized log store (Loki, Elasticsearch, CloudWatch). Per-pod metadata enrichment makes logs searchable by deployment, namespace, and pod labels.

Capacity Planning and Right-Sizing

Kubernetes capacity planning has two layers: cluster capacity (how many nodes, what types) and workload capacity (resource requests and limits). Both deserve attention.

For cluster capacity, observe peak utilization and plan headroom. A peak utilization of 70-80% is a healthy target: below that, you're paying for idle capacity; above that, autoscaling lag and burst patterns can cause issues.
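The headroom target translates into a node count with simple arithmetic. A sketch, assuming a homogeneous node pool sized by one binding resource (here, CPU requested at peak) and using 75%, the midpoint of that range; `nodes_needed` is a hypothetical helper and the numbers are made up.

```python
import math


def nodes_needed(peak_demand: float, per_node_capacity: float,
                 target_utilization: float = 0.75) -> int:
    """Smallest node count that keeps peak utilization at or below the target."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return math.ceil(peak_demand / (per_node_capacity * target_utilization))


# 90 cores requested at peak, 16 allocatable cores per node.
print(nodes_needed(90, 16))  # 8
```

Eight nodes put the peak at about 70% (90 of 128 cores); seven would put it at about 80% (90 of 112), the top of the range.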

For workload capacity, the right-sizing tools mentioned earlier surface candidates. Schedule quarterly right-sizing reviews. Service growth and traffic pattern changes mean yesterday’s right-size is today’s waste or saturation.

Image Optimization for Production

Beyond best practices in the Dockerfile, image optimization at the repository level pays back across many services. Standardize on a small set of base images, share optimization patterns across teams, and centralize the security-update process for those base images.

Internal base images that wrap upstream images with organization-specific additions (corporate certs, common tools, security agents) reduce per-service complexity. Build them with the same discipline as application images — pinned dependencies, signed, scanned.

Image size impacts pull time, which impacts pod startup, which impacts autoscaling responsiveness and rolling deploy duration. The end-to-end effect is larger than the per-image savings suggest.

Operational Recommendations

For teams running production Kubernetes workloads, a small set of disciplines pays back across nearly every dimension of cluster operation. Define resource requests and limits for all production workloads. Establish a network policy posture that defaults to deny. Run regular cluster upgrades on a defined cadence. Monitor cluster health alongside application health.

These aren’t novel recommendations — they appear in every Kubernetes best-practices guide. They’re rarely fully implemented in production clusters that grew organically. The work of bringing existing clusters to this baseline is significant but worthwhile.

For new clusters, build these in from the start. Templates and operators can enforce the baseline; documentation captures the intent. Each new service onboarded gets the right defaults rather than requiring later remediation.

Operational maturity in Kubernetes is incremental. Pick the next improvement, implement it, move on. The compounded effect over time is what separates well-operated clusters from clusters that work but feel fragile.

Frequently Asked Questions

Should I set CPU limits?

Not by default. Set CPU requests carefully. Add CPU limits only when multi-tenancy or fairness requires them, and be aware of throttling implications.

Should I set memory limits?

Yes. Always. Equal to memory request for production-critical workloads.

What’s the difference between Guaranteed, Burstable, and BestEffort?

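Guaranteed: every container sets CPU and memory limits, with requests equal to them. Burstable: at least one container sets a CPU or memory request or limit, but the pod doesn't meet the Guaranteed rules. BestEffort: no container sets any CPU or memory request or limit. Eviction order goes BestEffort first.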

How do I find pods with bad limits?

Look at pod restart counts and OOMKilled termination reasons. VPA recommendations highlight pods consistently using less or more than requested.