Kubernetes & DevOps / senior / freq 5/5

Requests, Limits & QoS Classes

Requests drive scheduling and QoS class; limits drive throttling and OOMKill. Memory limits without matching JVM heap settings = surprise restarts.

kubernetes, resources, qos

Deep dive

QoS classes

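Guaranteed: every container sets requests = limits for both CPU and memory. Last to be evicted under node pressure.
Burstable: at least one container sets a request or limit, but the pod does not meet the Guaranteed criteria (typically requests < limits). Most workloads. Eviction order depends on how far usage exceeds requests.
BestEffort: no container sets any CPU or memory request or limit. First evicted.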
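
The classification rules above can be sketched as a small Python function. This is an illustrative approximation, not the kubelet's code: qos_class and the dict layout are hypothetical, and quantities are compared as plain strings, so "500m" and "0.5" count as different here even though Kubernetes normalizes them.

```python
def qos_class(containers):
    """Approximate a pod's QoS class.

    containers: list of dicts with optional "requests" and "limits" maps
    (a hypothetical, simplified stand-in for the pod spec).
    """
    any_set = False
    all_guaranteed = True
    for c in containers:
        limits = c.get("limits", {})
        # A missing request defaults to the limit when a limit is set.
        requests = {**limits, **c.get("requests", {})}
        for resource in ("cpu", "memory"):
            req = requests.get(resource)
            lim = limits.get(resource)
            if req is not None or lim is not None:
                any_set = True
            # Guaranteed needs a limit on every resource, equal to the request.
            if lim is None or req != lim:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"


# A CPU request without a CPU limit (the common recommendation) is Burstable.
pod = [{"requests": {"cpu": "1", "memory": "1Gi"}, "limits": {"memory": "1Gi"}}]
print(qos_class(pod))
```

Note that dropping the CPU limit, as suggested later in this card, moves a pod out of Guaranteed even when memory requests equal memory limits.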

CPU limits are usually wrong

Hard CPU limits cause throttling, which looks like latency spikes that are invisible in CPU graphs. Most teams should set CPU requests (for scheduling fairness) and skip CPU limits unless they have a noisy-neighbor problem.
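
A rough worked calculation shows why throttling can hide from CPU graphs. The sketch assumes the default 100 ms CFS period and threads that stay runnable for the whole period; throttled_ms_per_period is a hypothetical helper, not a Kubernetes API.

```python
def throttled_ms_per_period(limit_cores, busy_threads, period_ms=100.0):
    """Wall-clock milliseconds per CFS period that the container is frozen.

    Assumes busy_threads threads are all runnable and each has its own core.
    """
    quota_ms = limit_cores * period_ms  # CPU time allowed per period
    if busy_threads <= limit_cores:
        return 0.0  # the quota is never exhausted
    runtime_ms = quota_ms / busy_threads  # wall-clock time until quota is spent
    return period_ms - runtime_ms


print(throttled_ms_per_period(limit_cores=2, busy_threads=8))  # 75.0
```

With a 2-core limit and 8 busy threads, the quota is spent after 25 ms and the container is frozen for the remaining 75 ms of that period. A usage graph averaged over many periods can still look unremarkable if the burst is brief.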

Memory limits matter

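Exceeding the memory limit triggers an OOMKill: the kernel sends SIGKILL, so the process gets no chance to shut down cleanly. Set the runtime's own ceiling below the pod limit, via -XX:MaxRAMPercentage for the JVM or --max-old-space-size for Node.js, so the runtime fails gracefully (for example with an OutOfMemoryError) instead.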

Real-world example

From production

An ML inference service had p99 latency spikes of 5+ seconds. CPU usage looked fine, but container_cpu_cfs_throttled_seconds_total showed massive throttling: the limit was 2 cores, while the model briefly burst to 8 cores during attention layers. Removing the CPU limit dropped p99 from 5 s to 200 ms with no impact on neighbors (the node had headroom).
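
One way to quantify this kind of throttling is the share of CFS periods in which the container was throttled, computed from two samples of the cAdvisor counters container_cpu_cfs_periods_total and container_cpu_cfs_throttled_periods_total. The sample values below are made up for illustration, and throttled_fraction is a hypothetical helper.

```python
def throttled_fraction(periods_start, periods_end, throttled_start, throttled_end):
    """Fraction of CFS periods between two counter samples that were throttled."""
    elapsed_periods = periods_end - periods_start
    if elapsed_periods <= 0:
        return 0.0  # no data, or the counter was reset between samples
    return (throttled_end - throttled_start) / elapsed_periods


# Hypothetical samples: 600 periods elapsed, 300 of them throttled.
print(throttled_fraction(1000, 1600, 200, 500))  # 0.5
```

A persistently high fraction alongside moderate average CPU usage is the pattern described in this incident.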

Interview questions

Q1. Should you set CPU limits?

Usually no. CPU limits cause throttling that's invisible without container_cpu_cfs_throttled_seconds_total. Set requests for fair scheduling; only add limits when you have a measured noisy-neighbor problem.

Q2. How do you size memory for a JVM pod?

Set the pod memory limit, then cap the JVM heap at ~70% of it via -XX:MaxRAMPercentage=70. The rest leaves room for metaspace, thread stacks, direct buffers, and other native memory. Set the memory request equal to the limit; note that Guaranteed QoS also requires CPU requests equal to CPU limits.
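
The 70% split can be checked with simple arithmetic. This is a minimal sketch with a hypothetical helper, jvm_memory_budget; the right percentage depends on how much non-heap memory the application actually uses.

```python
def jvm_memory_budget(limit_mib, max_ram_percentage=70.0):
    """Split a pod memory limit into heap and non-heap budgets, in whole MiB."""
    heap_mib = int(limit_mib * max_ram_percentage / 100)
    return {"heap_mib": heap_mib, "non_heap_mib": limit_mib - heap_mib}


# A 2 GiB limit leaves 1433 MiB for heap and 615 MiB for everything else.
print(jvm_memory_budget(2048))
```

If metaspace, thread stacks, and direct buffers together need more than the non-heap budget, lower the percentage or raise the limit.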

Common mistakes

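CPU limits set to the request value 'for safety': the container is throttled whenever it tries to burst above its request.
JVM -Xmx equal to the pod memory limit: leaves no room for non-heap memory, so the pod is OOMKilled once metaspace, thread stacks, or direct buffers grow.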

Trade-offs

Guaranteed QoS protects critical workloads but reduces bin-packing efficiency.
Letting pods burst CPU improves p99 but risks noisy neighbors on packed nodes.
