Backpressure & Load Shedding
When demand exceeds capacity, you either queue the excess or shed it. Queue without limits = OOM. Shed without priority = bad UX. Well-designed systems decide the policy in advance.
Deep dive
Concepts
- Backpressure: signal upstream to slow down (TCP, reactive streams, HTTP 429).
- Load shedding: actively drop low-priority work to protect the SLO of high-priority work.
- Adaptive concurrency (Netflix concurrency-limits, Envoy's adaptive concurrency filter): dynamically size the in-flight window based on observed latency.
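To make the adaptive concurrency idea concrete, here is a minimal sketch of an additive-increase / multiplicative-decrease (AIMD) limiter. It is a generic illustration, not the algorithm used by either library named above; the class name, the 100 ms latency target, and the 0.9 backoff factor are all assumptions, and the code is not thread-safe.

```python
class AIMDLimiter:
    """Illustrative in-flight limiter: grow slowly, shrink quickly."""

    def __init__(self, initial=10, min_limit=1, max_limit=200,
                 latency_target_s=0.100, backoff=0.9):
        self.limit = initial                      # current in-flight window
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.latency_target_s = latency_target_s  # assumed latency target
        self.backoff = backoff                    # assumed shrink factor
        self.in_flight = 0

    def try_acquire(self):
        """Admit the request only if the in-flight window has room."""
        if self.in_flight >= self.limit:
            return False
        self.in_flight += 1
        return True

    def release(self, latency_s):
        """Record a completed request and resize the window from its latency."""
        self.in_flight -= 1
        if latency_s > self.latency_target_s:
            # Latency above target: shrink the window multiplicatively.
            self.limit = max(self.min_limit, int(self.limit * self.backoff))
        else:
            # Latency healthy: probe for more capacity one slot at a time.
            self.limit = min(self.max_limit, self.limit + 1)
```

A caller would wrap each request in try_acquire() and release(latency), rejecting (for example with a 429) whenever try_acquire() returns False.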
Patterns
- Bounded queues, always.
- Priority queues for tiered traffic (paid vs free, interactive vs batch).
- Cost-based admission control for AI/expensive endpoints.
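Cost-based admission control can be sketched as a budget of "cost units" shared by all in-flight requests: a request is admitted only if its estimated cost fits in what is left. Everything here is hypothetical, including the names CostBudget and estimate_cost and the cost model that weights output tokens 4x; a real service would calibrate its own model.

```python
class CostBudget:
    """Admit requests while the sum of in-flight costs stays within a budget."""

    def __init__(self, capacity_units):
        self.capacity_units = capacity_units
        self.in_use = 0

    def try_admit(self, cost_units):
        """Reserve cost_units if they fit; otherwise reject the request."""
        if self.in_use + cost_units > self.capacity_units:
            return False
        self.in_use += cost_units
        return True

    def release(self, cost_units):
        """Return reserved units when the request finishes."""
        self.in_use = max(0, self.in_use - cost_units)


def estimate_cost(prompt_tokens, max_output_tokens):
    # Hypothetical cost model: assume output tokens cost 4x input tokens.
    return prompt_tokens + 4 * max_output_tokens
```

The point of the design is that one expensive request can consume the capacity of many cheap ones, so counting requests alone under-protects the service.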
Real-world example
From production: an image-processing service used an unbounded in-memory queue. A traffic spike grew the queue to 4 GB and OOM-killed the pod. The queue was replaced with a bounded one (capacity 1000) that returns 429 with Retry-After on overflow, and clients retried with exponential backoff. After the change the system shed load gracefully, with no restarts.
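A minimal sketch of that fix, using Python's standard queue module. The capacity of 1000 comes from the example; the 202 success status, the 2-second Retry-After value, the function names, and the jitter on the client side are assumptions added for illustration.

```python
import queue
import random

MAX_QUEUE = 1000     # bound taken from the example above
RETRY_AFTER_S = 2    # assumed hint; tune to typical drain time

jobs = queue.Queue(maxsize=MAX_QUEUE)


def submit(job):
    """Server side: return (status, headers) without ever blocking."""
    try:
        jobs.put_nowait(job)
    except queue.Full:
        # Overflow policy: reject and tell the client when to come back.
        return 429, {"Retry-After": str(RETRY_AFTER_S)}
    return 202, {}


def backoff_delay(attempt, retry_after_s=0.0, base_s=0.5, cap_s=30.0):
    """Client side: exponential backoff with jitter, honoring Retry-After."""
    exp = min(cap_s, base_s * (2 ** attempt))
    # Never retry sooner than the server asked for.
    return max(retry_after_s, random.uniform(0, exp))
```

The jitter spreads retries out so that rejected clients do not all return in the same instant and recreate the spike.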
Interview questions
Q1 (senior-level): How do you handle traffic above capacity?
Shed early, with priority. Return 429 with Retry-After so clients back off. For long-running work, use bounded queues with an explicit overflow policy. Track shed rate as a first-class metric: it tells you when to scale before users feel it.
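As a rough illustration of tracking shed rate, the sketch below keeps a sliding window of admit/shed decisions and reports the shed fraction. The class name, the 60-second window, and the injectable clock are assumptions; in practice this would usually be a pair of counters exported to whatever metrics system is already in place.

```python
import time
from collections import deque


class ShedRate:
    """Fraction of requests shed over a sliding time window."""

    def __init__(self, window_s=60.0, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock      # injectable so the window can be tested
        self.events = deque()   # (timestamp, was_shed) pairs, oldest first

    def record(self, was_shed):
        """Record one admission decision."""
        now = self.clock()
        self.events.append((now, was_shed))
        self._evict(now)

    def _evict(self, now):
        # Drop decisions that have aged out of the window.
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()

    def rate(self):
        """Return the shed fraction in the window, or 0.0 if it is empty."""
        self._evict(self.clock())
        if not self.events:
            return 0.0
        shed = sum(1 for _, was_shed in self.events if was_shed)
        return shed / len(self.events)
```

Alerting on a sustained non-zero rate gives a scale-up signal while rejected requests are still the low-priority ones.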
Common mistakes
- Unbounded queues anywhere.
- Shedding randomly: the result is angry users instead of a deliberate, best-effort degradation of UX.
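One way to avoid random shedding is a bounded queue that, when full, drops its least important entry rather than whichever request happens to arrive next. This is a sketch under assumptions: the class name is hypothetical, lower numbers mean higher priority, and the O(n) scan is chosen for clarity rather than speed.

```python
import itertools


class PriorityBoundedQueue:
    """Bounded queue that sheds the lowest-priority item on overflow.

    Lower number = more important (e.g. 0 = paid, 1 = free).
    """

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._items = []                # (priority, seq, item) tuples
        self._seq = itertools.count()   # arrival order breaks priority ties

    def offer(self, priority, item):
        """Insert item; return whatever was shed (possibly item), or None."""
        entry = (priority, next(self._seq), item)
        if len(self._items) < self.maxsize:
            self._items.append(entry)
            return None
        worst = max(self._items)        # least important, newest among ties
        if entry >= worst:
            return item                 # newcomer is least important: reject
        self._items.remove(worst)
        self._items.append(entry)
        return worst[2]

    def poll(self):
        """Remove and return the most important, oldest item, or None."""
        if not self._items:
            return None
        best = min(self._items)
        self._items.remove(best)
        return best[2]
```

For example, with a capacity of 2 holding two free-tier jobs, offering a paid job evicts the newer free-tier job instead of rejecting the paid one.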
Trade-offs
Strict admission control sacrifices throughput to protect tail latency: some requests that might have succeeded are rejected so that admitted ones stay fast.