System Design

Backpressure & Load Shedding

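When demand exceeds capacity, a system can only queue the excess or shed it. Queueing without limits ends in an OOM kill. Shedding without priorities degrades the experience for the wrong users. Well-designed systems decide the policy in advance.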

backpressure, resilience, queueing

Deep dive

Concepts

  • Backpressure: signal upstream to slow down (TCP flow control, reactive streams, HTTP 429).
  • Load shedding: actively drop low-priority work to protect the SLO of high-priority work.
  • Adaptive concurrency (Netflix concurrency-limits, Envoy's adaptive concurrency filter): dynamically size the in-flight window based on observed latency.
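
The adaptive-concurrency idea can be sketched with a simple AIMD (additive-increase, multiplicative-decrease) rule. This is an illustration only, not the algorithm used by Netflix concurrency-limits or Envoy; the class name AIMDLimiter and all of its parameters are hypothetical, and the sketch is not thread-safe.

```python
class AIMDLimiter:
    """Hypothetical in-flight limiter: grows the window while latency is
    healthy and shrinks it multiplicatively when latency exceeds a target."""

    def __init__(self, target_latency_s, initial_limit=10, min_limit=1,
                 max_limit=1000, backoff_ratio=0.5):
        self.target_latency_s = target_latency_s
        self.limit = initial_limit
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.backoff_ratio = backoff_ratio
        self.in_flight = 0

    def try_acquire(self):
        """Admit the request only if the in-flight window has room."""
        if self.in_flight >= self.limit:
            return False  # caller should reject, for example with HTTP 429
        self.in_flight += 1
        return True

    def release(self, observed_latency_s):
        """Record a completion and resize the window from its latency."""
        self.in_flight -= 1
        if observed_latency_s > self.target_latency_s:
            # Multiplicative decrease: latency suggests overload.
            self.limit = max(self.min_limit,
                             int(self.limit * self.backoff_ratio))
        else:
            # Additive increase: probe for spare capacity.
            self.limit = min(self.max_limit, self.limit + 1)


limiter = AIMDLimiter(target_latency_s=0.1, initial_limit=4)
limiter.try_acquire()
limiter.release(observed_latency_s=0.05)  # fast response: limit 4 -> 5
limiter.try_acquire()
limiter.release(observed_latency_s=0.50)  # slow response: limit 5 -> 2
print(limiter.limit)
```

Requests that fail try_acquire are the ones to reject immediately, which is how the limiter turns observed latency into backpressure.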

Patterns

  • Bounded queues, always.
  • Priority queues for tiered traffic (paid vs free, interactive vs batch).
  • Cost-based admission control for AI/expensive endpoints.
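
One way to combine tiered priority with cost-based admission is to give every request a cost and let each tier use only part of a shared budget, so low-priority traffic is rejected first as utilization climbs. The sketch below assumes this scheme; the tier names, thresholds, cost units, and the CostAdmission class are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical tiers: fraction of the shared budget each tier may fill.
TIER_THRESHOLDS = {
    "paid_interactive": 1.0,
    "free_interactive": 0.8,
    "batch": 0.5,
}


@dataclass
class CostAdmission:
    budget: float        # total cost units allowed in flight at once
    in_use: float = 0.0  # cost units currently admitted

    def try_admit(self, tier: str, cost: float) -> bool:
        """Admit only if the request fits under its tier's ceiling."""
        ceiling = self.budget * TIER_THRESHOLDS[tier]
        if self.in_use + cost > ceiling:
            return False  # shed: this tier has no headroom left
        self.in_use += cost
        return True

    def release(self, cost: float) -> None:
        """Return cost units to the budget when the request finishes."""
        self.in_use = max(0.0, self.in_use - cost)


ctl = CostAdmission(budget=100)
decisions = [
    ctl.try_admit("batch", 40),             # admitted, 40 in use
    ctl.try_admit("batch", 20),             # shed, 60 exceeds batch ceiling 50
    ctl.try_admit("free_interactive", 30),  # admitted, 70 in use
    ctl.try_admit("free_interactive", 20),  # shed, 90 exceeds ceiling 80
    ctl.try_admit("paid_interactive", 30),  # admitted, 100 in use
]
print(decisions)
```

In this sketch cost could stand for anything proportional to the work a request triggers, such as estimated tokens or pixels; choosing that unit is the real design decision.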

Real-world example

From production

An image-processing service used an unbounded in-memory queue. A traffic spike grew the queue to 4 GB and the pod was OOM-killed. The fix was a bounded queue (capacity 1000) that returns 429 with Retry-After on overflow, with clients retrying under exponential backoff. After the change the system shed load gracefully, with no restarts.
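
The server side of that fix might look like the following sketch, which uses Python's standard queue module. The function name submit_job, the 202 status for accepted work, and the 5-second Retry-After value are assumptions; the source only gives the capacity of 1000 and the 429 response.

```python
import queue

JOB_QUEUE_CAPACITY = 1000  # capacity from the example above
RETRY_AFTER_SECONDS = 5    # assumed value; the source does not give one

jobs = queue.Queue(maxsize=JOB_QUEUE_CAPACITY)


def submit_job(job):
    """Try to enqueue without blocking; return (status_code, headers)."""
    try:
        jobs.put_nowait(job)
    except queue.Full:
        # Overflow policy: reject now and tell the client when to retry.
        return 429, {"Retry-After": str(RETRY_AFTER_SECONDS)}
    return 202, {}


# With no workers draining the queue, the 1001st submission is shed.
results = [submit_job(i)[0] for i in range(JOB_QUEUE_CAPACITY + 1)]
print(results.count(202), results.count(429))
```

Memory use is now bounded by the capacity times the job size, so a spike produces rejections instead of an OOM kill.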

Interview questions

1 senior-level question

Q1. How do you handle traffic above capacity?

Shed early, with priority. Return 429 with Retry-After so clients back off. For long-running work, use bounded queues with an explicit overflow policy. Track shed rate as a first-class metric: it tells you when to scale before users feel it.
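
On the client side, backing off on 429 can be sketched as a retry loop with capped exponential backoff and jitter that never retries sooner than Retry-After asks. The function call_with_backoff, its parameters, and the send callable are hypothetical, and the sketch assumes Retry-After carries a number of seconds rather than an HTTP date.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay_s=0.5,
                      max_delay_s=30.0, sleep=time.sleep):
    """Call send() until it stops returning 429 or attempts run out.

    send is a hypothetical callable returning (status_code, headers).
    """
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status
        if attempt == max_attempts - 1:
            break  # out of attempts; do not sleep again
        # Exponential backoff with full jitter, capped at max_delay_s.
        ceiling = min(max_delay_s, base_delay_s * 2 ** attempt)
        backoff = random.uniform(0, ceiling)
        # Never retry sooner than the server asked.
        retry_after = float(headers.get("Retry-After", 0))
        sleep(max(backoff, retry_after))
    return 429


# Usage with a fake transport: two 429s, then success. Sleeps are recorded
# instead of performed so the example runs instantly.
responses = iter([
    (429, {"Retry-After": "1"}),
    (429, {"Retry-After": "1"}),
    (200, {}),
])
slept = []
status = call_with_backoff(lambda: next(responses), sleep=slept.append)
print(status, len(slept))
```

The jitter spreads retries out so that clients rejected in the same instant do not all return in the same instant.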

Common mistakes

  • Unbounded queues anywhere.
  • Shedding randomly, which drops high-value requests as readily as low-value ones and angers users instead of degrading the experience deliberately.

Trade-offs

  • Strict admission control sacrifices throughput for tail latency.
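
A rough way to see this trade-off is that a bounded queue caps how long an admitted request can wait. The sketch below assumes a fixed number of workers and a constant service time, which real systems only approximate; the function name and all numbers are illustrative, not measurements.

```python
def worst_case_wait_s(queue_capacity, workers, service_time_s):
    """Approximate wait for a request admitted into the last free slot
    of a full queue, assuming every job takes service_time_s and the
    workers drain the queue evenly."""
    return queue_capacity * service_time_s / workers


# Illustrative numbers: 10 workers, 0.25 s per job.
loose = worst_case_wait_s(queue_capacity=1000, workers=10, service_time_s=0.25)
strict = worst_case_wait_s(queue_capacity=100, workers=10, service_time_s=0.25)
print(loose, strict)
```

Under these assumptions the smaller queue rejects more requests during a burst, and in exchange the longest possible queueing delay drops by the same factor of ten.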
