Backpressure & Load Shedding

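When demand exceeds capacity, a system can only queue the excess or shed it. A queue without limits ends in an OOM kill. Shedding without priorities gives bad UX, because the wrong requests get dropped. Well-designed systems decide the policy in advance rather than during the incident.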

Tags: backpressure, resilience, queueing

Deep dive

Concepts

Backpressure: signal upstream producers to slow down (TCP flow control, Reactive Streams demand signalling, HTTP 429).
Load shedding: actively drop low-priority work to protect the SLO of high-priority work.
Adaptive concurrency (Netflix concurrency-limits, Envoy's adaptive concurrency filter): dynamically size the in-flight window based on observed latency.
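
To make the adaptive-concurrency idea concrete, here is a minimal sketch of a limiter that resizes its in-flight window from observed latency. It uses a simple additive-increase / multiplicative-decrease (AIMD) rule chosen for illustration; it is not a reproduction of what Netflix concurrency-limits or Envoy do internally. The class name, the 100 ms latency target, and the 0.9 backoff ratio are all assumptions, and the sketch is not thread-safe.

```python
class AIMDLimiter:
    """Illustrative adaptive concurrency limit (AIMD rule, single-threaded)."""

    def __init__(self, initial_limit=10, min_limit=1, max_limit=200,
                 latency_target_s=0.1, backoff_ratio=0.9):
        # All defaults are assumed values for the sketch, not recommendations.
        self.limit = initial_limit
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.latency_target_s = latency_target_s
        self.backoff_ratio = backoff_ratio
        self.in_flight = 0

    def try_acquire(self):
        """Admit the request only if the in-flight window has room."""
        if self.in_flight >= self.limit:
            return False  # caller should reject, e.g. with HTTP 429
        self.in_flight += 1
        return True

    def release(self, latency_s):
        """Record a completed request and adjust the limit from its latency."""
        self.in_flight -= 1
        if latency_s > self.latency_target_s:
            # Latency above target: shrink the window multiplicatively.
            self.limit = max(self.min_limit, int(self.limit * self.backoff_ratio))
        else:
            # Latency healthy: probe for more capacity one slot at a time.
            self.limit = min(self.max_limit, self.limit + 1)
```

A request handler would call try_acquire() on entry, reject immediately when it returns False, and call release() with the measured latency when the request finishes.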

Patterns

Bounded queues, always.
Priority queues for tiered traffic (paid vs free, interactive vs batch).
Cost-based admission control for AI/expensive endpoints.
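
Cost-based admission control can be sketched as a budget of abstract "cost units" rather than a count of requests, so that one expensive call consumes more of the budget than many cheap ones. The class below is a hypothetical illustration: the name CostBudget, the unit of cost, and how a caller estimates the cost of a request are all assumptions, and the sketch is not thread-safe.

```python
class CostBudget:
    """Admit work while the summed cost of in-flight requests fits the budget."""

    def __init__(self, capacity_units):
        self.capacity_units = capacity_units  # total cost allowed in flight
        self.in_use = 0

    def try_admit(self, cost_units):
        """Reserve cost_units if they fit; otherwise reject without queueing."""
        if self.in_use + cost_units > self.capacity_units:
            return False  # caller sheds the request, e.g. with HTTP 429
        self.in_use += cost_units
        return True

    def release(self, cost_units):
        """Return the reserved units when the request completes."""
        self.in_use = max(0, self.in_use - cost_units)
```

For example, with a budget of 100 units, a request estimated at 60 units is admitted, a following 50-unit request is rejected, and a 40-unit request still fits.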

Real-world example

From production

An image-processing service used an unbounded in-memory queue. A traffic spike grew the queue to 4 GB and the pod was OOM-killed. The fix was a bounded queue (capacity 1000) that returns 429 with a Retry-After header on overflow, paired with clients that retry with exponential backoff. After the change the system shed load gracefully, with no restarts.
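
The server side of that fix might look roughly like the sketch below, which uses Python's standard queue.Queue with a maxsize and a non-blocking put. The function name submit, the 202 success status, and the 5-second Retry-After value are assumptions for illustration; the source only states the capacity of 1000 and the 429 with Retry-After on overflow.

```python
import queue

JOB_QUEUE = queue.Queue(maxsize=1000)  # bounded: memory use has a ceiling
RETRY_AFTER_SECONDS = 5                # assumed value, tune per workload


def submit(job, jobs=JOB_QUEUE):
    """Enqueue a job, or shed it with 429 + Retry-After when the queue is full.

    Returns a (status_code, headers) pair for the HTTP layer to send.
    """
    try:
        jobs.put_nowait(job)  # never blocks; raises queue.Full on overflow
    except queue.Full:
        return 429, {"Retry-After": str(RETRY_AFTER_SECONDS)}
    return 202, {}
```

The important design choice is that overflow is an explicit, cheap rejection rather than unbounded growth: a rejected request costs a few bytes of response, not a slot in memory.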

Interview questions

Q1. How do you handle traffic above capacity?

Shed early, with priority. Return 429 with Retry-After so clients back off. For long-running work, use bounded queues with an explicit overflow policy. Track shed rate as a first-class metric: it tells you when to scale before users feel it.
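
On the client side, "back off" could be implemented along the lines of the sketch below: exponential backoff that never retries sooner than the server's Retry-After. The function name, the 0.5 s base, the 30 s cap, and the use of full jitter (randomizing the wait so clients do not retry in lockstep) are assumptions added for illustration, not part of the answer above.

```python
import random


def backoff_delay(attempt, retry_after_s=None, base_s=0.5, cap_s=30.0,
                  rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    The exponential term doubles per attempt up to cap_s, then is scaled by a
    random factor in [0, 1) (full jitter). If the server sent Retry-After,
    the result is never shorter than that value.
    """
    exponential = min(cap_s, base_s * (2 ** attempt))
    jittered = exponential * rng()
    if retry_after_s is not None:
        return max(retry_after_s, jittered)
    return jittered
```

The rng parameter exists only so the jitter can be replaced with a fixed value in tests; in normal use the default random.random is enough.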

Common mistakes

Unbounded queues anywhere.
Shedding randomly, which produces angry users instead of a best-effort degradation of the UX.
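
As a contrast to random shedding, the sketch below shows one possible priority-aware overflow policy for a bounded queue: when the queue is full, a higher-priority arrival evicts the newest item from a lower-priority tier, and an arrival with nothing below it is shed itself. The class name TieredQueue, the two-tier default, and the choice to evict the newest low-priority item are assumptions; the sketch is not thread-safe.

```python
from collections import deque


class TieredQueue:
    """Bounded queue with per-tier FIFO order; tier 0 is the highest priority."""

    def __init__(self, capacity, tiers=2):
        self.capacity = capacity
        self.tiers = [deque() for _ in range(tiers)]

    def __len__(self):
        return sum(len(t) for t in self.tiers)

    def offer(self, item, tier):
        """Add item; return whatever was shed (possibly item itself) or None."""
        if len(self) < self.capacity:
            self.tiers[tier].append(item)
            return None
        # Full: look for a victim in a strictly lower-priority tier,
        # starting from the lowest one.
        for victim_tier in range(len(self.tiers) - 1, tier, -1):
            if self.tiers[victim_tier]:
                victim = self.tiers[victim_tier].pop()  # newest low-priority item
                self.tiers[tier].append(item)
                return victim
        return item  # nothing lower to evict: shed the incoming item

    def poll(self):
        """Remove and return the oldest item from the highest non-empty tier."""
        for t in self.tiers:
            if t:
                return t.popleft()
        return None
```

With a capacity of 2 and two free-tier items queued, a paid request displaces the most recent free one, while a third free request is rejected outright.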

Trade-offs

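Strict admission control sacrifices throughput for tail latency: rejecting work early can leave some capacity unused, but it keeps queueing delay, and therefore tail latency, bounded for the requests that are admitted.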
