Java & Spring Boot

JVM Tuning for Production

Pick the right GC, size the heap conservatively, and measure before tuning. Modern G1 and ZGC defaults are excellent; most 'tuning' is undoing bad flags.


Deep dive

Which GC

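  • G1 (default since JDK 9): works toward a configurable pause-time goal (-XX:MaxGCPauseMillis, 200 ms by default); good general-purpose choice.
  • ZGC / Shenandoah: concurrent collectors with very short pauses (ZGC targets sub-millisecond pauses since JDK 16), usually chosen for large heaps (16 GB+) or tight latency SLOs.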
  • Parallel: throughput-oriented batch jobs.
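A quick way to confirm which collector a running JVM actually picked is to look at its garbage-collector MXBeans. The sketch below maps HotSpot bean names to a collector family; the name list reflects recent HotSpot releases and is an assumption (other JVMs such as OpenJ9 report different names), and the class name is illustrative. It is worth checking in small containers, because the JVM can fall back to Serial GC when it sees fewer than two CPUs or under roughly 2 GB of memory.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: infer the active collector from HotSpot's GC MXBean names.
// The name-to-collector mapping is an assumption based on recent HotSpot builds.
final class GcDetector {

    static String classify(List<String> beanNames) {
        for (String name : beanNames) {
            if (name.startsWith("G1 ")) return "G1";                 // "G1 Young Generation", ...
            if (name.startsWith("ZGC")) return "ZGC";                // "ZGC Cycles", "ZGC Pauses", ...
            if (name.startsWith("Shenandoah")) return "Shenandoah";  // "Shenandoah Cycles", ...
            if (name.startsWith("PS ")) return "Parallel";           // "PS Scavenge", "PS MarkSweep"
            if (name.equals("Copy") || name.equals("MarkSweepCompact")) return "Serial";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        List<String> names = ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
        System.out.println("GC beans " + names + " -> " + classify(names));
    }
}
```

Running it once with defaults and once with, for example, java -XX:+UseZGC GcDetector.java shows the bean names change with the selected collector.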

Heap sizing

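Set the initial heap equal to the maximum in containers to avoid resize jitter. In Kubernetes, prefer percentage-based sizing: -XX:MaxRAMPercentage=75 derives the max heap from the cgroup memory limit and leaves the rest as headroom for metaspace, thread stacks, and direct buffers, while -XX:InitialRAMPercentage=75 pins the initial heap to the same size (the percentage equivalent of -Xms = -Xmx). Do not mix the two styles: an explicit -Xmx overrides MaxRAMPercentage. With no heap flags at all, a container-aware JVM (JDK 10+, 8u191+) typically caps the heap at only 25% of the limit.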
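The percentage-based sizing is simple arithmetic, sketched below under the assumption that the max heap is a plain percentage of the container limit (the real JVM also rounds to its heap alignment, so actual numbers differ slightly). The class and method names are hypothetical; the 6 GiB limit and the -Xmx8g case are there to show what negative headroom looks like.

```java
import java.util.Locale;

// Sketch: the arithmetic behind -XX:MaxRAMPercentage. Assumes max heap is a plain
// percentage of the container limit; the JVM additionally aligns the result.
final class HeapBudget {
    static final long GIB = 1024L * 1024 * 1024;

    /** Approximate max heap derived from a container limit and MaxRAMPercentage. */
    static long maxHeapBytes(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * maxRamPercentage / 100.0);
    }

    /** Memory left for metaspace, thread stacks, direct buffers, code cache, etc. */
    static long headroomBytes(long containerLimitBytes, long maxHeapBytes) {
        return containerLimitBytes - maxHeapBytes;
    }

    static String gib(long bytes) {
        return String.format(Locale.ROOT, "%.2f GiB", bytes / (double) GIB);
    }

    public static void main(String[] args) {
        long limit = 6 * GIB;
        long heap = maxHeapBytes(limit, 75);
        System.out.println("75% of 6 GiB: heap " + gib(heap)
                + ", headroom " + gib(headroomBytes(limit, heap)));
        // An explicit -Xmx8g ignores the limit entirely: headroom goes negative.
        System.out.println("-Xmx8g in 6 GiB: headroom " + gib(headroomBytes(limit, 8 * GIB)));
        // What this JVM actually got (can be slightly below the configured maximum).
        System.out.println("Runtime.maxMemory(): " + gib(Runtime.getRuntime().maxMemory()));
    }
}
```

The first two lines print a 4.50 GiB heap with 1.50 GiB of headroom, and -2.00 GiB of headroom for the -Xmx8g case.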

Useful flags

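  • -XX:+HeapDumpOnOutOfMemoryError with -XX:HeapDumpPath=/var/dumps: write a heap dump on the first OutOfMemoryError. Point the path at a mounted volume, because a restarted container gets a fresh filesystem.
  • -XX:+ExitOnOutOfMemoryError: exit right after the first OutOfMemoryError (the heap dump is written first), so Kubernetes restarts the pod cleanly instead of leaving a half-broken JVM running.
  • -XX:+UseStringDeduplication: on string-heavy services. Originally G1-only; JDK 18 extended it to Serial, Parallel, and ZGC.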
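Container entrypoints and wrapper scripts make it easy for these flags to never reach the JVM, so it can help to log them at startup. A minimal sketch, assuming a HotSpot-based JDK where com.sun.management.HotSpotDiagnosticMXBean is available; the class name is illustrative.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: read the effective value and origin of selected HotSpot flags at runtime.
final class VmFlagReport {
    static final List<String> FLAGS = List.of(
            "HeapDumpOnOutOfMemoryError",
            "HeapDumpPath",
            "ExitOnOutOfMemoryError",
            "UseStringDeduplication");

    static Map<String, String> read(List<String> names) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        Map<String, String> values = new LinkedHashMap<>();
        for (String name : names) {
            VMOption option = hotspot.getVMOption(name); // throws if the flag does not exist
            // Origin is DEFAULT, VM_CREATION (command line), ENVIRON_VAR, ERGONOMIC, ...
            values.put(name, option.getValue() + " (" + option.getOrigin() + ")");
        }
        return values;
    }

    public static void main(String[] args) {
        read(FLAGS).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

An origin of DEFAULT for a flag you believe you set is a strong hint that it was dropped somewhere on the way to the java command.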

Observability

JFR (Java Flight Recorder) is built into OpenJDK 11+ (and 8u262+), free to use, and low-overhead. Enable continuous recording with -XX:StartFlightRecording=.... Pair it with async-profiler for CPU and lock profiles.
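A continuous recording can also be started from code through the jdk.jfr API (JDK 11+), which helps when the process's command line is hard to change. A sketch, assuming the built-in "default" settings; the six-hour and 250 MB bounds and the class name are illustrative choices, not recommendations.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.text.ParseException;
import java.time.Duration;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

// Sketch: start a bounded, continuous JFR recording from code and dump a snapshot on demand.
final class ContinuousJfr {

    static Recording startContinuous(Duration maxAge, long maxSizeBytes)
            throws IOException, ParseException {
        // "default" is the low-overhead settings file shipped with the JDK;
        // "profile" records more detail at higher overhead.
        Recording recording = new Recording(Configuration.getConfiguration("default"));
        recording.setName("continuous");
        recording.setToDisk(true);          // keep data in the on-disk repository...
        recording.setMaxAge(maxAge);        // ...trimmed to this time window
        recording.setMaxSize(maxSizeBytes); // ...and this size budget
        recording.start();
        return recording;
    }

    public static void main(String[] args) throws Exception {
        try (Recording recording = startContinuous(Duration.ofHours(6), 250L * 1024 * 1024)) {
            // ... application work would run here ...
            Path out = Files.createTempDirectory("jfr").resolve("snapshot.jfr");
            recording.dump(out); // copies the current data; the recording keeps running
            System.out.println("Wrote " + Files.size(out) + " bytes to " + out);
        }
    }
}
```

The resulting .jfr file opens in JDK Mission Control or with the jfr command-line tool.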

Real-world example

From production

An order service was OOM-killed nightly. The team had set -Xmx8g in a 6 GB container, so the JVM happily grew past the cgroup limit and the kernel SIGKILLed it. They switched to -XX:MaxRAMPercentage=70 (about 4.2 GB of heap, leaving roughly 1.8 GB for off-heap memory), removed the explicit Xmx, and the kills stopped. No GC tuning needed.

Interview questions

Q1. How do you size the heap in a container?

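Use -XX:MaxRAMPercentage (70–75%) so the heap scales with the cgroup limit, and set -XX:InitialRAMPercentage to the same value (the percentage equivalent of Xms = Xmx) to avoid runtime resizing. Don't also pass -Xmx, which overrides the percentage. The remaining 25–30% is for metaspace, thread stacks, and direct buffers, which all live off-heap.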

Q2. G1 vs ZGC?

G1 for heaps up to ~16 GB and pause goals around 100–200 ms. ZGC when you need sub-10 ms pauses or heaps in the hundreds of GB. ZGC has a slightly higher throughput cost, so measure your own workload rather than assuming either way.
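Measuring can start as simply as comparing accumulated GC time per collector bean before and after a representative load, once under -XX:+UseG1GC and once under -XX:+UseZGC. A rough sketch: the churn loop is a hypothetical stand-in for real traffic, and because getCollectionTime() means different things for different beans (recent ZGC versions report cycles and pauses as separate beans, for example), it prints per-bean numbers rather than one total. It says nothing about tail latency, which still needs a load test or JFR.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: per-bean GC time consumed by a workload, for comparing collectors on the same code.
final class GcCostProbe {

    /** Snapshot of accumulated collection time (ms) per GC MXBean. */
    static Map<String, Long> snapshot() {
        Map<String, Long> times = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            times.put(gc.getName(), Math.max(0L, gc.getCollectionTime())); // -1 means unsupported
        }
        return times;
    }

    /** Hypothetical workload: allocates short-lived 16 KiB arrays to generate GC activity. */
    static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] chunk = new byte[16 * 1024];
            chunk[i % chunk.length] = (byte) i;
            checksum += chunk[i % chunk.length];
        }
        return checksum;
    }

    public static void main(String[] args) {
        Map<String, Long> before = snapshot();
        long checksum = churn(200_000); // roughly 3 GB allocated in total
        Map<String, Long> after = snapshot();
        after.forEach((name, ms) ->
                System.out.printf("%-22s +%d ms%n", name, ms - before.getOrDefault(name, 0L)));
        System.out.println("checksum " + checksum); // keeps the loop's result observable
    }
}
```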

Common mistakes

  • Copy-pasting GC flags from a blog post without measuring.

  • Ignoring metaspace: frameworks like Spring + Hibernate can use 300+ MB of it, none of which counts against the heap.

  • Setting -Xmx at or above the container limit, forgetting that the JVM also needs off-heap memory.
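The last mistake can be caught at startup. A minimal sketch, assuming cgroup v2 (the limit is read from /sys/fs/cgroup/memory.max; cgroup v1 uses a different file) and a hypothetical policy of keeping 25% of the limit for off-heap memory, in line with the 70–75% guidance. Non-heap usage from MemoryMXBean covers metaspace and the code cache but not thread stacks or direct buffers, so treat it as a lower bound.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalLong;

// Sketch: warn when the max heap leaves too little of the container limit for off-heap memory.
// Assumes cgroup v2; the 25% headroom fraction is a hypothetical policy.
final class ContainerMemoryGuard {

    /** Parses memory.max: a byte count, or "max" when no limit is set. */
    static OptionalLong parseLimit(String raw) {
        String value = raw.trim();
        return value.equals("max") ? OptionalLong.empty() : OptionalLong.of(Long.parseLong(value));
    }

    /** True if the heap leaves at least the given fraction of the limit for off-heap memory. */
    static boolean leavesHeadroom(long maxHeapBytes, long limitBytes, double minHeadroomFraction) {
        return maxHeapBytes <= limitBytes * (1.0 - minHeadroomFraction);
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of("/sys/fs/cgroup/memory.max");
        if (!Files.exists(file)) {
            System.out.println("No cgroup v2 memory.max found; skipping check");
            return;
        }
        OptionalLong limit = parseLimit(Files.readString(file));
        long maxHeap = Runtime.getRuntime().maxMemory();
        long nonHeapUsed = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage().getUsed();
        System.out.printf("maxHeap=%d nonHeapUsed=%d (metaspace, code cache, ...)%n",
                maxHeap, nonHeapUsed);
        if (limit.isPresent() && !leavesHeadroom(maxHeap, limit.getAsLong(), 0.25)) {
            System.err.println("Max heap leaves less than 25% of the container limit for off-heap memory");
        }
    }
}
```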

Trade-offs

  • Lower pause times usually cost throughput; pick based on your latency SLO.

  • Larger heaps tend to mean longer GC pauses with G1; ZGC keeps pauses short regardless of heap size but uses more CPU for its concurrent work.
