Learnitweb

How to Tune the JVM for a Low-Latency Trading-like System

In low-latency systems (like high-frequency trading platforms), response time and predictability are critical. The Java Virtual Machine (JVM), being managed and garbage-collected, requires careful tuning to minimize latency, avoid pauses, and meet real-time constraints.

This guide focuses on tuning the JVM for:

  • Low latency
  • High throughput
  • Consistent performance under pressure

1. Understand the Requirements of Low-Latency Systems

Before tuning, understand what “low-latency” implies:

Feature       | Expectation
Response Time | In microseconds or low milliseconds
Jitter        | Extremely low (i.e., consistent response)
Throughput    | Secondary to latency (but still important)
GC Pauses     | Not acceptable in latency-sensitive threads
Determinism   | Predictable latency under load

2. Choose the Right JVM and Version

Use the latest LTS JDK (e.g., JDK 17 or JDK 21) for better GC, JIT, and performance improvements.

Some alternative JVMs for ultra-low latency:

  • Azul Zing (now Azul Prime) – uses C4 (Continuously Concurrent Compacting Collector) for pause-less GC
  • OpenJDK ZGC or Shenandoah – concurrent low-pause collectors, but may not suit ultra-low latency (<1 ms) needs

For extreme low-latency (<1 ms tail latency), prefer:

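  • A standard JDK with a carefully tuned G1GC, or Epsilon GC (a no-op collector that never reclaims memory) combined with custom memory management such as off-heap buffers and object reuse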

3. Select the Appropriate Garbage Collector (GC)

G1GC (Garbage First GC)

  • Good for balancing throughput and pause time
  • Supports a soft pause-time goal via -XX:MaxGCPauseMillis=<ms> (G1 tries to meet the goal but does not guarantee it)

ZGC

  • Scales well for large heaps; designed for pause times below 10 ms, and typically below 1 ms since JDK 16
  • Usually a better fit than G1GC for latency-sensitive apps with large heaps

Shenandoah

  • Similar goals to ZGC (concurrent compaction with short pauses); originally developed by Red Hat
  • Works better with medium heap sizes

Epsilon GC

  • No garbage collection at all
  • Useful in scenarios where memory is managed manually or app runs only for a short duration

Recommended GC Flags (G1GC) for Low Latency:

-XX:+UseG1GC
-XX:MaxGCPauseMillis=10
-XX:+ParallelRefProcEnabled
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=20
-XX:G1MaxNewSizePercent=40
-XX:G1HeapRegionSize=4m
-XX:G1ReservePercent=15
-XX:InitiatingHeapOccupancyPercent=30

4. JVM Tuning Parameters for Low-Latency

Heap Sizing

  • Keep heap size as small as possible to reduce GC time.
  • Fixed-size heaps are better for predictability:
-Xms2g -Xmx2g

Thread Stack Size

Smaller stack reduces memory footprint but can limit recursion:

-Xss256k

Direct Memory

For IO-intensive low-latency systems (like FIX engines), use off-heap memory (Netty, Agrona, Chronicle libraries):

-XX:MaxDirectMemorySize=512m
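
As a rough illustration of off-heap storage using only the JDK, the sketch below keeps fixed-size records in a direct ByteBuffer, so the record data never becomes garbage for the collector to trace. The class name and the 20-byte record layout (order id, price in ticks, quantity) are assumptions made up for this example; libraries such as Agrona or Chronicle provide far more complete abstractions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical off-heap store for fixed-size order records (illustration only). */
final class OffHeapOrderBuffer {
    // Assumed record layout: 8-byte order id, 8-byte price in ticks, 4-byte quantity.
    static final int RECORD_SIZE = 20;

    private final ByteBuffer buffer;

    OffHeapOrderBuffer(int capacity) {
        // allocateDirect reserves native memory, counted against -XX:MaxDirectMemorySize.
        this.buffer = ByteBuffer.allocateDirect(capacity * RECORD_SIZE)
                                .order(ByteOrder.LITTLE_ENDIAN);
    }

    void write(int index, long orderId, long priceTicks, int quantity) {
        int base = index * RECORD_SIZE;
        // Absolute puts do not move the buffer position and create no objects.
        buffer.putLong(base, orderId);
        buffer.putLong(base + 8, priceTicks);
        buffer.putInt(base + 16, quantity);
    }

    long orderId(int index)    { return buffer.getLong(index * RECORD_SIZE); }
    long priceTicks(int index) { return buffer.getLong(index * RECORD_SIZE + 8); }
    int quantity(int index)    { return buffer.getInt(index * RECORD_SIZE + 16); }

    public static void main(String[] args) {
        OffHeapOrderBuffer orders = new OffHeapOrderBuffer(1024);
        orders.write(0, 42L, 1_000_500L, 100);
        System.out.println(orders.orderId(0) + " " + orders.priceTicks(0) + " " + orders.quantity(0));
    }
}
```

The buffer is allocated once at startup and reused, so the hot path performs only primitive reads and writes.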

Avoid Large Object Allocations

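  • Keep short-lived objects local to a method so that the JIT's escape analysis can eliminate the heap allocation (Java has no explicit stack allocation)
  • Avoid allocating arrays/objects of half the G1 region size or more; G1 treats them as humongous objects and handles them separately from normal young-generation allocation (the region size is derived from the heap size, starting at 1 MB, or set with -XX:G1HeapRegionSize)
  • Use object pools or value types (project Valhalla in the future)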
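
The object-pool idea can be sketched as follows. SimplePool is a hypothetical helper written for this example, not a library class; it assumes a single owning thread and that callers reset an object's state before reuse.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

/** Minimal single-threaded object pool (hypothetical helper, not thread-safe). */
final class SimplePool<T> {
    private final ArrayDeque<T> free;
    private final Supplier<T> factory;

    SimplePool(int size, Supplier<T> factory) {
        this.free = new ArrayDeque<>(size);
        this.factory = factory;
        // Pre-allocate everything at startup so the hot path does not allocate.
        for (int i = 0; i < size; i++) {
            free.push(factory.get());
        }
    }

    /** Returns a pooled instance, or allocates a new one if the pool is exhausted. */
    T acquire() {
        T obj = free.poll();
        return obj != null ? obj : factory.get();
    }

    /** Hands an instance back; the caller is responsible for resetting its state. */
    void release(T obj) {
        free.push(obj);
    }

    public static void main(String[] args) {
        SimplePool<StringBuilder> pool = new SimplePool<>(8, () -> new StringBuilder(256));
        StringBuilder sb = pool.acquire();
        sb.setLength(0);            // reset before reuse
        sb.append("order accepted");
        System.out.println(sb);
        pool.release(sb);
    }
}
```

Falling back to allocation when the pool runs dry keeps the sketch simple; a stricter design might fail fast instead, so that an undersized pool shows up in testing.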

5. JIT Compiler Tuning

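C2, the optimizing compiler, is enabled by default as the top tier of tiered compilation.

Key JIT tuning flag:

-XX:+TieredCompilation

Tiered compilation is already on by default, so the flag only makes the choice explicit. Do not add -XX:+AggressiveOpts on current JDKs: it was deprecated in JDK 11 and later removed, so JDK 17 and JDK 21 reject it as an unrecognized option. -XX:+UseStringDeduplication is a GC feature rather than a JIT flag; it reduces heap usage by letting duplicate strings share one backing array.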

Disable Tiered Compilation (optional for predictability; only C2 remains, so methods run in the interpreter for longer before they are compiled):

-XX:-TieredCompilation

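Restrict compilation to specific methods (diagnostic use only):

-XX:CompileCommand=compileonly,com.your.Class::hotMethod

Note that compileonly does not prioritize the method. It tells the JIT to compile only the matching methods and leaves everything else interpreted, which usually hurts overall latency. To reduce JIT warmup unpredictability in production, exercise the hot paths with representative traffic before the system goes live.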
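
One common way to reduce warmup unpredictability is to run the hot path many times before accepting real traffic, so the JIT has already compiled it. The sketch below assumes a hypothetical hot method, priceToTicks, and an iteration count of 20,000, which is a guess that should be validated against compilation logs (for example with -XX:+PrintCompilation).

```java
/** Illustrative warmup routine; the method and iteration count are assumptions. */
final class Warmup {

    /** Hypothetical hot-path method: converts a price to an integer tick count. */
    static long priceToTicks(double price, double tickSize) {
        return Math.round(price / tickSize);
    }

    /** Calls the hot method repeatedly with varied inputs and returns a checksum. */
    static long warmUp(int iterations) {
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += priceToTicks(100.0 + (i % 100) * 0.01, 0.01);
        }
        // Returning the sum keeps the JIT from discarding the loop as dead code.
        return sink;
    }

    public static void main(String[] args) {
        long checksum = warmUp(20_000);
        System.out.println("warmup done, checksum=" + checksum);
        // ... start accepting real traffic only after this point
    }
}
```

Warmup inputs should resemble production data; code compiled against unrepresentative inputs can be deoptimized later, which brings the latency spike back.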

6. Profile and Pin Critical Threads

Thread Affinity

  • Pin critical threads (e.g., event loop, trading logic) to CPU cores
  • Use tools like taskset (Linux) or Java Thread Affinity libraries (like Jaffinity or OpenHFT Affinity)

Example:

taskset -c 2,3 java ...

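Note that taskset restricts the whole JVM process, including GC and JIT compiler threads, to the listed cores. To pin an individual thread, use an affinity library from inside the application. Keeping a critical thread on one core prevents migration between cores and reduces CPU cache misses.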

7. Memory Access Patterns and False Sharing

Minimize false sharing:

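  • Pad fields that are written concurrently by different threads, either with manual padding (as the Disruptor does) or with the JDK's @Contended annotation
// JDK 9+: compile with --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED
// (on JDK 8 the annotation is sun.misc.Contended)
@jdk.internal.vm.annotation.Contended
public class SharedData {
    volatile long value;
}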

Enable @Contended:

-XX:-RestrictContended
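
Where relying on an internal annotation is undesirable, manual padding is an alternative. The sketch below uses the inheritance trick: it assumes that HotSpot lays out superclass fields before subclass fields, so the hot field ends up surrounded by unused longs. All class names are invented for this example, and the actual layout should be verified with a tool such as JOL.

```java
/** Padding placed before the hot field (assumes superclass fields are laid out first). */
class PadLeft { long p1, p2, p3, p4, p5, p6, p7; }

/** Holds the hot field between the two padding blocks. */
class PaddedValue extends PadLeft { volatile long value; }

/** Counter intended for a single writer thread; padding after the hot field. */
final class PaddedCounter extends PaddedValue {
    long p9, p10, p11, p12, p13, p14, p15;

    void increment() { value++; }   // not atomic: safe only with one writer per counter

    long get() { return value; }

    /** Runs two writer threads, each on its own counter, and returns both totals. */
    static long[] runDemo(int iterations) {
        PaddedCounter a = new PaddedCounter();
        PaddedCounter b = new PaddedCounter();
        Thread t1 = new Thread(() -> { for (int i = 0; i < iterations; i++) a.increment(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < iterations; i++) b.increment(); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new long[] { a.get(), b.get() };
    }

    public static void main(String[] args) {
        long[] totals = runDemo(1_000_000);
        System.out.println(totals[0] + " " + totals[1]);
    }
}
```

Padding changes only performance, not results, so its benefit has to be confirmed by benchmarking the padded and unpadded variants on the target hardware.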

Use cache-aligned structures

  • Use Agrona, Disruptor, or Chronicle libraries for lock-free, cache-friendly data structures.

8. Avoid Common Latency Pitfalls

Avoid Full GCs

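  • Set -XX:+DisableExplicitGC to ignore calls to System.gc(). Be careful if the application allocates NIO direct buffers frequently: the JDK calls System.gc() as a last resort to reclaim direct memory, so disabling it can lead to OutOfMemoryError under direct-memory pressure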

Avoid Finalizers

  • Finalizers introduce unpredictable latency
  • Use Cleaner or manual resource management
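
A minimal sketch of the Cleaner approach (java.lang.ref.Cleaner, available since JDK 9) combined with explicit close(). The NativeResource class and its AtomicBoolean flag are stand-ins invented for this example; a real resource would free native memory or close a handle in the cleanup action.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical resource that is released deterministically, with Cleaner as a safety net. */
final class NativeResource implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    /**
     * Cleanup action. It must not hold a reference to the NativeResource itself,
     * otherwise the resource could never become unreachable.
     */
    private static final class State implements Runnable {
        final AtomicBoolean released = new AtomicBoolean(false);

        @Override
        public void run() {
            released.set(true);   // stand-in for freeing memory or closing a handle
        }
    }

    private final State state = new State();
    private final Cleaner.Cleanable cleanable = CLEANER.register(this, state);

    /** Preferred path: release explicitly. The action runs at most once. */
    @Override
    public void close() {
        cleanable.clean();
    }

    boolean isReleased() {
        return state.released.get();
    }

    public static void main(String[] args) {
        try (NativeResource resource = new NativeResource()) {
            System.out.println("released inside block: " + resource.isReleased());
        }
    }
}
```

On latency-sensitive paths the explicit close() is what matters; the Cleaner only covers resources that were leaked by mistake, and it runs on its own thread at an unpredictable time.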

Avoid Classloading at Runtime

  • Preload all classes at startup
  • Avoid loading classes inside critical paths
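
Preloading can be as simple as forcing the classes used on the critical path to load and initialize during startup. In this sketch the ClassPreloader helper and the list of class names are assumptions; a real list would come from profiling or from -Xlog:class+load output.

```java
/** Hypothetical startup helper that loads and initializes classes ahead of time. */
final class ClassPreloader {

    /** Loads each named class and returns how many were loaded successfully. */
    static int preload(String... classNames) {
        int loaded = 0;
        for (String name : classNames) {
            try {
                // 'true' also runs static initializers, so they do not fire on the hot path.
                Class.forName(name, true, ClassPreloader.class.getClassLoader());
                loaded++;
            } catch (ClassNotFoundException e) {
                System.err.println("Could not preload " + name);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        int count = preload(
                "java.util.concurrent.ConcurrentHashMap",
                "java.util.concurrent.ArrayBlockingQueue");
        System.out.println("preloaded " + count + " classes");
    }
}
```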

9. Low-Latency Logging

Avoid synchronous logging (e.g., Log4j2 with its default synchronous loggers) in critical paths.

Use:

  • Async logging (e.g., Log4j2's AsyncAppender or its Disruptor-based async loggers)
  • Offload logging to a dedicated thread
  • Disable logging in hot paths or use ring-buffer logs
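
The idea of offloading logging to a dedicated thread can be sketched with a bounded queue. AsyncLog is a hypothetical class written for this example: the hot path calls a non-blocking offer and drops the message when the queue is full, while a daemon thread drains the queue into a sink. ArrayBlockingQueue still takes a lock internally, so a production system would more likely use a Disruptor-style ring buffer.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical asynchronous logger: producers never block, a writer thread drains. */
final class AsyncLog implements AutoCloseable {
    private final BlockingQueue<String> queue;
    private final Consumer<String> sink;
    private final Thread writer;
    private volatile boolean running = true;

    AsyncLog(int capacity, Consumer<String> sink) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.sink = sink;
        this.writer = new Thread(this::drainLoop, "async-log-writer");
        this.writer.setDaemon(true);
        this.writer.start();
    }

    /** Hot-path call: returns false (message dropped) instead of blocking when full. */
    boolean log(String message) {
        return queue.offer(message);
    }

    private void drainLoop() {
        try {
            // Keep draining after shutdown is requested until the queue is empty.
            while (running || !queue.isEmpty()) {
                String message = queue.poll(10, TimeUnit.MILLISECONDS);
                if (message != null) {
                    sink.accept(message);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops the writer after it has flushed the remaining messages. */
    @Override
    public void close() {
        running = false;
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        try (AsyncLog log = new AsyncLog(1024, System.out::println)) {
            log.log("order 42 accepted");
        }
    }
}
```

Dropping on overflow trades completeness for latency; whether that is acceptable depends on the audit requirements of the system.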

10. JVM Observability and Tools

Measure tail latency:

  • Use percentiles: P95, P99, P99.99
  • Tools: HdrHistogram, Chronicle Metrics, Java Flight Recorder
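
To make the percentile idea concrete, here is a small nearest-rank calculation over recorded latencies using only the JDK. It is an illustration, not a replacement for HdrHistogram: it sorts a copy of all samples, which is fine offline but too expensive on a hot path. The class name and the measured operation are placeholders.

```java
import java.util.Arrays;

/** Nearest-rank percentiles over raw latency samples (illustration only). */
final class LatencyPercentiles {

    /** Returns the nearest-rank percentile for 0 < p <= 100. */
    static long percentile(long[] samplesNanos, double p) {
        if (samplesNanos.length == 0 || p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samplesNanos.clone();
        Arrays.sort(sorted);
        // Nearest rank: the smallest sample with at least p% of all samples at or below it.
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] samples = new long[100_000];
        double sink = 0;
        for (int i = 0; i < samples.length; i++) {
            long start = System.nanoTime();
            sink += Math.sqrt(i);               // placeholder for the operation under test
            samples[i] = System.nanoTime() - start;
        }
        System.out.printf("p50=%dns p99=%dns p99.99=%dns max=%dns (sink=%.1f)%n",
                percentile(samples, 50), percentile(samples, 99),
                percentile(samples, 99.99), percentile(samples, 100), sink);
    }
}
```

Averages hide exactly the outliers that matter in a trading system, which is why the high percentiles and the maximum are reported side by side.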

GC Analysis

  • Enable GC logs:
-Xlog:gc*:file=gc.log:time,uptime,level,tags

Use tools like:

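  • jstat (not usable when -XX:+PerfDisableSharedMem is set, because jstat reads the shared perf data file that the flag disables)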
  • GCViewer
  • GCEasy.io
  • JFR (Java Flight Recorder)
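
GC activity can also be read from inside the process through the standard java.lang.management API, for example to export it to a metrics system. The GcStats class below is a hypothetical helper. Note that getCollectionTime() reports accumulated elapsed collection time, which for concurrent collectors is not the same as stop-the-world pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that sums GC counters across all collectors of the running JVM. */
final class GcStats {

    /** Total number of collections so far; collectors reporting -1 (undefined) are skipped. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total count=" + totalGcCount() + " total timeMs=" + totalGcTimeMillis());
    }
}
```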

11. Example Full JVM Options for Low-Latency (G1GC)

-server
-Xms2g
-Xmx2g
-Xss256k
-XX:+UseG1GC
-XX:MaxGCPauseMillis=10
-XX:+ParallelRefProcEnabled
-XX:+UnlockExperimentalVMOptions
-XX:+DisableExplicitGC
-XX:+AlwaysPreTouch
-XX:+UseStringDeduplication
-XX:+PerfDisableSharedMem
-XX:-RestrictContended
-XX:InitiatingHeapOccupancyPercent=30
-XX:G1NewSizePercent=20
-XX:G1MaxNewSizePercent=40
-XX:G1ReservePercent=15
-XX:G1HeapRegionSize=4m
-XX:MaxDirectMemorySize=512m
-Xlog:gc*:file=gc.log:time,uptime,level,tags
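
Because a mistyped or missing option can silently change latency behavior, it can help to verify the effective settings at startup. The sketch below is one possible check, with StartupCheck and its two rules (G1 enabled, initial heap equal to maximum heap) chosen as assumptions for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.List;

/** Hypothetical startup sanity check for the JVM options listed above. */
final class StartupCheck {

    /** True if the exact flag string was passed to the JVM. */
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    /** True if initial and maximum heap are both defined and equal (as with -Xms2g -Xmx2g). */
    static boolean heapIsFixed(long initBytes, long maxBytes) {
        return initBytes > 0 && initBytes == maxBytes;
    }

    public static void main(String[] args) {
        // Options as passed on the command line, e.g. "-XX:+UseG1GC".
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();

        if (!hasFlag(jvmArgs, "-XX:+UseG1GC")) {
            System.err.println("WARN: -XX:+UseG1GC was not passed explicitly");
        }
        if (!heapIsFixed(heap.getInit(), heap.getMax())) {
            System.err.println("WARN: heap is not fixed-size: init=" + heap.getInit()
                    + " max=" + heap.getMax());
        }
    }
}
```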