How to Tune the JVM for a Low-Latency Trading-like System

In low-latency systems (like high-frequency trading platforms), response time and predictability are critical. The Java Virtual Machine (JVM), being managed and garbage-collected, requires careful tuning to minimize latency, avoid pauses, and meet real-time constraints.

This guide focuses on tuning the JVM for:

Low latency
High throughput
Consistent performance under pressure

1. Understand the Requirements of Low-Latency Systems

Before tuning, understand what “low-latency” implies:

Feature	Expectation
Response Time	In microseconds or low milliseconds
Jitter	Extremely low (i.e., consistent response)
Throughput	Secondary to latency (but still important)
GC Pauses	Not acceptable in latency-sensitive threads
Determinism	Predictable latency under load

2. Choose the Right JVM and Version

Use the latest LTS JDK (e.g., JDK 17 or JDK 21) for better GC, JIT, and performance improvements.

Some alternative JVMs for ultra-low latency:

Azul Zing (now Azul Platform Prime) – uses C4 (Continuously Concurrent Compacting Collector) for pauseless GC
OpenJDK ZGC or Shenandoah – concurrent low-pause collectors; ZGC pauses are typically below 1 ms since JDK 16, and Shenandoah pauses are usually a few milliseconds or less

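For extreme low latency (<1 ms tail latency), prefer:

Epsilon GC (no GC) with allocation-free hot paths and custom memory management, or a fully concurrent collector such as ZGC or C4. G1GC is a solid general-purpose choice, but its stop-the-world pauses usually last several milliseconds, so it rarely meets a sub-millisecond target.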

3. Select the Appropriate Garbage Collector (GC)

G1GC (Garbage First GC)

Good for balancing throughput and pause time
Supports a soft pause-time goal with -XX:MaxGCPauseMillis= (a target the collector aims for, not a guarantee)

ZGC

Scales well to very large heaps; pauses are typically below 1 ms since JDK 16 (earlier releases targeted < 10 ms)
Better than G1GC for latency-sensitive apps with large memory

Shenandoah

Similar in goals to ZGC (both are open source); originally developed by Red Hat
Works better with medium heap sizes

Epsilon GC

No garbage collection at all
Useful in scenarios where memory is managed manually or app runs only for a short duration

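Recommended GC Flags (G1GC) for Low Latency (G1NewSizePercent and G1MaxNewSizePercent are experimental, so -XX:+UnlockExperimentalVMOptions must appear before them):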

-XX:+UseG1GC
-XX:MaxGCPauseMillis=10
-XX:+ParallelRefProcEnabled
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=20
-XX:G1MaxNewSizePercent=40
-XX:G1HeapRegionSize=4m
-XX:G1ReservePercent=15
-XX:InitiatingHeapOccupancyPercent=30

4. JVM Tuning Parameters for Low-Latency

Heap Sizing

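Keep the heap no larger than the live data set and allocation rate require; collection time grows mainly with the amount of live data.
Fixed-size heaps (-Xms equal to -Xmx) avoid heap resizing at runtime and are better for predictability: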

-Xms2g -Xmx2g
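
To confirm at startup that the heap really is fixed, the initial and maximum sizes can be compared through the standard MemoryMXBean. This is a minimal sketch; the class name HeapSizingCheck and the helper isFixedSize are illustrative, and the reported sizes may be rounded by the collector (for example to a multiple of the G1 region size).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class HeapSizingCheck {
    // True when the initial and maximum heap sizes match, i.e. -Xms equals -Xmx.
    // MemoryUsage reports -1 when a value is undefined, which is treated as "not fixed".
    static boolean isFixedSize(long initBytes, long maxBytes) {
        return initBytes > 0 && initBytes == maxBytes;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long init = heap.getInit();
        long max = heap.getMax();
        System.out.printf("heap init=%d MiB, max=%d MiB, fixed=%b%n",
                init >> 20, max >> 20, isFixedSize(init, max));
    }
}
```

Running this with -Xms2g -Xmx2g should report fixed=true; a mismatch usually means a launcher script is overriding one of the two flags.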

Thread Stack Size

Smaller stack reduces memory footprint but can limit recursion:

-Xss256k

Direct Memory

For IO-intensive low-latency systems (like FIX engines), use off-heap memory (Netty, Agrona, Chronicle libraries):

-XX:MaxDirectMemorySize=512m
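
As an illustration of the off-heap approach, the sketch below allocates one direct ByteBuffer at startup and reuses it for every message, so encoding creates no garbage. The class DirectBufferEncoder and the 20-byte message layout (order id, price, quantity) are hypothetical; libraries such as Agrona or Netty provide richer buffer abstractions.

```java
import java.nio.ByteBuffer;

class DirectBufferEncoder {
    // Allocated once; lives outside the Java heap and counts against -XX:MaxDirectMemorySize.
    private final ByteBuffer buffer = ByteBuffer.allocateDirect(4096);

    // Writes a hypothetical fixed layout: 8-byte order id, 8-byte price, 4-byte quantity.
    // The same buffer is returned on every call, so callers must consume it before encoding again.
    ByteBuffer encode(long orderId, long priceTicks, int quantity) {
        buffer.clear();
        buffer.putLong(orderId).putLong(priceTicks).putInt(quantity);
        buffer.flip();
        return buffer;
    }

    public static void main(String[] args) {
        DirectBufferEncoder encoder = new DirectBufferEncoder();
        ByteBuffer message = encoder.encode(42L, 1_250_000L, 100);
        System.out.println("direct=" + message.isDirect() + ", bytes=" + message.remaining());
    }
}
```

The single shared buffer assumes one writer thread; with several writers, each thread would need its own buffer.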

Avoid Large Object Allocations

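Java has no explicit stack allocation; keep hot-path objects small and short-lived so that escape analysis can eliminate the allocation (scalar replacement)
With G1, avoid arrays/objects of half the region size or larger, since they are allocated as humongous objects (the region size is derived from the heap size, with a 1 MB minimum, unless set with -XX:G1HeapRegionSize)
Use object pools, or value types once they become available (Project Valhalla)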
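
A simple object pool can be sketched as follows. The names OrderPool and Order are illustrative, and the pool is deliberately single-threaded (one owner thread acquires and releases), which is an assumption that matches a typical single-writer event loop but not every design.

```java
import java.util.ArrayDeque;

class OrderPool {
    // Hypothetical mutable message type that is reset and reused instead of reallocated.
    static final class Order {
        long id;
        long price;
        int quantity;

        void reset() { id = 0; price = 0; quantity = 0; }
    }

    private final ArrayDeque<Order> free;

    // Pre-allocates all instances up front so the hot path does not allocate.
    OrderPool(int capacity) {
        free = new ArrayDeque<>(capacity);
        for (int i = 0; i < capacity; i++) {
            free.push(new Order());
        }
    }

    // Returns a pooled instance, or allocates as a fallback when the pool is exhausted.
    Order acquire() {
        Order order = free.poll();
        return order != null ? order : new Order();
    }

    // Clears the instance and makes it available again.
    void release(Order order) {
        order.reset();
        free.push(order);
    }

    int available() { return free.size(); }
}
```

Pooling trades GC pressure for manual lifecycle management: an object used after release is a bug the JVM will not catch, so pools are best kept to a few well-understood message types.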

5. JIT Compiler Tuning

Use C2 (optimizing compiler), enabled by default.

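Key JIT-related flags:

-XX:+TieredCompilation

Do not use -XX:+AggressiveOpts on current JDKs: it was deprecated in JDK 11 and removed in later releases, so JDK 17 and 21 reject it as an unrecognized option. -XX:+UseStringDeduplication is a GC feature (it reduces heap footprint under G1), not a JIT option.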

Disable Tiered Compilation (optional; only C2 compiles, which removes tier transitions but lengthens warm-up):

-XX:-TieredCompilation

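CompileCommand controls JIT behavior for individual methods. Be careful with compileonly: it restricts compilation to the matching methods and leaves all other code interpreted, so it is a diagnostic option, not a production setting. To reduce warm-up unpredictability, exercise hot methods with representative input before going live.

Diagnostic example:

-XX:CompileCommand=compileonly,com.your.Class::hotMethod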
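
A common way to tame JIT warm-up is to drive the hot path with synthetic input before accepting live traffic. The sketch below shows the idea; midPrice is a hypothetical stand-in for real trading logic, and the iteration count of 20,000 is an assumption chosen to exceed typical default compile thresholds, not a documented constant.

```java
class Warmup {
    // Hypothetical hot method: mid-price in ticks.
    static long midPrice(long bid, long ask) {
        return (bid + ask) >>> 1;
    }

    // Invokes the hot method repeatedly so the JIT sees it as hot before live traffic arrives.
    static long warmUp(int iterations) {
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += midPrice(100 + i, 102 + i);
        }
        return sink; // returned so the loop cannot be eliminated as dead code
    }

    public static void main(String[] args) {
        long sink = warmUp(20_000);
        System.out.println("warm-up done, sink=" + sink);
    }
}
```

Warm-up input should resemble production data; if the JIT specializes for branches or types that differ from live traffic, it may deoptimize later on the critical path.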

6. Profile and Pin Critical Threads

Thread Affinity

Pin critical threads (e.g., event loop, trading logic) to CPU cores
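Use taskset (Linux) to restrict the whole JVM process to a set of cores, or a Java thread affinity library (such as Jaffinity or OpenHFT's Java-Thread-Affinity) to pin individual threads

Example (restricts every JVM thread, including GC and JIT compiler threads, to cores 2 and 3):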

taskset -c 2,3 java ...

Ensure critical threads don’t migrate between cores, reducing CPU cache misses.

7. Memory Access Patterns and False Sharing

Minimize false sharing:

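Pad fields that are written concurrently by different threads, either manually (as the Disruptor does) or with the JDK's @Contended annotation. On JDK 9 and later the annotation is jdk.internal.vm.annotation.Contended (sun.misc.Contended on JDK 8), and compiling against it requires --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED.

// JDK 9+; on JDK 8 use @sun.misc.Contended instead
@jdk.internal.vm.annotation.Contended
public class SharedData {
    volatile long value;
}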

Enable @Contended for application classes (by default it is honored only for JDK classes):

-XX:-RestrictContended

Use cache-aligned structures

Use Agrona, Disruptor, or Chronicle libraries for lock-free, cache-friendly data structures.
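
Where an internal annotation is not an option, padding can be done by hand. The sketch below uses the class-hierarchy style commonly associated with the Disruptor: superclass fields are laid out before subclass fields, so the hot field ends up surrounded by unused longs. The class names are illustrative, the layout assumes 64-byte cache lines, and actual field placement is JVM-dependent, so it is worth verifying with a layout tool before relying on it.

```java
// Seven longs (56 bytes) before the hot field.
class CounterLeftPad {
    long p1, p2, p3, p4, p5, p6, p7;
}

// The field that is written by one thread and read by others.
class CounterValue extends CounterLeftPad {
    volatile long value;
}

// Seven longs (56 bytes) after the hot field.
class CounterRightPad extends CounterValue {
    long p9, p10, p11, p12, p13, p14, p15;
}

final class PaddedCounter extends CounterRightPad {
    void set(long v) { value = v; }

    long get() { return value; }
}
```

The padding costs roughly 112 extra bytes per instance, so it only makes sense for a small number of heavily contended fields such as sequence counters.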

8. Avoid Common Latency Pitfalls

Avoid Full GCs

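Set -XX:+DisableExplicitGC to ignore calls to System.gc(). Caution: NIO relies on System.gc() to reclaim direct buffers when direct memory runs low, so with heavy off-heap use consider -XX:+ExplicitGCInvokesConcurrent instead.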

Avoid Finalizers

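Finalizers introduce unpredictable latency (finalization has been deprecated for removal since JDK 18)
Use java.lang.ref.Cleaner or explicit resource management (AutoCloseable with try-with-resources)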
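
The Cleaner approach can be sketched as below: the cleanup action is registered once, close() runs it deterministically, and the Cleaner thread acts only as a safety net if close() is never called. NativeResource is a hypothetical class, and the AtomicBoolean flag stands in for releasing a real resource such as off-heap memory.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

class NativeResource implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup state must not reference the owning object,
    // otherwise the owner would never become unreachable.
    private static final class State implements Runnable {
        private final AtomicBoolean released;

        State(AtomicBoolean released) { this.released = released; }

        @Override
        public void run() {
            released.set(true); // stand-in for freeing a real resource
        }
    }

    private final Cleaner.Cleanable cleanable;

    NativeResource(AtomicBoolean released) {
        this.cleanable = CLEANER.register(this, new State(released));
    }

    // Deterministic path: runs the cleanup action at most once, on the calling thread.
    @Override
    public void close() {
        cleanable.clean();
    }
}
```

Used with try-with-resources, the release happens at a known point outside the critical path rather than whenever the collector gets around to it.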

Avoid Classloading at Runtime

Preload all classes at startup
Avoid loading classes inside critical paths
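
Preloading can be as simple as forcing class initialization from a list of names during startup. In this sketch the class ClassPreloader and the example class names are illustrative; a real list would typically be captured from a test run (for example with class-loading logs).

```java
import java.util.List;

class ClassPreloader {
    // Loads and initializes each named class; returns how many were found.
    static int preload(List<String> classNames) {
        int loaded = 0;
        for (String name : classNames) {
            try {
                // 'true' forces static initializers to run now rather than on first use.
                Class.forName(name, true, ClassPreloader.class.getClassLoader());
                loaded++;
            } catch (ClassNotFoundException e) {
                System.err.println("preload failed: " + name);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        int n = preload(List.of("java.util.concurrent.ConcurrentHashMap", "java.nio.ByteBuffer"));
        System.out.println("preloaded " + n + " classes");
    }
}
```

Loading a class does not compile its methods, so preloading complements rather than replaces exercising the hot path before going live.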

9. Low-Latency Logging

Avoid synchronous logging (e.g., Log4j2 sync mode) in critical paths.

Use:

Async logging (e.g., Log4j2's AsyncAppender or its Disruptor-based async loggers)
Offload logging to a dedicated thread
Disable logging in hot paths or use ring-buffer logs
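
The offloading idea can be sketched with a bounded queue and a dedicated writer thread. This is a simplified stand-in, not a Disruptor ring buffer: ArrayBlockingQueue uses a lock internally, and the class name AsyncLog and the drop-when-full policy are assumptions made for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class AsyncLog {
    private final BlockingQueue<String> queue;

    AsyncLog(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
    }

    // Hot path: never waits for I/O; returns false (message dropped) when the buffer is full.
    boolean log(String message) {
        return queue.offer(message);
    }

    int pending() { return queue.size(); }

    // Starts a daemon thread that performs the slow write away from the critical thread.
    Thread startWriter() {
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    System.out.println(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit quietly on shutdown
            }
        }, "log-writer");
        writer.setDaemon(true);
        writer.start();
        return writer;
    }
}
```

Dropping on overflow keeps the critical thread from ever blocking on the logger; if losing messages is unacceptable, the capacity needs to be sized for the worst expected burst. Building the message string still allocates, so truly hot paths may prefer to enqueue primitive fields and format on the writer thread.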

10. JVM Observability and Tools

Measure tail latency:

Use percentiles: P95, P99, P99.99
Tools: HdrHistogram, Chronicle Metrics, Java Flight Recorder
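
To make the percentile idea concrete, the sketch below computes nearest-rank percentiles from raw nanosecond samples. It is a plain-Java stand-in for what HdrHistogram does far more efficiently; the class name LatencyPercentiles and the measured operation (Math.sqrt) are placeholders.

```java
import java.util.Arrays;

class LatencyPercentiles {
    // Nearest-rank percentile over a sorted copy of the samples (pct in the range 0-100).
    static long percentile(long[] samplesNanos, double pct) {
        long[] sorted = samplesNanos.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct * sorted.length / 100.0);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        long[] samples = new long[10_000];
        for (int i = 0; i < samples.length; i++) {
            long start = System.nanoTime();
            Math.sqrt(i); // stand-in for the operation being measured
            samples[i] = System.nanoTime() - start;
        }
        System.out.printf("p50=%d ns, p99=%d ns, p99.99=%d ns%n",
                percentile(samples, 50), percentile(samples, 99), percentile(samples, 99.99));
    }
}
```

Averages hide exactly the outliers that matter here, which is why the report leads with high percentiles; sorting on every query is fine for offline analysis but too slow for in-process monitoring.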

GC Analysis

Enable GC logs:

-Xlog:gc*:file=gc.log:time,uptime,level,tags

Use tools like:

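jstat (not usable when -XX:+PerfDisableSharedMem is set, because that flag disables the shared performance data jstat reads)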
GCViewer
GCEasy.io
JFR (Java Flight Recorder)
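
GC activity can also be read from inside the process through the standard GarbageCollectorMXBean interface, which is handy for exporting to a metrics system. This is a minimal sketch; the class name GcStats is illustrative, and for concurrent collectors the reported time may include concurrent work rather than pure pause time, so treat it as an approximation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {
    // Sums accumulated collection time (ms) across all collectors.
    // A bean reports -1 when the value is undefined; such values are counted as 0.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

The bean names also reveal which collector is actually running, which is a quick check that the intended GC flag took effect.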

11. Example Full JVM Options for Low-Latency (G1GC)

-server
-Xms2g
-Xmx2g
-Xss256k
-XX:+UseG1GC
-XX:MaxGCPauseMillis=10
-XX:+ParallelRefProcEnabled
-XX:+UnlockExperimentalVMOptions
-XX:+DisableExplicitGC
-XX:+AlwaysPreTouch
-XX:+UseStringDeduplication
-XX:+PerfDisableSharedMem
-XX:-RestrictContended
-XX:InitiatingHeapOccupancyPercent=30
-XX:G1NewSizePercent=20
-XX:G1MaxNewSizePercent=40
-XX:G1ReservePercent=15
-XX:G1HeapRegionSize=4m
-XX:MaxDirectMemorySize=512m
-Xlog:gc*:file=gc.log:time,uptime,level,tags