In low-latency systems (like high-frequency trading platforms), response time and predictability are critical. The Java Virtual Machine (JVM), being managed and garbage-collected, requires careful tuning to minimize latency, avoid pauses, and meet real-time constraints.
This guide focuses on tuning the JVM for:
- Low latency
- High throughput
- Consistent performance under pressure
1. Understand the Requirements of Low-Latency Systems
Before tuning, understand what “low-latency” implies:
| Feature | Expectation |
|---|---|
| Response Time | In microseconds or low milliseconds |
| Jitter | Extremely low (i.e., consistent response) |
| Throughput | Secondary to latency (but still important) |
| GC Pauses | Not acceptable in latency-sensitive threads |
| Determinism | Predictable latency under load |
2. Choose the Right JVM and Version
Use the latest LTS JDK (e.g., JDK 17 or JDK 21) to benefit from GC, JIT, and other performance improvements.
Some alternative JVMs for ultra-low latency:
- Azul Zing (now Azul Prime) – uses C4 (Continuously Concurrent Compacting Collector) for pause-less GC
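- OpenJDK ZGC or Shenandoah – low-pause concurrent collectors; on recent JDKs ZGC pauses are typically below 1 ms, but concurrent GC work still competes with the application for CPU and can affect tail latency
For extreme low-latency (<1 ms tail latency), prefer a design that allocates little or nothing on the hot path, so that the collector rarely or never runs:
- JDK + G1GC or Epsilon GC (no GC), combined with custom memory management (object pooling, off-heap storage)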
3. Select the Appropriate Garbage Collector (GC)
G1GC (Garbage First GC)
- Good for balancing throughput and pause time
- Supports pause time goals with `-XX:MaxGCPauseMillis=` (a soft target that G1 tries to meet, not a guarantee)
ZGC
- Scales well for large heaps; pause times are typically below 1 ms on JDK 16 and later (the original design target was < 10ms)
- Better than G1GC for latency-sensitive apps with large memory
Shenandoah
- Similar in goals to ZGC (concurrent compaction); originally developed by Red Hat
- Works better with medium heap sizes
Epsilon GC
- No garbage collection at all
- Useful in scenarios where memory is managed manually or app runs only for a short duration
Recommended GC Flags (G1GC) for Low Latency:
-XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=20 -XX:G1MaxNewSizePercent=40 -XX:G1HeapRegionSize=4m -XX:G1ReservePercent=15 -XX:InitiatingHeapOccupancyPercent=30
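It is easy to pass GC flags that never reach the JVM, for example when a wrapper script overrides them. The following is a minimal sketch, using only the standard `java.lang.management` API, that prints the collectors the running JVM registered and the arguments it was started with. The class name `GcConfigCheck` is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcConfigCheck {
    /** Names of the collectors registered by the running JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** JVM arguments as seen by the runtime (does not include arguments to main). */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Collectors: " + collectorNames());
        System.out.println("JVM args:   " + jvmArguments());
    }
}
```

With the G1 flags above, the collector names should mention G1; if they do not, the flags were most likely overridden somewhere in the launch path.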
4. JVM Tuning Parameters for Low-Latency
Heap Sizing
- Keep heap size as small as possible to reduce GC time.
- Fixed-size heaps are better for predictability:
-Xms2g -Xmx2g
Thread Stack Size
Smaller stack reduces memory footprint but can limit recursion:
-Xss256k
Direct Memory
For IO-intensive low-latency systems (like FIX engines), use off-heap memory (Netty, Agrona, Chronicle libraries):
-XX:MaxDirectMemorySize=512m
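As a rough illustration of the off-heap approach, the sketch below allocates one direct buffer up front and reuses it for every message, so the hot path creates no garbage. It uses only `java.nio.ByteBuffer`; the class name and the two-field message layout (order id, price in ticks) are assumptions made for the example, not part of any FIX library.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class OffHeapMessageBuffer {
    private final ByteBuffer buffer;

    OffHeapMessageBuffer(int capacityBytes) {
        // Allocated once, outside the Java heap; counts against -XX:MaxDirectMemorySize.
        this.buffer = ByteBuffer.allocateDirect(capacityBytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Encodes a hypothetical (orderId, priceTicks) message, overwriting the previous one. */
    void encode(long orderId, long priceTicks) {
        buffer.clear();                              // reset position and limit, no allocation
        buffer.putLong(orderId).putLong(priceTicks); // 16 bytes written
        buffer.flip();                               // limit = bytes written, position = 0
    }

    long orderId()     { return buffer.getLong(0); } // absolute reads leave position untouched
    long priceTicks()  { return buffer.getLong(8); }
    int length()       { return buffer.remaining(); }
    boolean isDirect() { return buffer.isDirect(); }
}
```

Libraries such as Agrona wrap the same idea in richer buffer types; the point here is only that the buffer is allocated once and then reused.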
Avoid Large Object Allocations
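- Keep short-lived objects local to a method so that escape analysis can eliminate the heap allocation (Java has no explicit stack allocation)
- Avoid allocating arrays/objects of half the G1 region size or more; G1 treats these as humongous allocations. The region size is derived from the heap size (1 MB to 32 MB by default) unless set with -XX:G1HeapRegionSize
- Use object pools or value types (Project Valhalla in the future)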
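The object-pool idea can be sketched in a few lines. This is a minimal, single-threaded pool under the assumption that one thread owns it (as is typical for an event loop); `ObjectPool` is a hypothetical name, and a real pool would also need a policy for resetting object state on release.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

final class ObjectPool<T> {
    private final ArrayDeque<T> free;
    private final Supplier<T> factory;

    ObjectPool(int size, Supplier<T> factory) {
        this.factory = factory;
        this.free = new ArrayDeque<>(size);
        // Pre-allocate everything at startup so the hot path does not allocate.
        for (int i = 0; i < size; i++) {
            free.push(factory.get());
        }
    }

    /** Returns a pooled instance, or allocates a new one if the pool is exhausted. */
    T acquire() {
        T obj = free.poll();
        return obj != null ? obj : factory.get();
    }

    /** Hands an instance back; the caller must not use it afterwards. */
    void release(T obj) {
        free.push(obj);
    }

    int available() {
        return free.size();
    }
}
```

For example, `new ObjectPool<>(1024, StringBuilder::new)` creates 1024 builders up front; the most recently released instance is handed out first, which tends to keep it warm in cache.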
5. JIT Compiler Tuning
Use C2 (optimizing compiler), enabled by default.
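Key JIT tuning flags:
-XX:+TieredCompilation
Tiered compilation is already on by default. Older guides also list -XX:+AggressiveOpts, but that flag was deprecated in JDK 11 and removed in later releases, so a JDK 17 or JDK 21 JVM will refuse to start with it. -XX:+UseStringDeduplication is a GC feature rather than a JIT flag.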
Disable Tiered Compilation (optional for predictability):
-XX:-TieredCompilation
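CompileCommand can target individual methods. Be careful with compileonly: it restricts JIT compilation to the listed methods and leaves all other code interpreted, so treat it as a diagnostic aid rather than a production setting. To reduce JIT warmup unpredictability for ultra-critical methods, exercise them at startup before accepting live traffic. The compileonly syntax is: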
-XX:CompileCommand=compileonly,com.your.Class::hotMethod
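A common way to reduce JIT warmup unpredictability is to run the critical code path many times at startup, before the system accepts live traffic. The sketch below shows the shape of such a warm-up loop; `hotMethod` is a hypothetical stand-in for real logic, and the iteration count of 20,000 is an assumption chosen to exceed typical default compilation thresholds, not a tuned value.

```java
final class Warmup {
    /** Hypothetical hot method standing in for real trading logic. */
    static long hotMethod(long x) {
        return (x * 31) ^ (x >>> 7);
    }

    /** Calls the hot path repeatedly so the JIT has a chance to compile it before live traffic. */
    static long warmUp(int iterations) {
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += hotMethod(i); // accumulate the result so the call is not dead code
        }
        return sink;
    }

    public static void main(String[] args) {
        long sink = warmUp(20_000);
        System.out.println("Warm-up done (sink=" + sink + "), ready for traffic");
    }
}
```

Warm-up inputs should resemble production inputs as closely as possible; code compiled against unrepresentative data can be deoptimized later, which reintroduces the latency spike the warm-up was meant to avoid.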
6. Profile and Pin Critical Threads
Thread Affinity
- Pin critical threads (e.g., event loop, trading logic) to CPU cores
- Use tools like `taskset` (Linux) or Java Thread Affinity libraries (like Jaffinity or OpenHFT Affinity)
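Example (this restricts the whole JVM process to cores 2 and 3; pinning individual threads requires an affinity library):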
taskset -c 2,3 java ...
Keeping critical threads from migrating between cores reduces CPU cache misses.
7. Memory Access Patterns and False Sharing
Minimize false sharing:
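- Pad fields that are written concurrently by different threads so that they land on separate cache lines (e.g., the manual padding used in the Disruptor, or the JDK's @Contended annotation)
// JDK 9+: the annotation lives in jdk.internal.vm.annotation (it was sun.misc.Contended in JDK 8).
// Compiling against it requires: --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED
@jdk.internal.vm.annotation.Contended
public class SharedData {
    volatile long value;
}
Enable @Contended for application classes (by default it only takes effect inside the JDK):
-XX:-RestrictContended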
Use cache-aligned structures
- Use Agrona, Disruptor, or Chronicle libraries for lock-free, cache-friendly data structures.
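Where @Contended is not an option, padding can be added by hand. The sketch below follows the inheritance-based padding pattern popularized by the Disruptor: padding fields sit in a superclass and a subclass on either side of the hot field. It assumes 64-byte cache lines and relies on the JVM laying out superclass fields before subclass fields, which HotSpot generally does but the language does not guarantee. All class names are hypothetical.

```java
// 7 longs = 56 bytes of padding on each side of `value`, assuming 64-byte cache lines.
class LhsPadding { protected long p1, p2, p3, p4, p5, p6, p7; }
class CounterValue extends LhsPadding { protected volatile long value; }
class RhsPadding extends CounterValue { protected long p9, p10, p11, p12, p13, p14, p15; }

final class PaddedCounter extends RhsPadding {
    void increment() { value++; } // not atomic: safe only with one writer thread per counter
    long get() { return value; }
}

final class FalseSharingDemo {
    /** Two threads, each writing only its own padded counter; returns both final values. */
    static long[] run(int incrementsPerThread) {
        PaddedCounter a = new PaddedCounter();
        PaddedCounter b = new PaddedCounter();
        Thread t1 = new Thread(() -> { for (int i = 0; i < incrementsPerThread; i++) a.increment(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < incrementsPerThread; i++) b.increment(); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { a.get(), b.get() };
    }

    public static void main(String[] args) {
        long[] result = run(10_000_000);
        System.out.println(result[0] + " " + result[1]);
    }
}
```

To see the effect, time `run` with and without the padding classes; any difference depends on the hardware and on where the JVM happens to place the two objects, so treat a single measurement with caution.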
8. Avoid Common Latency Pitfalls
Avoid Full GCs
- Set `-XX:+DisableExplicitGC` to ignore calls to `System.gc()`
Avoid Finalizers
- Finalizers introduce unpredictable latency
- Use `Cleaner` or manual resource management
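A minimal sketch of the `Cleaner` approach, using the standard `java.lang.ref.Cleaner` API, is shown here. The resource is released deterministically through `close()`, with the cleaner acting only as a safety net if `close()` is never called. `NativeResource` and the `released` flag are hypothetical; a real cleanup action would free a native handle or unmap memory.

```java
import java.lang.ref.Cleaner;

final class NativeResource implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup state must not reference the NativeResource itself,
    // otherwise the resource could never become unreachable.
    private static final class State implements Runnable {
        volatile boolean released;

        @Override
        public void run() {
            released = true; // free the hypothetical native handle here
        }
    }

    private final State state = new State();
    private final Cleaner.Cleanable cleanable = CLEANER.register(this, state);

    boolean isReleased() {
        return state.released;
    }

    /** Deterministic release; the cleanup action runs at most once, even if called repeatedly. */
    @Override
    public void close() {
        cleanable.clean();
    }
}
```

Used with try-with-resources (`try (NativeResource r = new NativeResource()) { ... }`), cleanup happens on the calling thread at a predictable point, outside the latency-critical path if the block is placed accordingly.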
Avoid Classloading at Runtime
- Preload all classes at startup
- Avoid loading classes inside critical paths
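Preloading can be as simple as forcing class loading and static initialization from a list of names during startup. The following sketch uses `Class.forName` with initialization enabled; the `ClassPreloader` name and the idea of keeping a hand-maintained list of class names are assumptions for illustration.

```java
final class ClassPreloader {
    /** Loads and initializes each named class; returns how many were loaded successfully. */
    static int preload(String... classNames) {
        int loaded = 0;
        for (String name : classNames) {
            try {
                // 'true' also runs static initializers, so that cost is paid now rather than later.
                Class.forName(name, true, ClassPreloader.class.getClassLoader());
                loaded++;
            } catch (ClassNotFoundException e) {
                System.err.println("Preload skipped, class not found: " + name);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        int n = preload("java.util.concurrent.ConcurrentHashMap", "java.util.ArrayDeque");
        System.out.println("Preloaded " + n + " classes");
    }
}
```

Running the application once with `-Xlog:class+load` is one way to find out which classes the critical path actually touches.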
9. Low-Latency Logging
Avoid synchronous logging (e.g., Log4j2 sync mode) in critical paths.
Use:
- Async logging (`AsyncAppender`, `DisruptorAppender`)
- Offload logging to a dedicated thread
- Disable logging in hot paths or use ring-buffer logs
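To illustrate offloading to a dedicated thread, here is a small sketch built on a bounded queue from `java.util.concurrent`. The caller never blocks: if the buffer is full, the message is dropped and `log` returns false. `AsyncLog` is a hypothetical name, and `ArrayBlockingQueue` is used for brevity; it takes a lock internally, so production systems usually prefer a lock-free ring buffer such as the Disruptor.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class AsyncLog implements AutoCloseable {
    private final BlockingQueue<String> queue;
    private final Consumer<String> sink;
    private final Thread writer;
    private volatile boolean running = true;

    AsyncLog(int capacity, Consumer<String> sink) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.sink = sink;
        this.writer = new Thread(this::drainLoop, "async-log-writer");
        this.writer.setDaemon(true);
        this.writer.start();
    }

    /** Never blocks the caller; returns false if the buffer is full and the message was dropped. */
    boolean log(String message) {
        return queue.offer(message);
    }

    // Runs on the writer thread: moves messages from the queue to the (slow) sink.
    private void drainLoop() {
        try {
            while (running || !queue.isEmpty()) {
                String msg = queue.poll(10, TimeUnit.MILLISECONDS);
                if (msg != null) {
                    sink.accept(msg);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops the writer after it has drained everything already queued. */
    @Override
    public void close() {
        running = false;
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A typical use would be `new AsyncLog(65536, System.out::println)`. Note that building the message string still allocates on the calling thread; fully garbage-free logging needs preallocated message objects or binary encoding.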
10. JVM Observability and Tools
Measure tail latency:
- Use percentiles: P95, P99, P99.99
- Tools: HdrHistogram, Chronicle Metrics, Java Flight Recorder
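For readers unfamiliar with percentile reporting, the sketch below computes a nearest-rank percentile over raw latency samples using only the standard library. It is meant to show what P99 means, not to replace HdrHistogram, which records values in constant memory without sorting. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class LatencyPercentiles {
    /** Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it. */
    static long percentile(long[] samplesNanos, double p) {
        if (samplesNanos.length == 0 || p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("need at least one sample and 0 < p <= 100");
        }
        long[] sorted = samplesNanos.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] samples = new long[10_000];
        for (int i = 0; i < samples.length; i++) {
            long start = System.nanoTime();
            Math.sqrt(i); // hypothetical operation under test
            samples[i] = System.nanoTime() - start;
        }
        System.out.println("P50:    " + percentile(samples, 50.0) + " ns");
        System.out.println("P99:    " + percentile(samples, 99.0) + " ns");
        System.out.println("P99.99: " + percentile(samples, 99.99) + " ns");
    }
}
```

Averages hide exactly the outliers that matter in a low-latency system, which is why the high percentiles and the maximum are the figures to watch.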
GC Analysis
- Enable GC logs:
-Xlog:gc*:file=gc.log:time,uptime,level,tags
Use tools like:
- `jstat`
- GCViewer
- GCEasy.io
- JFR (Java Flight Recorder)
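GC activity can also be sampled from inside the application through the standard `GarbageCollectorMXBean` interface, which is handy for exporting counters to a metrics system. The sketch below sums collection counts and accumulated collection time across all registered collectors. `GcStats` is a hypothetical name, and what "collection time" covers differs between collectors (for concurrent collectors it may not equal pause time), so the GC log remains the more reliable source for pause analysis.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcStats {
    /** Total number of collections reported by all collectors since JVM start. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count; // -1 means the collector does not report this value
            }
        }
        return total;
    }

    /** Total accumulated collection time in milliseconds, as reported by all collectors. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long countBefore = totalCollections();
        long millisBefore = totalCollectionMillis();

        // Hypothetical allocating workload, only here to trigger some GC activity.
        byte[][] junk = new byte[1024][];
        for (int i = 0; i < 200_000; i++) {
            junk[i % junk.length] = new byte[1024];
        }

        System.out.println("Collections during workload: " + (totalCollections() - countBefore));
        System.out.println("GC time during workload (ms): " + (totalCollectionMillis() - millisBefore));
    }
}
```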
11. Example Full JVM Options for Low-Latency (G1GC)
-server -Xms2g -Xmx2g -Xss256k -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -XX:+UseStringDeduplication -XX:+PerfDisableSharedMem -XX:-RestrictContended -XX:InitiatingHeapOccupancyPercent=30 -XX:G1NewSizePercent=20 -XX:G1MaxNewSizePercent=40 -XX:G1ReservePercent=15 -XX:G1HeapRegionSize=4m -XX:MaxDirectMemorySize=512m -Xlog:gc*:file=gc.log:time,uptime,level,tags
