Learnitweb

How G1 Garbage Collector (G1GC) Works

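The G1 Garbage Collector (G1GC) is a server-style garbage collector for the Java HotSpot VM. It first appeared as an experimental option in JDK 6 update 14, became officially supported in JDK 7 update 4, and has been the default collector since JDK 9. G1GC is designed for large heaps (multiple gigabytes) and for applications that need short, predictable pauses without giving up much throughput, such as latency-sensitive server applications. Its pause-time target is a soft goal, so it is not a hard real-time collector.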

It differs from traditional collectors (like Parallel GC or CMS) by dividing the heap into regions rather than large contiguous Young and Old generations. This approach enables G1GC to perform incremental, concurrent garbage collection with more predictable pause times.

1. Key Concepts of G1GC

1.1 Heap Structure

In G1GC, the Java heap is divided into equal-sized regions, typically ranging from 1 MB to 32 MB, depending on the total heap size. These regions are dynamically assigned roles based on object allocation and promotion:

  • Young regions: For newly created objects. These regions are subdivided logically into Eden (where new objects are allocated) and Survivor regions (where surviving objects from Eden are moved).
  • Old regions: For long-lived objects that survive multiple garbage collection cycles in the Young regions.
  • Humongous regions: For extremely large objects (larger than half a region size), which are allocated across multiple contiguous regions.

The main advantage of dividing the heap into regions is that G1GC can focus on garbage collection of regions that contain the most garbage, instead of scanning the entire Old generation. This reduces pause times and improves efficiency.
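To make the relationship between heap size and region size concrete, here is a minimal sketch. It assumes the commonly described default heuristic: aim for roughly 2048 regions, round down to a power of two, and clamp the result to the 1 MB to 32 MB range given above. The class and method names are hypothetical, and the real JVM computation differs in details.

```java
// Hypothetical helper: an approximation of G1's default region sizing,
// not the JVM's actual code.
final class G1RegionSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;
    static final long MAX_REGION = 32 * MB;   // upper bound stated in this article
    static final long TARGET_REGIONS = 2048;  // assumed target region count

    static long estimateRegionSize(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGIONS, 1);
        long powerOfTwo = Long.highestOneBit(raw); // round down to a power of two
        return Math.min(MAX_REGION, Math.max(MIN_REGION, powerOfTwo));
    }

    public static void main(String[] args) {
        long[] heapsInGb = {1, 4, 16, 64, 128};
        for (long gb : heapsInGb) {
            long region = estimateRegionSize(gb * 1024 * MB);
            System.out.println(gb + " GB heap -> " + (region / MB) + " MB regions");
        }
    }
}
```

Under this heuristic a 16 GB heap ends up with 8 MB regions, while very small and very large heaps hit the lower and upper bounds.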

1.2 G1GC Goals

The G1GC was designed with specific goals:

  • Minimize pause times: By performing concurrent and incremental garbage collection rather than full stop-the-world GC.
  • Predictable pause time management: G1GC allows developers to set target maximum pause times (e.g., 100 ms), and the collector dynamically adjusts its behavior to meet those goals.
  • Efficient large heap management: Unlike CMS, G1GC efficiently manages heaps of tens of gigabytes.
  • Automatic fragmentation control: G1GC performs incremental compaction, reducing the risk of heap fragmentation.

These goals make G1GC suitable for low-latency applications where predictable performance is critical.

1.3 Heap Division and Memory Layout

The heap in G1GC is divided as follows:

+-------------------------------------------------+
|                     G1 Heap                     |
|-------------------------------------------------|
| Young Regions | Old Regions | Humongous Objects |
+-------------------------------------------------+
  • Young regions: By default start at about 5% of the heap and can grow to about 60%; G1GC adjusts the number of Young regions dynamically to meet the pause-time goal.
  • Old regions: Occupy the rest of the heap and store objects promoted from the Young regions.
  • Humongous objects: Stored in contiguous regions to handle objects larger than half the size of a region.

By using regions instead of contiguous memory spaces, G1GC can reclaim memory incrementally and, in most cases, avoid costly full-heap compactions. A full GC remains the fallback when evacuation cannot find enough free regions.
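The layout can be observed from inside a running JVM. The sketch below lists the memory pools exposed through the standard java.lang.management API and labels each with a role. The class name and the roleOf helper are hypothetical; the labelling relies on the pool names HotSpot reports under G1 ("G1 Eden Space", "G1 Survivor Space", "G1 Old Gen"), and other collectors or JVMs will report different names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical reporting helper built on the standard management API.
final class G1PoolReport {
    // Maps a pool name to a rough role; the name matching is an assumption
    // based on HotSpot's pool naming.
    static String roleOf(String poolName) {
        if (poolName.contains("Eden")) return "young (eden)";
        if (poolName.contains("Survivor")) return "young (survivor)";
        if (poolName.contains("Old") || poolName.contains("Tenured")) return "old";
        return "non-heap or other";
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            long usedKb = (usage == null) ? 0 : usage.getUsed() / 1024;
            System.out.printf("%-35s %-18s used=%d KB%n",
                    pool.getName(), roleOf(pool.getName()), usedKb);
        }
    }
}
```

There is no separate pool for humongous objects: they live in Old regions and are counted there.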

2. How G1GC Works

G1GC works in phases, using a combination of parallel, concurrent, and incremental garbage collection techniques.

2.1 Young Generation Collection (Minor GC)

  • Object Allocation: New objects are allocated in the Eden region.
  • Triggering Minor GC: When Eden fills up, a minor GC occurs. This is a stop-the-world (STW) event, meaning all application threads are paused.
  • Object Promotion: Surviving objects in Eden are copied to Survivor regions. If an object survives enough cycles (determined by the tenuring threshold), it is promoted to Old regions.
  • Copying Collection: A Young collection evacuates every Eden and Survivor region. Live objects are copied into free regions, and the emptied regions return to the free list. Copying also compacts the surviving objects as a side effect.
  • Parallelism: Minor GC is performed in parallel, using multiple threads to reduce pause time.

Key Details:

  • G1GC dynamically adjusts how many regions are assigned to the Young generation, based on the application’s allocation pattern and the pause-time goal. Individual regions keep a fixed size.
  • Minor GCs are fast but crucial because they prevent Eden from overflowing.
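The ageing and promotion rule described above can be illustrated with a toy model. This is a simplified sketch, not HotSpot's implementation: it assumes a fixed tenuring threshold, ignores Survivor overflow, and only receives objects that are still live. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of object ageing and promotion in a young collection.
final class TenuringSketch {
    enum Space { EDEN, SURVIVOR, OLD }

    static final class Obj {
        final String name;
        int age = 0;               // number of young collections survived
        Space space = Space.EDEN;
        Obj(String name) { this.name = name; }
    }

    // Simulates one young collection over objects that are still reachable.
    static void youngCollect(List<Obj> liveObjects, int tenuringThreshold) {
        for (Obj o : liveObjects) {
            if (o.space == Space.OLD) continue;   // old objects are not touched here
            if (o.age < tenuringThreshold) {
                o.space = Space.SURVIVOR;         // copied to a Survivor region
                o.age++;
            } else {
                o.space = Space.OLD;              // promoted to an Old region
            }
        }
    }

    public static void main(String[] args) {
        List<Obj> live = new ArrayList<>();
        live.add(new Obj("session-cache"));
        for (int gc = 1; gc <= 3; gc++) {
            youngCollect(live, 2);
            System.out.println("after GC " + gc + ": " + live.get(0).space);
        }
    }
}
```

With a threshold of 2, the object stays in Survivor regions for two collections and is promoted on the third.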

2.2 Concurrent Marking Phase

The concurrent marking phase identifies which Old regions contain the most garbage, allowing G1GC to prioritize them during collection.

Steps in the concurrent marking cycle:

  1. Initial Mark (STW): Marks objects reachable from GC roots (such as static fields, stack references). This pause is usually short.
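  2. Root Region Scanning (Concurrent): Scans the Survivor regions left by the initial-mark collection for references into Old regions. This step must finish before the next Young collection can start.
  3. Concurrent Marking (Concurrent): Traverses the object graph across the heap while the application runs, using a snapshot-at-the-beginning (SATB) algorithm, and records how much live data each region holds. Remembered sets (RSets), which track references into a region from other regions, are maintained separately and later let G1GC evacuate a region without scanning the whole heap.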
  4. Remark (STW): Completes the marking process, ensuring all reachable objects are correctly identified.
  5. Cleanup (short STW pause, plus concurrent work): Finalizes region statistics and identifies the regions with the most reclaimable garbage for mixed GC.

The concurrent marking phase is designed to minimize stop-the-world pauses while still identifying Old regions with high garbage content.
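The core idea of the marking cycle, finding reachable objects and accounting live bytes per region, can be sketched as a plain graph traversal. This is a single-threaded toy model with hypothetical names; the real collector marks concurrently with the application using SATB and mark bitmaps, none of which is modelled here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy marking pass: computes live bytes per region from a set of roots.
final class MarkingSketch {
    static final class Node {
        final int region;      // id of the region holding this object
        final int sizeBytes;
        final List<Node> refs = new ArrayList<>();
        Node(int region, int sizeBytes) {
            this.region = region;
            this.sizeBytes = sizeBytes;
        }
    }

    static Map<Integer, Integer> liveBytesPerRegion(Collection<Node> roots) {
        Map<Integer, Integer> live = new TreeMap<>();
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (!marked.add(n)) continue;          // already visited
            live.merge(n.region, n.sizeBytes, Integer::sum);
            work.addAll(n.refs);                   // follow outgoing references
        }
        return live;
    }

    public static void main(String[] args) {
        Node a = new Node(0, 100);
        Node b = new Node(1, 50);
        Node garbage = new Node(1, 70);            // never referenced from a root
        a.refs.add(b);
        List<Node> roots = new ArrayList<>();
        roots.add(a);
        System.out.println(liveBytesPerRegion(roots));
    }
}
```

Regions whose live-byte count is low relative to their size are the ones worth evacuating, which is what the later region selection relies on.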

2.3 Old Generation Collection (Mixed GC)

After concurrent marking, G1GC performs mixed GC, which collects both:

  • Young regions (like a minor GC).
  • Old regions with high garbage content (selected based on concurrent marking).

Steps in mixed GC:

  1. Region Selection: G1GC selects Old regions that contain the most reclaimable space.
  2. Evacuation: Live objects are copied to new regions, freeing the old ones.
  3. Compaction: Unlike CMS, G1GC performs incremental compaction, which reduces fragmentation and avoids long pauses.

This approach ensures that Old generation collection is efficient and pause-time predictable.
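Region selection can be pictured as a greedy choice under a budget. The sketch below is an illustration under stated assumptions, not G1GC's actual policy: it assumes evacuation cost is proportional to live bytes and uses a live-byte budget as a stand-in for the pause-time goal, whereas the real collector works with predicted pause times. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy collection-set selection: most reclaimable space first, within a budget.
final class MixedGcSelectionSketch {
    static final class OldRegion {
        final int id;
        final long capacityBytes;
        final long liveBytes;
        OldRegion(int id, long capacityBytes, long liveBytes) {
            this.id = id;
            this.capacityBytes = capacityBytes;
            this.liveBytes = liveBytes;
        }
        long reclaimable() { return capacityBytes - liveBytes; }
    }

    // liveBytesBudget approximates the pause-time goal: copying more live
    // bytes is assumed to take proportionally longer.
    static List<OldRegion> select(List<OldRegion> candidates, long liveBytesBudget) {
        List<OldRegion> sorted = new ArrayList<>(candidates);
        sorted.sort((x, y) -> Long.compare(y.reclaimable(), x.reclaimable()));
        List<OldRegion> chosen = new ArrayList<>();
        long cost = 0;
        for (OldRegion r : sorted) {
            if (cost + r.liveBytes > liveBytesBudget) break;
            chosen.add(r);
            cost += r.liveBytes;
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<OldRegion> candidates = new ArrayList<>();
        candidates.add(new OldRegion(1, 100, 10));
        candidates.add(new OldRegion(2, 100, 80));
        candidates.add(new OldRegion(3, 100, 30));
        for (OldRegion r : select(candidates, 50)) {
            System.out.println("evacuate region " + r.id);
        }
    }
}
```

In this example regions 1 and 3 are chosen; region 2 is mostly live, so copying it would cost a lot and free little.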

2.4 Humongous Objects

  • Objects larger than 50% of a region are considered humongous.
  • They are allocated in contiguous humongous regions, which can span multiple regions.
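  • Collection: Dead humongous objects are reclaimed after concurrent marking or during a full GC, and recent JDK versions can also free some of them during Young collections. G1GC does not move humongous objects during normal evacuation, so frequent humongous allocation can fragment the heap and waste the unused tail of the last region.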
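The arithmetic behind humongous allocation is simple enough to sketch. The helper below uses the rule stated above (an object larger than half a region is humongous) and computes how many contiguous regions an object needs and how much of the last region is left unused. The class and method names are hypothetical.

```java
// Hypothetical helper illustrating humongous-object arithmetic.
final class HumongousSketch {
    static final long MB = 1024L * 1024L;

    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // Number of contiguous regions needed to hold the object (rounded up).
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    // Space left unused at the end of the last region.
    static long wastedBytes(long objectBytes, long regionBytes) {
        return regionsNeeded(objectBytes, regionBytes) * regionBytes - objectBytes;
    }

    public static void main(String[] args) {
        long region = 4 * MB;
        long[] objects = {1 * MB, 3 * MB, 9 * MB};
        for (long size : objects) {
            System.out.println((size / MB) + " MB object: humongous=" + isHumongous(size, region)
                    + ", regions=" + regionsNeeded(size, region)
                    + ", wasted=" + (wastedBytes(size, region) / MB) + " MB");
        }
    }
}
```

With 4 MB regions, a 9 MB array occupies three regions and leaves 3 MB unused, which is why raising the region size is a common remedy when an application allocates many objects just above the threshold.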

3. Advantages of G1GC

  1. Predictable Pause Times: By allowing developers to set pause time goals, G1GC dynamically balances collection work to stay within those targets.
  2. Efficient Large Heap Management: G1GC can handle heaps of tens of gigabytes, which traditional collectors struggle with.
  3. Incremental Compaction: Avoids long stop-the-world events by compacting Old regions incrementally.
  4. Reduced Fragmentation: Regions and incremental compaction reduce memory fragmentation over time.
  5. Concurrent Processing: Most of the Old generation marking happens concurrently with application threads, which keeps pauses short at the cost of some extra CPU time.

4. G1GC Phases Summary Diagram

  +-----------------------+
  |  Young Gen (Eden + S)|
  +-----------------------+
           |
           v
      Minor GC (STW)
           |
           v
  +-----------------------+
  |  Old Gen + Humongous  |
  +-----------------------+
           |
           v
  Concurrent Marking Cycle
           |
           v
      Mixed GC (STW)
           |
           v
     Evacuate Regions (Copy live objects)
           |
           v
        Free Memory

This diagram shows the flow from object allocation to collection and evacuation, highlighting where stop-the-world pauses occur and where concurrent work happens.

5. Important JVM Flags for G1GC

  • -XX:+UseG1GC: Enables G1GC.
  • -Xmx<size> / -Xms<size>: Set maximum and initial heap sizes.
  • -XX:MaxGCPauseMillis=<n>: Target maximum pause time in milliseconds. G1GC adjusts the collection to meet this target.
  • -XX:InitiatingHeapOccupancyPercent=<n>: Specifies the heap occupancy percentage at which concurrent marking starts.
  • -XX:G1HeapRegionSize=<size>: Region size, a power of two between 1 MB and 32 MB. If unset, G1GC derives it from the heap size, so larger heaps get larger regions.
  • -XX:ParallelGCThreads=<n>: Number of threads used for parallel garbage collection.
  • -XX:ConcGCThreads=<n>: Number of threads used for concurrent phases.

By tuning these flags, you can balance throughput and pause-time requirements depending on your application needs.
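To check which values a running JVM actually uses for these flags, one option is the HotSpot-specific diagnostic MXBean. The sketch below is a minimal example that assumes a HotSpot-based JVM where com.sun.management.HotSpotDiagnosticMXBean is available; the class name G1FlagReport and the describe helper are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;

// Prints the effective values of the G1-related flags listed above.
// Assumes a HotSpot-based JVM; other JVMs may not expose this bean.
final class G1FlagReport {
    static final List<String> FLAGS = Arrays.asList(
            "UseG1GC", "MaxGCPauseMillis", "InitiatingHeapOccupancyPercent",
            "G1HeapRegionSize", "ParallelGCThreads", "ConcGCThreads");

    static String describe(String flag) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) return flag + "=<not available on this JVM>";
            return flag + "=" + bean.getVMOption(flag).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the flag (or the bean) does not exist on this JVM.
            return flag + "=<not available on this JVM>";
        }
    }

    public static void main(String[] args) {
        for (String flag : FLAGS) {
            System.out.println(describe(flag));
        }
    }
}
```

Running it with and without -XX:MaxGCPauseMillis on the command line is a quick way to confirm that a setting was picked up.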

6. Differences Between G1GC and CMS

Feature               | G1GC                                            | CMS
----------------------|-------------------------------------------------|------------------------------------------------
Heap division         | Divided into multiple regions                   | Traditional contiguous Young + Old
Pause time control    | Configurable soft target                        | Best-effort, less predictable
Compaction            | Incremental, as part of evacuation              | Only in a fallback stop-the-world full GC
Large heap handling   | Efficient, scales well                          | Less efficient for very large heaps
Fragmentation control | Automatic incremental compaction                | Old generation can fragment over time
Concurrent phases     | Marking concurrent; evacuation is stop-the-world | Marking and sweeping concurrent

7. Best Practices for G1GC

  1. Use G1GC for large heaps or latency-sensitive applications: It has been the default GC since JDK 9 and is optimized for predictable pause times.
  2. Set pause time goals using MaxGCPauseMillis: Example: -XX:MaxGCPauseMillis=200 (the default value) asks G1GC to try to keep pauses below 200 ms. This is a soft goal, not a guarantee.
  3. Monitor humongous object allocation: Try to avoid excessively large objects, as they require multiple regions and may slow GC.
  4. Enable GC logging: Use -Xlog:gc*,gc+heap=info (JDK 9 and later; on JDK 8 use -XX:+PrintGCDetails) to understand GC behavior and tune parameters.
  5. Avoid very small heap sizes: G1GC is designed for medium to large heaps; small heaps may see more overhead than benefit.
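Alongside GC logs, coarse collection statistics are available from inside the application through the standard GarbageCollectorMXBean API. The sketch below prints, for each collector bean, the collection count, the accumulated time, and a simple average. The class name and the averageMillis helper are hypothetical, and the numbers are much less detailed than what the logs provide.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints coarse per-collector statistics from the standard management API.
final class GcStatsSketch {
    // Average time per collection; 0 when the count is zero or undefined (-1).
    static double averageMillis(long count, long totalMillis) {
        return count <= 0 ? 0.0 : (double) totalMillis / count;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 if undefined
            long millis = gc.getCollectionTime();   // -1 if undefined
            System.out.printf("%-25s collections=%d totalMs=%d avgMs=%.2f%n",
                    gc.getName(), count, millis, averageMillis(count, millis));
        }
    }
}
```

If the average for the young collector sits well above the configured MaxGCPauseMillis, that is a hint to look at the detailed logs before changing any flags.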

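G1GC has been the default garbage collector since Java 9, where it replaced Parallel GC as the default; CMS was deprecated in Java 9 and removed in Java 14. Thanks to its predictable pause times, better heap management, and incremental compaction, G1GC is particularly well-suited for large-scale server applications, latency-sensitive services, and multi-GB heap environments. Because its pause target is a soft goal, it is not a fit for hard real-time systems.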