SlideShare a Scribd company logo
1 of 70
Download to read offline
Jean-Philippe BEMPEL
WebScale
@jpbempel
Understanding
JVM GC
2 •
• GC basics
• G1
• Shenandoah
• Azul’s C4
• ZGC
• How to choose a GC algorithm?
Understanding JVM GC: Advanced!
GC Basics
4 •
Generations
5 •
• Traversing references to mark live objects
• Stopping when reaching old generation
• From GC roots (static fields, thread stack, JNI)
Marking for Minor GC
Young Old
6 •
Card Table for references old -> young references
Write barrier to update card table on assignation
X.f = Y
Card Table
Young 0 0 1
CARD_TABLE[&X >> 9] = 1
mov DWORD PTR [r10+0x6c],r8d
mov r11,r10
shr r11,0x9
mov r8d,0x2383000
mov BYTE PTR [r8+r11*1],r12b
G1
8 •
• Generational
• Region based
• Pause time target (soft real-time)
• -XX:MaxGCPauseMillis=n (default 200)
• Default GC since JDK9
Garbage First
9 •
Heap divided into fixed-size regions
Regions
10 •
Regions
Credit: Kirk Pepperdine
11 •
• Young collection (STW)
• Initial Mark (STW)
• Concurrent Marking
• Final Remark (STW)
• Cleanup (STW)
• Mixed collection (STW)
G1 phases
12 •
• Stop-The-World event
• Evacuates live objects to Survivor or Old regions
• Only objects in young generation are considered
Young GC
13 •
• Card table per region
• Avoid scanning the entire heap
Remembered Sets
14 •
• For each reference assignation (X.f = Y) we need to check:
• References (X & Y) are NOT in the same region
• Y is not null
• => enqueue for Remebered Set processing
• Refinement threads to process the queue
• Additional instructions added after assignation
Remembered Sets: Post Write Barrier
if (!isInSameRegion(X, Y)
&& Y != null)
RSEnqueue(X)
mov DWORD PTR [rbp+0x74],r10d
mov r11,rbp
mov r8,r10
shl r8,0x3
xor r8,r11
shr r8,0x14
test r8,r8
je cont
test r10d,r10d
je cont
shr r11,0x9
movabs rcx,0x2965ecc3000
add rcx,r11
cmp BYTE PTR [rcx],0x20
je cont
mov r10,QWORD PTR [r15+0x70]
mov r11,QWORD PTR [r15+0x80]
lock add DWORD PTR [rsp-0x40],0x0
cmp BYTE PTR [rcx],0x0
je cont
mov BYTE PTR [rcx],0x0
test r10,r10
jne 0x000002965edc62bc
mov rdx,r15
movabs r10,0x7ffac2febc30
call r10
jmp cont
mov QWORD PTR [r11+r10*1-0x8],rcx
add r10,0xfffffffffffffff8
mov QWORD PTR [r15+0x70],r10
15 •
• Triggered based on Initiating Heap Occupancy Percent flag (IHOP default to 45%)
• Try to mark the whole object graph concurrently with the application running
• Based on Tri-color abstraction & Snapshot-At-The-Beginning algorithm
Concurrent Marking
16 •
Concurrent Marking: Tri-Color Abstraction
17 •
Concurrent Marking: Issues
• New allocations during marking phase can be handled by:
• Marking automatically object at allocation
• Not considering new allocations for the current cycle
• Tri-Color abstraction provides 2 properties of missed object:
1. The mutator stores a reference to a white object into a black object.
2. All paths from any gray objects to that white object are destroyed.
http://www.memorymanagement.org/glossary/s.html#term-snapshot-at-the-beginning
18 •
Concurrent Marking: Issues
A
B
C
A.field1 = C;
B.field2 = null;
OOPS!
19 •
• 2 ways to ensure not missing any marking
• For SATB, Pre-Write Barriers, recording object for marking
• SATB barrier is only active when Marking is on (global state)
Concurrent Marking: Resolving misses
if (SATB_WriteBarrier) {
if (X.f != null)
SATB_enqueue(X.f);
}
cmp BYTE PTR [r15+0x30],0x0
jne 0x000002965edc62e5
[...]
mov r11d,DWORD PTR [rbp+0x74]
test r11d,r11d
je 0x000002965edc6253
mov r10,QWORD PTR [r15+0x38]
mov rcx,r11
shl rcx,0x3
test r10,r10
je 0x000002965edc6318
mov r11,QWORD PTR [r15+0x48]
mov QWORD PTR [r11+r10*1-0x8],rcx
add r10,0xfffffffffffffff8
mov QWORD PTR [r15+0x38],r10
jmp 0x000002965edc6253
mov rdx,r15
movabs r10,0x7ffac2febc50
call r10
jmp 0x000002965edc6253
20 •
• At the end of Marking, we have per region liveness information
• Regions are sorted by liveness (ascending)
• Regions full of garbage are collected during cleanup STW phase
• CollectionSet is built based on
• Liveness, up until thresholds (G1HeapWastePercent,
G1MixedGCLiveThresholdPercent)
• Maximum number of regions (G1OldCSetRegionThresholdPercent)
CollectionSet
21 •
• Based on CollectionSet, G1 schedule to collect part of old regions
• When a Young is triggered, old regions to collect are piggy backed
• Not all old regions are considered to not waste time and reach the pause goal
• Several Young GCs can be used to collect old regions (mixed event)
Mixed GC
22 •
Mixed GC
23 •
• Still fallback to FullGC (serial < JDK10)
• Fragmentation can still happen (regions with lot of lived objects)
• Still unpredictable
FullGC
Shenandoah
25 •
• Non-generational (still option for partial collection)
• Region based
• Use Read Barrier: Brooks pointer
• Self-Healing
• Cooperation between mutator threads & GC threads
• Only for concurrent compaction
• Mostly based on G1 but with concurrent compaction
Shenandoah GC
26 •
• Initial Marking (STW)
• Concurrent Marking
• Final Remark (STW)
• Concurrent Cleanup
• Concurrent Evacuation
• Init Update References (STW)
• Concurrent Update References
• Final Update References (STW)
• Concurrent Cleanup
Shenandoah Phases
27 •
• SATB-style (like G1)
• 2 STW pauses for Initial Mark & Final Remark
• Conditional Write Barrier
• To deal with concurrent modification of object graph
Concurrent Marking
28 •
• Same principle than G1:
• Build CollectionSet with Garbage First!
• Evacuate to new regions to release the region for reuse
• Concurrent Evacuation done with the help of:
• 1 Read Barrier : Brooks pointer
• 4 Write Barriers
• Barriers help to keep the to-space invariant:
• All Writes are made into an object in to-space
Concurrent Evacuation
29 •
• All objects have an additional forwarding pointer
• Placed before the regular object
• Dereference the forwarding pointer for each access
• Memory footprint overhead
• Throughput overhead
Brooks pointers
Header
Brooks pointer
mov r13,QWORD PTR [r12+r14*8-0x8]
30 •
Concurrent Copy: GC thread
Header
Brooks pointer
Header
Brooks pointer
From-Space To-Space
GC thread
31 •
Concurrent Copy: Reader threads
Header
Brooks pointer
From-Space To-Space
Reader
thread
Reader
thread
32 •
Concurrent Copy: Writer threads
Header
Brooks pointer
Header
Brooks pointer
From-Space To-Space
Writer
thread
Writer
thread
Header
Brooks pointer
33 •
• Any writes (even primitives) to from-space object needs to be protected
• Exotic barriers:
• acmp (pointer comparison)
• CAS
• clone
Write Barriers
if (evacInProgress
&& inCollectionSet(obj)
&& notCopyYet(obj)) {
evacuateObject(obj)
}
test BYTE PTR [r15+0x3c0],0x2
jne 0x000000000281bcbc
[...]
mov r10d,DWORD PTR [r13+0xc]
test r10d,r10d
je 0x000000000281bc2b
mov r11,QWORD PTR [r15+0x360]
mov rcx,r10
shl rcx,0x3
test r11,r11
je 0x000000000281bd0d
[...]
mov rdx,r15
movabs r10,0x62d1f660
call r10
jmp 0x000000000281bc2b
34 •
• Late memory release
• Only happens when all refs updated (Concurrent Cleanup phase)
• Allocations can overrun the GC
• Failure modes:
• Pacing
• Degenerated GC
• FullGC
Extreme cases
Azul’s C4
36 •
• Generational (young & old)
• Region based (pages)
• Use Read Barrier: Loaded Value Barrier
• Self-Healing
• Cooperation between mutator threads & GC threads
• Pauseless algorithm but implementation requires safepoints
• Pauses are most of the time < 1ms
Continuously Concurrent Compacting Collector
37 •
• Baker-style Barrier
• move objects through forwarding addresses stored aside
• Applied at load time, not when dereferencing
• Ensure C4 invariants:
• Marked Through the current cycle
• Not relocated
• If not => Self-healing process to correct it
• Mark object
• Relocate & correct reference
• Checked for each reference loads
• Benefits from JIT optimization for caching loaded value (registers)
LVB
38 •
• States of objects stored inside reference address => Colored pointers
• NMT bit
• Generation
• Checked against a global expected value during the GC cycle
• Thread local, almost always L1 cache hits
• Register
• Relocated: x86 Implementation use trap from VM memory translation Guest/Host
• Intel EPT
• AMD NPT
LVB
test r9, rax
jne 0x3001443b
mov r10d, dword ptr [rax + 8]
39 •
Virtual Memory vs Physical Memory
Virtual Memory
Physical Memory
0 2^64
0 2^37
40 •
• All phases are fully parallel & concurrent
• No "rush" to finish phases
• No constraint about STW pause to be short
• Physical memory released quickly in relocation phase
• Can be reused for new allocations
• Plenty of virtual space vs physical memory
C4 Phases
41 •
• Mark
• Marking all objects in graph
• Relocation
• Moving objects to release pages
• Remap
• Fixup references in object graph
• Folded with next mark cycle
C4 Phases
42 •
• Incremental Update Marking (vs SATB)
• Single pass
• No final mark/remark
• Self-Healing: Mark object that are not marked for the current cycle
Mark Phase
43 •
Mark Phase: Concurrent Modification
A
B
C
A.field1 = C;
B.field2 = null;
LVB
44 •
• Scanning roots (Static var, Thread stacks, register, JNI handles)
• GC threads scans stalled threads
• Running threads scans their own stack stopping individually at Safepoint
• Scanning object graph like a parallel collector
• Newly allocated objects into new pages, not considered for reclaim (relocation)
• For each page, summing live data bytes, used to select page to reclaim
Mark Phase
45 •
• Select pages with the greatest number of dead objects (garbage first!)
• Protect page selected from being accessed by mutators thread
• Move objects to new allocated pages
• Build side arrays (off heap tables) for forwarding information
• Self-Healing: As protected, LVB will trigger a trap to:
• Copy object to the new location if not done
• Use forward pointer to fix the reference
Relocation Phase
46 •
Virtual
Physical
Relocation Phase
Forwarding table
47 •
• Few chances mutators stall on accessing a ref as processing mostly dead pages
• Once object copy done, physical memory is released (Quick Release)
• Can be immediately reused (remapped) to satisfy new allocations
• Pages evacuated are still mapped & protected to help remap phase
• Cannot be released until all objects are remapped
• Not a problem as we have a huge virtual address space
Relocation Phase
48 •
• Traverse Object Graph and fixup references
• Execute LVB barrier for each object
• Self-Healing: fixup references using forward information
• As we traverse again, mark for the next phase
• Mark & Remap phases are folded!
Remap Phase
49 •
• Algorithm requires a sustainable rate or remapping operations
• Linux limitations:
• TLB invalidation
• Only 4KB pages can be remapped
• Single threaded remapping (write lock)
• Kernel module implements API for the Zing JVM to increase significantly the remapping rate
• Implements also virtual address aliasing for addressing objects with metadata
Remap – Kernel module
50 •
• Young & Old collections done by same algorithm and can be concurrent
• Size of the generation are dynamically adjusted
• Card Marking with write barrier (Stored Value Barrier)
• Old collection is based on young-to-old roots generated by previous young cycle
• Young collection will perform card scanning per page
• hold an eventual concurrent Old collection per page scanned
Generational
51 •
• Used by Hadoop Name Node
• 580GB Heap
• Very hard to tune with G1
• No issue so far regarding GC since production roll out (Oct 2017)
C4 @ Criteo
Z GC
53 •
• Non generational
• Region based (zPages, dynamically sized)
• Concurrent Marking, Compaction, Ref processing
• Use Colored Pointers & Read/Load Barrier
• Self-Healing
• Cooperation between mutator threads & GC threads
• Experimental in JDK 11 (-XX:+UnlockExperimentalVMOptions –XX:+UseZGC)
Z GC
mov r10,QWORD PTR [r11+0xb0]
test QWORD PTR [r15+0x20],r10
jne 0x00007f9594cc54b5
54 •
Z GC
55 •
• Initial Mark (STW)
• Concurrent Mark/Remap
• Final Mark (STW)
• Concurrent Prepare for Relocation
• Start Relocate (STW)
• Concurrent Relocate
Z GC phases:
56 •
• Store metadata in unused bits of reference address
• 42 bits for addressing (4TB)
• 4 bits for metadata
• Marked0
• Marked1
• Remapped
• Finalizable
Colored Pointers
57 •
• Colored pointers needs to be unmasked for dereferencing
• Some HW support masking (SPARC, Aarch64))
• On linux/windows, overhead if done with classical instructions
• Only one view is active at any point
• Plenty of Virtual Space
Multi-Mapping
58 •
Multi-Mapping
Virtual Memory
Physical Memory
0 2^64
0 2^37
(marked0)
001<address>
(marked1)
010<address>
(remapped)
100<address>
59 •
• Pages are multiple of 2MB
• 3 different groups
• Small: 2MB pages with object size <= 256KB
• Medium: 32MB pages with object size <= 4MB
• Large: 2MB pages, objects span over multiple of them
• Objects in Large group are meant to not to be relocated (too expensive)
Page Allocations
60 •
• Handling remapping
• C4: Memory protection + trap
• Z: mask in colored pointer
• Unmasking ref addresses
• C4: Kernel module aliasing
• Z: Multi-mapping or HW support
• Pages & Relocation
• C4:
• Page are fixed to match OS size (mem protection)
• relocation for large objects by remapping
• Z:
• zPages are dynamic, a zPage can be 100MB large
• No relocation for large objects
Difference between C4 & Z GC
How to choose a GC algorithm
62 •
• Case 1:
• Need maximum of work done in a time frame (offline job)
• Can afford FullGC of several seconds
 Use a throughput collector like ParalleGC or G1
• Case 2:
• Have time constraint per unit of work (online job)
• Cannot afford FullGC of several seconds
 Use a low latency collector like C4, Shenandoah or Z
Throughput vs Latency
63 •
• You have to run on Windows
• Shenandoah
• Battlefield tested GC (maturity)
• C4
• Shenandoah
• Minimizing any kind of JVM pauses
• C4
• Z
• You don’t want pay for it:
• Shenandoah
• Z
Low latency GCs
References
65 •
• Java Garbage Collection distilled by Martin Thompson
• The Java GC mini book
• Oracle’s white paper on JVM memory management & GC
• What differences JVM makes by Nitsan Wakart
• Memory Management Reference
• IBM Pause-Less GC
References GC Basics
66 •
• Garbage-First Garbage Collection (2004)
• G1 One Garbage Collector to rule them all by Monica Beckwith
• Tips for Tuning The G1 GC by Monica Beckwith
• G1 Garbage Collector Details and Tuning by Simone Bordet
• Write Barriers in Garbage-First Garbage Collector by Monica Beckwith
References G1
67 •
• Shenandoah: An open-source concurrent compacting garbage collector for OpenJDK
• Shenandoah: The Garbage Collector That Could by Aleksey Shipilev
• Shenandoah GC Wiki
References Shenandoah
68 •
• The Pauseless GC algorithm (2005)
• C4: Continuously Concurrent Compacting Collector (2011)
• Azul GC in Detail by Charles Humble
• 2010 version source code
References C4
69 •
• ZGC - Low Latency GC for OpenJDK by Per Liden
• Java's new Z Garbage Collector (ZGC) is very exciting by Richard Warburton
• A first look into ZGC by Dominik Inführ
• Architectural Comparison with C4/Pauseless
References ZGC
Thank You!
@jpbempel

More Related Content

What's hot

Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com confluent
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...confluent
 
Garbage First Garbage Collector: Where the Rubber Meets the Road!
Garbage First Garbage Collector: Where the Rubber Meets the Road!Garbage First Garbage Collector: Where the Rubber Meets the Road!
Garbage First Garbage Collector: Where the Rubber Meets the Road!Monica Beckwith
 
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016DataStax
 
How NOT to Write a Microbenchmark
How NOT to Write a MicrobenchmarkHow NOT to Write a Microbenchmark
How NOT to Write a MicrobenchmarkAzul Systems Inc.
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsCloudera, Inc.
 
USENIX ATC 2017: Visualizing Performance with Flame Graphs
USENIX ATC 2017: Visualizing Performance with Flame GraphsUSENIX ATC 2017: Visualizing Performance with Flame Graphs
USENIX ATC 2017: Visualizing Performance with Flame GraphsBrendan Gregg
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanVerverica
 
Same Origin Method Execution (BlackHat EU2014)
Same Origin Method Execution (BlackHat EU2014)Same Origin Method Execution (BlackHat EU2014)
Same Origin Method Execution (BlackHat EU2014)Ben Hayak
 
Apache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know AboutApache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know AboutYaroslav Tkachenko
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimDatabricks
 
No data loss pipeline with apache kafka
No data loss pipeline with apache kafkaNo data loss pipeline with apache kafka
No data loss pipeline with apache kafkaJiangjie Qin
 
Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...
Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...
Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...Jean-Philippe BEMPEL
 
Choosing Right Garbage Collector to Increase Efficiency of Java Memory Usage
Choosing Right Garbage Collector to Increase Efficiency of Java Memory UsageChoosing Right Garbage Collector to Increase Efficiency of Java Memory Usage
Choosing Right Garbage Collector to Increase Efficiency of Java Memory UsageJelastic Multi-Cloud PaaS
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and moreBrendan Gregg
 
Java Crash分析(2012-05-10)
Java Crash分析(2012-05-10)Java Crash分析(2012-05-10)
Java Crash分析(2012-05-10)Kris Mok
 
Improving Kafka at-least-once performance at Uber
Improving Kafka at-least-once performance at UberImproving Kafka at-least-once performance at Uber
Improving Kafka at-least-once performance at UberYing Zheng
 
The power of linux advanced tracer [POUG18]
The power of linux advanced tracer [POUG18]The power of linux advanced tracer [POUG18]
The power of linux advanced tracer [POUG18]Mahmoud Hatem
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkBo Yang
 

What's hot (20)

Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
 
Garbage First Garbage Collector: Where the Rubber Meets the Road!
Garbage First Garbage Collector: Where the Rubber Meets the Road!Garbage First Garbage Collector: Where the Rubber Meets the Road!
Garbage First Garbage Collector: Where the Rubber Meets the Road!
 
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016
 
How NOT to Write a Microbenchmark
How NOT to Write a MicrobenchmarkHow NOT to Write a Microbenchmark
How NOT to Write a Microbenchmark
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
 
USENIX ATC 2017: Visualizing Performance with Flame Graphs
USENIX ATC 2017: Visualizing Performance with Flame GraphsUSENIX ATC 2017: Visualizing Performance with Flame Graphs
USENIX ATC 2017: Visualizing Performance with Flame Graphs
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
 
Same Origin Method Execution (BlackHat EU2014)
Same Origin Method Execution (BlackHat EU2014)Same Origin Method Execution (BlackHat EU2014)
Same Origin Method Execution (BlackHat EU2014)
 
Apache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know AboutApache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know About
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
 
No data loss pipeline with apache kafka
No data loss pipeline with apache kafkaNo data loss pipeline with apache kafka
No data loss pipeline with apache kafka
 
Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...
Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...
Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...
 
Choosing Right Garbage Collector to Increase Efficiency of Java Memory Usage
Choosing Right Garbage Collector to Increase Efficiency of Java Memory UsageChoosing Right Garbage Collector to Increase Efficiency of Java Memory Usage
Choosing Right Garbage Collector to Increase Efficiency of Java Memory Usage
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
 
Java Crash分析(2012-05-10)
Java Crash分析(2012-05-10)Java Crash分析(2012-05-10)
Java Crash分析(2012-05-10)
 
Improving Kafka at-least-once performance at Uber
Improving Kafka at-least-once performance at UberImproving Kafka at-least-once performance at Uber
Improving Kafka at-least-once performance at Uber
 
The power of linux advanced tracer [POUG18]
The power of linux advanced tracer [POUG18]The power of linux advanced tracer [POUG18]
The power of linux advanced tracer [POUG18]
 
GDB Rocks!
GDB Rocks!GDB Rocks!
GDB Rocks!
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
 

Similar to Understanding jvm gc advanced

Understanding low latency jvm gcs V2
Understanding low latency jvm gcs V2Understanding low latency jvm gcs V2
Understanding low latency jvm gcs V2Jean-Philippe BEMPEL
 
Low latency Java apps
Low latency Java appsLow latency Java apps
Low latency Java appsSimon Ritter
 
OpenJDK Concurrent Collectors
OpenJDK Concurrent CollectorsOpenJDK Concurrent Collectors
OpenJDK Concurrent CollectorsMonica Beckwith
 
Cache aware hybrid sorter
Cache aware hybrid sorterCache aware hybrid sorter
Cache aware hybrid sorterManchor Ko
 
Java Garbage Collectors – Moving to Java7 Garbage First (G1) Collector
Java Garbage Collectors – Moving to Java7 Garbage First (G1) CollectorJava Garbage Collectors – Moving to Java7 Garbage First (G1) Collector
Java Garbage Collectors – Moving to Java7 Garbage First (G1) CollectorGurpreet Sachdeva
 
Stream processing from single node to a cluster
Stream processing from single node to a clusterStream processing from single node to a cluster
Stream processing from single node to a clusterGal Marder
 
Clr jvm implementation differences
Clr jvm implementation differencesClr jvm implementation differences
Clr jvm implementation differencesJean-Philippe BEMPEL
 
G1 collector and tuning and Cassandra
G1 collector and tuning and CassandraG1 collector and tuning and Cassandra
G1 collector and tuning and CassandraChris Lohfink
 
Decima Engine: Visibility in Horizon Zero Dawn
Decima Engine: Visibility in Horizon Zero DawnDecima Engine: Visibility in Horizon Zero Dawn
Decima Engine: Visibility in Horizon Zero DawnGuerrilla
 
Single instruction multiple data
Single instruction multiple dataSingle instruction multiple data
Single instruction multiple dataSyed Zaid Irshad
 
Gunjae_ISCA15_slides.pdf
Gunjae_ISCA15_slides.pdfGunjae_ISCA15_slides.pdf
Gunjae_ISCA15_slides.pdfssuser30e7d2
 
JVM @ Taobao - QCon Hangzhou 2011
JVM @ Taobao - QCon Hangzhou 2011JVM @ Taobao - QCon Hangzhou 2011
JVM @ Taobao - QCon Hangzhou 2011Kris Mok
 
Scale Relational Database with NewSQL
Scale Relational Database with NewSQLScale Relational Database with NewSQL
Scale Relational Database with NewSQLPingCAP
 
Tips and tricks for building high performance android apps using native code
Tips and tricks for building high performance android apps using native codeTips and tricks for building high performance android apps using native code
Tips and tricks for building high performance android apps using native codeKenneth Geisshirt
 
Tuning Java GC to resolve performance issues
Tuning Java GC to resolve performance issuesTuning Java GC to resolve performance issues
Tuning Java GC to resolve performance issuesSergey Podolsky
 

Similar to Understanding jvm gc advanced (20)

Understanding low latency jvm gcs V2
Understanding low latency jvm gcs V2Understanding low latency jvm gcs V2
Understanding low latency jvm gcs V2
 
Understanding low latency jvm gcs
Understanding low latency jvm gcsUnderstanding low latency jvm gcs
Understanding low latency jvm gcs
 
Understanding JVM GC: advanced!
Understanding JVM GC: advanced!Understanding JVM GC: advanced!
Understanding JVM GC: advanced!
 
Low latency Java apps
Low latency Java appsLow latency Java apps
Low latency Java apps
 
OpenJDK Concurrent Collectors
OpenJDK Concurrent CollectorsOpenJDK Concurrent Collectors
OpenJDK Concurrent Collectors
 
Cache aware hybrid sorter
Cache aware hybrid sorterCache aware hybrid sorter
Cache aware hybrid sorter
 
Java Garbage Collectors – Moving to Java7 Garbage First (G1) Collector
Java Garbage Collectors – Moving to Java7 Garbage First (G1) CollectorJava Garbage Collectors – Moving to Java7 Garbage First (G1) Collector
Java Garbage Collectors – Moving to Java7 Garbage First (G1) Collector
 
Stream processing from single node to a cluster
Stream processing from single node to a clusterStream processing from single node to a cluster
Stream processing from single node to a cluster
 
New Algorithms in Java
New Algorithms in JavaNew Algorithms in Java
New Algorithms in Java
 
Clr jvm implementation differences
Clr jvm implementation differencesClr jvm implementation differences
Clr jvm implementation differences
 
Vectorization in ATLAS
Vectorization in ATLASVectorization in ATLAS
Vectorization in ATLAS
 
G1 collector and tuning and Cassandra
G1 collector and tuning and CassandraG1 collector and tuning and Cassandra
G1 collector and tuning and Cassandra
 
Decima Engine: Visibility in Horizon Zero Dawn
Decima Engine: Visibility in Horizon Zero DawnDecima Engine: Visibility in Horizon Zero Dawn
Decima Engine: Visibility in Horizon Zero Dawn
 
Single instruction multiple data
Single instruction multiple dataSingle instruction multiple data
Single instruction multiple data
 
Gunjae_ISCA15_slides.pdf
Gunjae_ISCA15_slides.pdfGunjae_ISCA15_slides.pdf
Gunjae_ISCA15_slides.pdf
 
JVM @ Taobao - QCon Hangzhou 2011
JVM @ Taobao - QCon Hangzhou 2011JVM @ Taobao - QCon Hangzhou 2011
JVM @ Taobao - QCon Hangzhou 2011
 
Scale Relational Database with NewSQL
Scale Relational Database with NewSQLScale Relational Database with NewSQL
Scale Relational Database with NewSQL
 
Tc basics
Tc basicsTc basics
Tc basics
 
Tips and tricks for building high performance android apps using native code
Tips and tricks for building high performance android apps using native codeTips and tricks for building high performance android apps using native code
Tips and tricks for building high performance android apps using native code
 
Tuning Java GC to resolve performance issues
Tuning Java GC to resolve performance issuesTuning Java GC to resolve performance issues
Tuning Java GC to resolve performance issues
 

More from Jean-Philippe BEMPEL

Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...Jean-Philippe BEMPEL
 
Tools in action jdk mission control and flight recorder
Tools in action  jdk mission control and flight recorderTools in action  jdk mission control and flight recorder
Tools in action jdk mission control and flight recorderJean-Philippe BEMPEL
 
Out ofmemoryerror what is the cost of java objects
Out ofmemoryerror  what is the cost of java objectsOut ofmemoryerror  what is the cost of java objects
Out ofmemoryerror what is the cost of java objectsJean-Philippe BEMPEL
 
OutOfMemoryError : quel est le coût des objets en java
OutOfMemoryError : quel est le coût des objets en javaOutOfMemoryError : quel est le coût des objets en java
OutOfMemoryError : quel est le coût des objets en javaJean-Philippe BEMPEL
 
Low latency & mechanical sympathy issues and solutions
Low latency & mechanical sympathy  issues and solutionsLow latency & mechanical sympathy  issues and solutions
Low latency & mechanical sympathy issues and solutionsJean-Philippe BEMPEL
 
Lock free programming - pro tips devoxx uk
Lock free programming - pro tips devoxx ukLock free programming - pro tips devoxx uk
Lock free programming - pro tips devoxx ukJean-Philippe BEMPEL
 
Programmation lock free - les techniques des pros (2eme partie)
Programmation lock free - les techniques des pros (2eme partie)Programmation lock free - les techniques des pros (2eme partie)
Programmation lock free - les techniques des pros (2eme partie)Jean-Philippe BEMPEL
 
Programmation lock free - les techniques des pros (1ere partie)
Programmation lock free - les techniques des pros (1ere partie)Programmation lock free - les techniques des pros (1ere partie)
Programmation lock free - les techniques des pros (1ere partie)Jean-Philippe BEMPEL
 
Measuring directly from cpu hardware performance counters
Measuring directly from cpu  hardware performance countersMeasuring directly from cpu  hardware performance counters
Measuring directly from cpu hardware performance countersJean-Philippe BEMPEL
 
Devoxx france 2014 compteurs de perf
Devoxx france 2014 compteurs de perfDevoxx france 2014 compteurs de perf
Devoxx france 2014 compteurs de perfJean-Philippe BEMPEL
 

More from Jean-Philippe BEMPEL (13)

Mastering GC.pdf
Mastering GC.pdfMastering GC.pdf
Mastering GC.pdf
 
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...
 
Tools in action jdk mission control and flight recorder
Tools in action  jdk mission control and flight recorderTools in action  jdk mission control and flight recorder
Tools in action jdk mission control and flight recorder
 
Le guide de dépannage de la jvm
Le guide de dépannage de la jvmLe guide de dépannage de la jvm
Le guide de dépannage de la jvm
 
Out ofmemoryerror what is the cost of java objects
Out ofmemoryerror  what is the cost of java objectsOut ofmemoryerror  what is the cost of java objects
Out ofmemoryerror what is the cost of java objects
 
OutOfMemoryError : quel est le coût des objets en java
OutOfMemoryError : quel est le coût des objets en javaOutOfMemoryError : quel est le coût des objets en java
OutOfMemoryError : quel est le coût des objets en java
 
Low latency & mechanical sympathy issues and solutions
Low latency & mechanical sympathy  issues and solutionsLow latency & mechanical sympathy  issues and solutions
Low latency & mechanical sympathy issues and solutions
 
Lock free programming - pro tips devoxx uk
Lock free programming - pro tips devoxx ukLock free programming - pro tips devoxx uk
Lock free programming - pro tips devoxx uk
 
Lock free programming- pro tips
Lock free programming- pro tipsLock free programming- pro tips
Lock free programming- pro tips
 
Programmation lock free - les techniques des pros (2eme partie)
Programmation lock free - les techniques des pros (2eme partie)Programmation lock free - les techniques des pros (2eme partie)
Programmation lock free - les techniques des pros (2eme partie)
 
Programmation lock free - les techniques des pros (1ere partie)
Programmation lock free - les techniques des pros (1ere partie)Programmation lock free - les techniques des pros (1ere partie)
Programmation lock free - les techniques des pros (1ere partie)
 
Measuring directly from cpu hardware performance counters
Measuring directly from cpu  hardware performance countersMeasuring directly from cpu  hardware performance counters
Measuring directly from cpu hardware performance counters
 
Devoxx france 2014 compteurs de perf
Devoxx france 2014 compteurs de perfDevoxx france 2014 compteurs de perf
Devoxx france 2014 compteurs de perf
 

Recently uploaded

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 

Recently uploaded (20)

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

Understanding jvm gc advanced

  • 2. 2 • • GC basics • G1 • Shenandoah • Azul’s C4 • ZGC • How to choose a GC algorithm? Understanding JVM GC: Advanced!
  • 5. 5 • • Traversing references to mark live objects • Stopping when reaching old generation • From GC roots (static fields, thread stack, JNI) Marking for Minor GC Young Old
  • 6. 6 • Card Table for references old -> young references Write barrier to update card table on assignation X.f = Y Card Table Young 0 0 1 CARD_TABLE[&X >> 9] = 1 mov DWORD PTR [r10+0x6c],r8d mov r11,r10 shr r11,0x9 mov r8d,0x2383000 mov BYTE PTR [r8+r11*1],r12b
  • 7. G1
  • 8. 8 • • Generational • Region based • Pause time target (soft real-time) • -XX:MaxGCPauseMillis=n (default 200) • Default GC since JDK9 Garbage First
  • 9. 9 • Heap divided into fixed-size regions Regions
  • 11. 11 • • Young collection (STW) • Initial Mark (STW) • Concurrent Marking • Final Remark (STW) • Cleanup (STW) • Mixed collection (STW) G1 phases
  • 12. 12 • • Stop-The-World event • Evacuates live objects to Survivor or Old regions • Only objects in young generation are considered Young GC
  • 13. 13 • • Card table per region • Avoid scanning the entire heap Remembered Sets
  • 14. 14 • • For each reference assignation (X.f = Y) we need to check: • References (X & Y) are NOT in the same region • Y is not null • => enqueue for Remebered Set processing • Refinement threads to process the queue • Additional instructions added after assignation Remembered Sets: Post Write Barrier if (!isInSameRegion(X, Y) && Y != null) RSEnqueue(X) mov DWORD PTR [rbp+0x74],r10d mov r11,rbp mov r8,r10 shl r8,0x3 xor r8,r11 shr r8,0x14 test r8,r8 je cont test r10d,r10d je cont shr r11,0x9 movabs rcx,0x2965ecc3000 add rcx,r11 cmp BYTE PTR [rcx],0x20 je cont mov r10,QWORD PTR [r15+0x70] mov r11,QWORD PTR [r15+0x80] lock add DWORD PTR [rsp-0x40],0x0 cmp BYTE PTR [rcx],0x0 je cont mov BYTE PTR [rcx],0x0 test r10,r10 jne 0x000002965edc62bc mov rdx,r15 movabs r10,0x7ffac2febc30 call r10 jmp cont mov QWORD PTR [r11+r10*1-0x8],rcx add r10,0xfffffffffffffff8 mov QWORD PTR [r15+0x70],r10
  • 15. 15 • • Triggered based on Initiating Heap Occupancy Percent flag (IHOP default to 45%) • Try to mark the whole object graph concurrently with the application running • Based on Tri-color abstraction & Snapshot-At-The-Beginning algorithm Concurrent Marking
  • 16. 16 • Concurrent Marking: Tri-Color Abstraction
  • 17. 17 • Concurrent Marking: Issues • New allocations during marking phase can be handled by: • Marking automatically object at allocation • Not considering new allocations for the current cycle • Tri-Color abstraction provides 2 properties of missed object: 1. The mutator stores a reference to a white object into a black object. 2. All paths from any gray objects to that white object are destroyed. http://www.memorymanagement.org/glossary/s.html#term-snapshot-at-the-beginning
  • 18. 18 • Concurrent Marking: Issues A B C A.field1 = C; B.field2 = null; OOPS!
  • 19. 19 • • 2 ways to ensure not missing any marking • For SATB, Pre-Write Barriers, recording object for marking • SATB barrier is only active when Marking is on (global state) Concurrent Marking: Resolving misses if (SATB_WriteBarrier) { if (X.f != null) SATB_enqueue(X.f); } cmp BYTE PTR [r15+0x30],0x0 jne 0x000002965edc62e5 [...] mov r11d,DWORD PTR [rbp+0x74] test r11d,r11d je 0x000002965edc6253 mov r10,QWORD PTR [r15+0x38] mov rcx,r11 shl rcx,0x3 test r10,r10 je 0x000002965edc6318 mov r11,QWORD PTR [r15+0x48] mov QWORD PTR [r11+r10*1-0x8],rcx add r10,0xfffffffffffffff8 mov QWORD PTR [r15+0x38],r10 jmp 0x000002965edc6253 mov rdx,r15 movabs r10,0x7ffac2febc50 call r10 jmp 0x000002965edc6253
  • 20. 20 • • At the end of Marking, we have per region liveness information • Regions are sorted by liveness (ascending) • Regions full of garbage are collected during cleanup STW phase • CollectionSet is built based on • Liveness, up until thresholds (G1HeapWastePercent, G1MixedGCLiveThresholdPercent) • Maximum number of regions (G1OldCSetRegionThresholdPercent) CollectionSet
  • 21. 21 • • Based on CollectionSet, G1 schedule to collect part of old regions • When a Young is triggered, old regions to collect are piggy backed • Not all old regions are considered to not waste time and reach the pause goal • Several Young GCs can be used to collect old regions (mixed event) Mixed GC
  • 23. 23 • • Still fallback to FullGC (serial < JDK10) • Fragmentation can still happen (regions with lot of lived objects) • Still unpredictable FullGC
  • 25. 25 • • Non-generational (still option for partial collection) • Region based • Use Read Barrier: Brooks pointer • Self-Healing • Cooperation between mutator threads & GC threads • Only for concurrent compaction • Mostly based on G1 but with concurrent compaction Shenandoah GC
  • 26. 26 • • Initial Marking (STW) • Concurrent Marking • Final Remark (STW) • Concurrent Cleanup • Concurrent Evacuation • Init Update References (STW) • Concurrent Update References • Final Update References (STW) • Concurrent Cleanup Shenandoah Phases
  • 27. 27 • • SATB-style (like G1) • 2 STW pauses for Initial Mark & Final Remark • Conditional Write Barrier • To deal with concurrent modification of object graph Concurrent Marking
  • 28. 28 • • Same principle than G1: • Build CollectionSet with Garbage First! • Evacuate to new regions to release the region for reuse • Concurrent Evacuation done with the help of: • 1 Read Barrier : Brooks pointer • 4 Write Barriers • Barriers help to keep the to-space invariant: • All Writes are made into an object in to-space Concurrent Evacuation
  • 29. 29 • • All objects have an additional forwarding pointer • Placed before the regular object • Dereference the forwarding pointer for each access • Memory footprint overhead • Throughput overhead Brooks pointers Header Brooks pointer mov r13,QWORD PTR [r12+r14*8-0x8]
  • 30. 30 • Concurrent Copy: GC thread Header Brooks pointer Header Brooks pointer From-Space To-Space GC thread
  • 31. 31 • Concurrent Copy: Reader threads Header Brooks pointer From-Space To-Space Reader thread Reader thread
  • 32. 32 • Concurrent Copy: Writer threads Header Brooks pointer Header Brooks pointer From-Space To-Space Writer thread Writer thread Header Brooks pointer
  • 33. 33 • • Any writes (even primitives) to from-space object needs to be protected • Exotic barriers: • acmp (pointer comparison) • CAS • clone Write Barriers if (evacInProgress && inCollectionSet(obj) && notCopyYet(obj)) { evacuateObject(obj) } test BYTE PTR [r15+0x3c0],0x2 jne 0x000000000281bcbc [...] mov r10d,DWORD PTR [r13+0xc] test r10d,r10d je 0x000000000281bc2b mov r11,QWORD PTR [r15+0x360] mov rcx,r10 shl rcx,0x3 test r11,r11 je 0x000000000281bd0d [...] mov rdx,r15 movabs r10,0x62d1f660 call r10 jmp 0x000000000281bc2b
  • 34. 34 • • Late memory release • Only happens when all refs updated (Concurrent Cleanup phase) • Allocations can overrun the GC • Failure modes: • Pacing • Degenerated GC • FullGC Extreme cases
  • 36. 36 • • Generational (young & old) • Region based (pages) • Use Read Barrier: Loaded Value Barrier • Self-Healing • Cooperation between mutator threads & GC threads • Pauseless algorithm but implementation requires safepoints • Pauses are most of the time < 1ms Continuously Concurrent Compacting Collector
  • 37. 37 • • Baker-style Barrier • move objects through forwarding addresses stored aside • Applied at load time, not when dereferencing • Ensure C4 invariants: • Marked Through the current cycle • Not relocated • If not => Self-healing process to correct it • Mark object • Relocate & correct reference • Checked for each reference loads • Benefits from JIT optimization for caching loaded value (registers) LVB
  • 38. 38 • • States of objects stored inside reference address => Colored pointers • NMT bit • Generation • Checked against a global expected value during the GC cycle • Thread local, almost always L1 cache hits • Register • Relocated: x86 Implementation use trap from VM memory translation Guest/Host • Intel EPT • AMD NPT LVB test r9, rax jne 0x3001443b mov r10d, dword ptr [rax + 8]
  • 39. 39 • Virtual Memory vs Physical Memory Virtual Memory Physical Memory 0 2^64 0 2^37
  • 40. 40 • • All phases are fully parallel & concurrent • No "rush" to finish phases • No constraint about STW pause to be short • Physical memory released quickly in relocation phase • Can be reused for new allocations • Plenty of virtual space vs physical memory C4 Phases
  • 41. 41 • • Mark • Marking all objects in graph • Relocation • Moving objects to release pages • Remap • Fixup references in object graph • Folded with next mark cycle C4 Phases
  • 42. 42 • • Incremental Update Marking (vs SATB) • Single pass • No final mark/remark • Self-Healing: Mark object that are not marked for the current cycle Mark Phase
  • 43. 43 • Mark Phase: Concurrent Modification A B C A.field1 = C; B.field2 = null; LVB
  • 44. 44 • • Scanning roots (Static var, Thread stacks, register, JNI handles) • GC threads scans stalled threads • Running threads scans their own stack stopping individually at Safepoint • Scanning object graph like a parallel collector • Newly allocated objects into new pages, not considered for reclaim (relocation) • For each page, summing live data bytes, used to select page to reclaim Mark Phase
  • 45. 45 • • Select pages with the greatest number of dead objects (garbage first!) • Protect page selected from being accessed by mutators thread • Move objects to new allocated pages • Build side arrays (off heap tables) for forwarding information • Self-Healing: As protected, LVB will trigger a trap to: • Copy object to the new location if not done • Use forward pointer to fix the reference Relocation Phase
  • 47. 47 • • Few chances mutators stall on accessing a ref as processing mostly dead pages • Once object copy done, physical memory is released (Quick Release) • Can be immediately reused (remapped) to satisfy new allocations • Pages evacuated are still mapped & protected to help remap phase • Cannot be released until all objects are remapped • Not a problem as we have a huge virtual address space Relocation Phase
  • 48. 48 • • Traverse Object Graph and fixup references • Execute LVB barrier for each object • Self-Healing: fixup references using forward information • As we traverse again, mark for the next phase • Mark & Remap phases are folded! Remap Phase
  • 49. 49 • • Algorithm requires a sustainable rate or remapping operations • Linux limitations: • TLB invalidation • Only 4KB pages can be remapped • Single threaded remapping (write lock) • Kernel module implements API for the Zing JVM to increase significantly the remapping rate • Implements also virtual address aliasing for addressing objects with metadata Remap – Kernel module
  • 50. 50 • • Young & Old collections done by same algorithm and can be concurrent • Size of the generation are dynamically adjusted • Card Marking with write barrier (Stored Value Barrier) • Old collection is based on young-to-old roots generated by previous young cycle • Young collection will perform card scanning per page • hold an eventual concurrent Old collection per page scanned Generational
  • 51. 51 • • Used by Hadoop Name Node • 580GB Heap • Very hard to tune with G1 • No issue so far regarding GC since production roll out (Oct 2017) C4 @ Criteo
  • 52. Z GC
  • 53. 53 • • Non generational • Region based (zPages, dynamically sized) • Concurrent Marking, Compaction, Ref processing • Use Colored Pointers & Read/Load Barrier • Self-Healing • Cooperation between mutator threads & GC threads • Experimental in JDK 11 (-XX:+UnlockExperimentalVMOptions –XX:+UseZGC) Z GC mov r10,QWORD PTR [r11+0xb0] test QWORD PTR [r15+0x20],r10 jne 0x00007f9594cc54b5
  • 55. 55 • • Initial Mark (STW) • Concurrent Mark/Remap • Final Mark (STW) • Concurrent Prepare for Relocation • Start Relocate (STW) • Concurrent Relocate Z GC phases:
  • 56. 56 • • Store metadata in unused bits of reference address • 42 bits for addressing (4TB) • 4 bits for metadata • Marked0 • Marked1 • Remapped • Finalizable Colored Pointers
  • 57. 57 • • Colored pointers needs to be unmasked for dereferencing • Some HW support masking (SPARC, Aarch64)) • On linux/windows, overhead if done with classical instructions • Only one view is active at any point • Plenty of Virtual Space Multi-Mapping
  • 58. 58 • Multi-Mapping Virtual Memory Physical Memory 0 2^64 0 2^37 (marked0) 001<address> (marked1) 010<address> (remapped) 100<address>
  • 59. 59 • • Pages are multiple of 2MB • 3 different groups • Small: 2MB pages with object size <= 256KB • Medium: 32MB pages with object size <= 4MB • Large: 2MB pages, objects span over multiple of them • Objects in Large group are meant to not to be relocated (too expensive) Page Allocations
  • 60. 60 • • Handling remapping • C4: Memory protection + trap • Z: mask in colored pointer • Unmasking ref addresses • C4: Kernel module aliasing • Z: Multi-mapping or HW support • Pages & Relocation • C4: • Page are fixed to match OS size (mem protection) • relocation for large objects by remapping • Z: • zPages are dynamic, a zPage can be 100MB large • No relocation for large objects Difference between C4 & Z GC
  • 61. How to choose a GC algorithm
  • 62. 62 • • Case 1: • Need maximum of work done in a time frame (offline job) • Can afford FullGC of several seconds  Use a throughput collector like ParalleGC or G1 • Case 2: • Have time constraint per unit of work (online job) • Cannot afford FullGC of several seconds  Use a low latency collector like C4, Shenandoah or Z Throughput vs Latency
  • 63. 63 • • You have to run on Windows • Shenandoah • Battlefield tested GC (maturity) • C4 • Shenandoah • Minimizing any kind of JVM pauses • C4 • Z • You don’t want pay for it: • Shenandoah • Z Low latency GCs
  • 65. 65 • • Java Garbage Collection distilled by Martin Thompson • The Java GC mini book • Oracle’s white paper on JVM memory management & GC • What differences JVM makes by Nitsan Wakart • Memory Management Reference • IBM Pause-Less GC References GC Basics
  • 66. 66 • • Garbage-First Garbage Collection (2004) • G1 One Garbage Collector to rule them all by Monica Beckwith • Tips for Tuning The G1 GC by Monica Beckwith • G1 Garbage Collector Details and Tuning by Simone Bordet • Write Barriers in Garbage-First Garbage Collector by Monica Beckwith References G1
  • 67. 67 • • Shenandoah: An open-source concurrent compacting garbage collector for OpenJDK • Shenandoah: The Garbage Collector That Could by Aleksey Shipilev • Shenandoah GC Wiki References Shenandoah
  • 68. 68 • • The Pauseless GC algorithm (2005) • C4: Continuously Concurrent Compacting Collector (2011) • Azul GC in Detail by Charles Humble • 2010 version source code References C4
  • 69. 69 • • ZGC - Low Latency GC for OpenJDK by Per Liden • Java's new Z Garbage Collector (ZGC) is very exciting by Richard Warburton • A first look into ZGC by Dominik Inführ • Architectural Comparison with C4/Pauseless References ZGC