Java Virtual Machine Garbage Collection: Serial GC and the Card Table

Posted by brauchii on Mon, 07 Mar 2022 18:37:16 +0100

Serial GC

This article was first published on the public account Java Architect Consortium, which posts technical articles daily.

Weak Generational Hypothesis

Serial GC is the oldest and most classic garbage collector, enabled with -XX:+UseSerialGC. Its Java heap is organized according to the weak generational hypothesis, which states that most objects die young; this has been validated across a variety of programming paradigms and languages. The strong generational hypothesis, by contrast, states that the older an object is, the less likely it is to die, but there is little evidence to support it. That most objects are short-lived matches intuition: creating objects for local variables is about the most common operation in a method, and many of those objects are discarded shortly afterwards.

Based on the weak generational hypothesis, the virtual machine implements a generational heap model that divides the Java heap into an old generation with more space and a young generation with less space. The young generation holds new, short-lived objects, so garbage collection occurs there more frequently; the old generation holds objects with longer life cycles. As a simplification, objects that survive multiple garbage collections can be considered long-lived. The old generation grows slowly, so it is collected less often. Such a heap is called a generational heap. In the generational heap model, the GC no longer works on the entire heap but on a specific generation: Young GC (hereinafter YGC) collects only the young generation, and Full GC (hereinafter FGC) collects the entire heap. With YGC, the GC no longer needs to traverse the entire heap to find survivors, and the old generation is collected less frequently.

Generational heaps benefit from distinguishing object life cycles, but they are also burdened by it. Before generations, all survivors could be found simply by traversing the entire heap. With generations, traversing a single generation is not enough, because there may be references from the old generation into the young generation, that is, cross-generation references. If only the young generation is traversed, objects that are in fact referenced from the old generation may be judged dead and reclaimed. The principle of garbage collection is that it is better to leave some garbage uncollected than to reclaim a live object by mistake; cleaning up a surviving object is never acceptable. The problem is therefore that young-generation objects can be referenced from the old generation as well as from the GC Roots. Traversing the much larger old generation along with the GC Roots to find young-generation survivors would throw away the advantage of having generations, so the cost would outweigh the gain.

Cross-generation references are a problem that every generational garbage collector must face. To handle them, a data structure called the remembered set (RSet) records references from the old generation to the young generation. A second problem is that many old-generation objects may actually be dead. If they are not cleaned up in time, then when the young generation is collected, the GC Roots and these dead old-generation objects are both used as roots for finding survivors. Young-generation objects referenced only by dead old-generation objects are then marked as survivors as well, producing floating garbage. In extreme cases, floating garbage can cancel out the benefits of a generational heap.

A remembered set implies that the GC can observe every write to an object's reference fields. Whenever such a write occurs, the GC checks whether the object holding the field and the object being written are in different generations and decides whether to record the reference in the remembered set. The component that gives the GC this ability is the GC barrier, specifically the write barrier in this context. Writing to objects is part of the code execution system, so the barrier is implemented jointly by the GC, the JIT compilers, and the template interpreter.
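The idea can be sketched in a few lines of C++11. This is an illustration only, not HotSpot's barrier: ToyHeap, Obj, write_ref, and the fixed array layout are hypothetical names invented for the example, and eagerly erasing entries on overwrite is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical layout: the first kYoungCount objects are "young", the rest "old".
constexpr std::size_t kYoungCount = 4;
constexpr std::size_t kTotalCount = 8;

// Hypothetical toy object with a single reference field.
struct Obj {
    Obj* field = nullptr;
};

struct ToyHeap {
    Obj objects[kTotalCount];
    std::set<Obj**> remembered_set;  // slots in old objects that point into young

    bool in_young(const Obj* o) const {
        return o >= objects && o < objects + kYoungCount;
    }
    bool in_old(const Obj* o) const {
        return o >= objects + kYoungCount && o < objects + kTotalCount;
    }

    // Write barrier: perform the store, then keep the remembered set in sync.
    void write_ref(Obj* holder, Obj* value) {
        holder->field = value;
        if (in_old(holder) && value != nullptr && in_young(value)) {
            remembered_set.insert(&holder->field);   // old -> young: remember the slot
        } else {
            remembered_set.erase(&holder->field);    // slot no longer crosses generations
        }
    }
};
```

Every reference store in this toy heap goes through write_ref, so a young collection could start from the GC Roots plus the slots in remembered_set instead of walking all old objects.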

In Serial GC, FGC traverses the entire heap and does not need to consider cross-generation references. YGC collects only the young generation and therefore must deal with them. Serial GC uses a coarse-grained remembered set called a card table, described in detail below.

Card Table

The card table is a coarse-grained remembered set for cross-generation references. Instead of precisely recording each old-generation object or reference that points into the young generation, it divides the old generation into memory pages whose size is a power of two and records which pages contain such references. Mapping these pages through a card table reduces the memory overhead of the remembered set itself while still avoiding a traversal of the entire old generation. A standard card table implementation is a bitmap with one bit per memory page, as shown in Figure 10-2.

Figure 10-2 Card Table

When a Mutator thread assigns to a reference field of an object, the virtual machine checks whether a reference to a young-generation object is being stored into an old-generation object and, if so, sets the card table bit corresponding to the memory page that contains the field. Later, only the memory pages whose bits are set need to be traversed, rather than the entire old generation.

However, bitmaps can be quite slow: reading, updating, and writing back a single bit takes several bit-manipulation instructions on a RISC processor. An effective improvement is to use a byte array instead of a bitmap. It uses eight times as much memory as a bitmap but still amounts to less than 1% of the heap. HotSpot VM implements its card table in CardTable, which uses a byte array; the CardTable::byte_for function maps a memory address to an entry in the card table byte array, as shown in Code Listing 10-7:

Code Listing 10-7 CardTable::byte_for

jbyte* CardTable::byte_for(const void* p) const {
  jbyte* result = &_byte_map_base[uintptr_t(p) >> card_shift];
  return result;
}

Here card_shift is 9. It is easy to see from the implementation that the virtual machine defines a memory page as 512 bytes: whenever a memory page contains a cross-generation reference, the corresponding entry in the _byte_map_base array is marked dirty.
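To make the mapping concrete, here is a small self-contained sketch of a byte-array card table with 512-byte cards. It is not HotSpot's CardTable: ToyCardTable, index_for, mark_dirty, dirty_card_starts, and the kClean/kDirty encodings are hypothetical, and the sketch subtracts the heap base explicitly to keep the arithmetic visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kCardShift = 9;            // 2^9 = 512-byte cards, as in the text
constexpr std::uint8_t kClean = 0xff;    // hypothetical encodings for this sketch
constexpr std::uint8_t kDirty = 0x00;

// Hypothetical card table covering the fake address range [heap_base, heap_base + heap_bytes).
class ToyCardTable {
public:
    ToyCardTable(std::uintptr_t heap_base, std::size_t heap_bytes)
        : heap_base_(heap_base),
          cards_((heap_bytes + (std::size_t(1) << kCardShift) - 1) >> kCardShift, kClean) {}

    // Map an address to the index of the card that covers it.
    std::size_t index_for(std::uintptr_t addr) const {
        return (addr - heap_base_) >> kCardShift;
    }

    // Called by a write barrier when the field at 'addr' receives a cross-generation reference.
    void mark_dirty(std::uintptr_t addr) { cards_[index_for(addr)] = kDirty; }

    // Start addresses of all dirty cards; only these regions need to be scanned.
    std::vector<std::uintptr_t> dirty_card_starts() const {
        std::vector<std::uintptr_t> out;
        for (std::size_t i = 0; i < cards_.size(); ++i) {
            if (cards_[i] == kDirty) {
                out.push_back(heap_base_ + (i << kCardShift));
            }
        }
        return out;
    }

    // One byte per card, so the table costs heap_bytes / 512 bytes.
    std::size_t table_bytes() const { return cards_.size(); }

private:
    std::uintptr_t heap_base_;
    std::vector<std::uint8_t> cards_;
};
```

For a 1 MiB region the table takes 2048 bytes, about 0.2% of the covered memory, and marking a card is a single byte store with no read-modify-write.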

Young GC

Serial GC calls the young generation DefNewGeneration and the old generation TenuredGeneration. DefNewGeneration divides the young generation into Eden space and Survivor space, and Survivor space is further divided into From space and To space, as shown in Figure 10-3.

Figure 10-3 Serial GC young generation details

YGC uses a copying algorithm to clean up the young generation. In the common case, small objects are initially allocated in Eden space. When Eden space runs out, YGC is triggered: the live objects in Eden space and From space are marked, and the virtual machine then copies the live objects from both spaces to To space, or to the old generation if To space cannot hold them. If To space can hold the survivors, Eden space and From space are then cleared and From space and To space swap roles. The result is an empty Eden space, a From space holding the surviving objects, and an empty To space. These steps repeat at the next YGC.

Objects that are still alive after many YGCs can be considered long-lived rather than short-lived, so the GC promotes them from the young generation to the old generation. Beyond the common case described above, there are special cases to consider, such as Eden space being unable to hold a large object in the first place, or the old generation being unable to hold promoted objects. The full YGC logic is shown in Code Listing 10-8, which also covers these special cases:

Code Listing 10-8 DefNewGeneration::collect

void DefNewGeneration::collect(...) {
  ...
  // Check whether the old generation can accommodate promoted objects
  if (!collection_attempt_is_safe()) {
    ...
    return;
  }
  FastScanClosure fsc_with_no_gc_barrier(...);
  FastScanClosure fsc_with_gc_barrier(...);
  CLDScanClosure cld_scan_closure(...);
  FastEvacuateFollowersClosure evacuate_followers(...);
  { // Scan for survivors starting from the GC Roots
    StrongRootsScope srs(0);
    heap->young_process_roots(&srs, &fsc_with_no_gc_barrier,
                              &fsc_with_gc_barrier, &cld_scan_closure);
  }
  // Handle objects reachable through member fields rather than directly from a GC Root
  evacuate_followers.do_void();
  ... // Special handling of soft, weak, phantom, and final references
  if (!_promotion_failed) {
    // Promotion succeeded: clear Eden and From space, swap From and To space,
    // and adjust the tenuring threshold
    ...
  } else {
    // Otherwise still swap From and To space and notify the old generation that promotion failed
    ...
  }
  ...
}

Before performing YGC, the collector checks whether this garbage collection is safe (collection_attempt_is_safe). Safe here means that the old generation can accommodate the young generation even in the worst case, in which every object in the young generation is alive and needs promotion. If it can, YGC proceeds.
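The worst-case reasoning amounts to a single comparison. The sketch below is a deliberately simplified version of that check, not the actual HotSpot logic; ToyUsage and toy_collection_attempt_is_safe are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical snapshot of space usage, in bytes.
struct ToyUsage {
    std::size_t eden_used;
    std::size_t from_used;
    std::size_t old_free;
};

// Worst case: every object in Eden and From survives and must be promoted,
// so the old generation needs at least that much free space.
bool toy_collection_attempt_is_safe(const ToyUsage& u) {
    return u.old_free >= u.eden_used + u.from_used;
}
```

With 60 bytes used in Eden, 10 in From, and 64 free in the old generation, the check fails, because 70 bytes could need promotion in the worst case.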

young_process_roots() scans all types of GC Roots and scans the card table to find references from the old generation to the young generation, then copies the referenced objects to To space using a fast scan closure. FastScanClosure abstracts an operation on some kind of item (threads, objects, klasses, and so on) into a closure, which is then passed to the code that iterates over a contiguous range of such items. Because the C++98 language standard used by HotSpot VM has no lambda expressions, closures can only be simulated with classes [1]. FastScanClosure is shown in Code Listing 10-9:
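As a rough illustration of simulating a closure with a class, the following C++98-style sketch keeps the "captured" state in member variables and exposes one virtual operation. ToySlotClosure, CountBelowClosure, and toy_iterate are hypothetical names for this example and are unrelated to HotSpot's closure hierarchy.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical closure interface: one operation applied to each slot.
class ToySlotClosure {
public:
    virtual void do_slot(int* slot) = 0;
    virtual ~ToySlotClosure() {}
};

// The state a lambda would capture (a boundary and a counter) lives in members.
class CountBelowClosure : public ToySlotClosure {
public:
    explicit CountBelowClosure(int boundary) : boundary_(boundary), count_(0) {}
    virtual void do_slot(int* slot) {
        if (*slot < boundary_) {
            ++count_;
        }
    }
    int count() const { return count_; }
private:
    int boundary_;
    int count_;
};

// Generic traversal code that knows nothing about what the closure does.
void toy_iterate(int* slots, std::size_t n, ToySlotClosure* cl) {
    for (std::size_t i = 0; i < n; ++i) {
        cl->do_slot(&slots[i]);
    }
}
```

The traversal and the per-slot action are written separately, so the same loop can be reused with any closure that implements do_slot.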

Code Listing 10-9 FastScanClosure

template <class T> inline void FastScanClosure::do_oop_work(T* p) {
  // Load the object reference stored at address p
  T heap_oop = RawAccess<>::oop_load(p);
  if (!CompressedOops::is_null(heap_oop)) {
    oop obj = CompressedOops::decode_not_null(heap_oop);
    // If the object is in the young generation
    if ((HeapWord*)obj < _boundary) {
      // If the object has a forwarding pointer, it has already been copied and
      // the copy can be used directly; otherwise it needs to be copied
      oop new_obj = obj->is_forwarded()
                      ? obj->forwardee() : _g->copy_to_survivor_space(obj);
      RawAccess<IS_NOT_NULL>::oop_store(p, new_obj);
      // Apply the GC barrier as appropriate
      if (is_scanning_a_cld()) {
        ...
      } else if (_gc_barrier) {
        ...
      }
    }
  }
}

Starting from the GC Roots and the old generation, every reachable object is a live object, and FastScanClosure is applied to each of them. If an object already has a forwarding pointer, meaning it has already been copied, the copy is returned directly; otherwise the object is copied with copy_to_survivor_space, shown in Code Listing 10-10:

Code Listing 10-10 copy_to_survivor_space

oop DefNewGeneration::copy_to_survivor_space(oop old) {
  size_t s = old->size();
  oop obj = NULL;
  // Try to allocate in To space, unless the object has reached the tenuring threshold
  if (old->age() < tenuring_threshold()) {
    obj = (oop) to()->allocate_aligned(s);
  }
  if (obj == NULL) {
    // No To space allocation: promote to the old generation
    obj = _old_gen->promote(old, s);
    if (obj == NULL) {
      ... // Promotion failed
      return old;
    }
  } else {
    // To space allocation succeeded
    const intx interval = PrefetchCopyIntervalInBytes;
    Prefetch::write(obj, interval); // Prefetch into the cache
    // Copy the object to To space
    ...
    // Increment the object's age
    ...
    age_table()->add(obj, s);
  }
  // Insert the forwarding pointer in the object header (replace the old object's
  // header with the new object address and set the GC bits)
  ...
  return obj;
}

copy_to_survivor_space() copies an object to To space or promotes it to the old generation, as appropriate, and then stores the new object's address in the old object as a forwarding pointer. The forwarding pointer matters because two GC Root slots may point to the same object. If the object were simply moved and only the first slot updated to the new address, the second slot would still refer to the stale old address. Once the forwarding pointer is set, any later visit to the old object is redirected to the correct new object.
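The two-slot scenario can be reproduced with a toy model. Everything here (ToyObj, ToySpace, toy_copy_or_forward, toy_process_root) is hypothetical and far simpler than the real object header encoding; it only shows why the second slot ends up at the same copy.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy object: a payload plus a forwarding pointer kept in the "header".
struct ToyObj {
    int payload;
    ToyObj* forwardee;   // nullptr until the object has been copied
};

// Hypothetical To space: a bump-allocated array, assumed large enough for the sketch.
struct ToySpace {
    ToyObj storage[8];
    std::size_t top;
};

// Copy 'old_obj' into To space once; later calls return the existing copy.
ToyObj* toy_copy_or_forward(ToyObj* old_obj, ToySpace& to) {
    if (old_obj->forwardee != nullptr) {
        return old_obj->forwardee;          // already copied: follow the forwarding pointer
    }
    ToyObj* new_obj = &to.storage[to.top++];
    new_obj->payload = old_obj->payload;
    new_obj->forwardee = nullptr;
    old_obj->forwardee = new_obj;           // install the forwarding pointer
    return new_obj;
}

// Visit one root slot and redirect it to the object's new location.
void toy_process_root(ToyObj** slot, ToySpace& to) {
    if (*slot != nullptr) {
        *slot = toy_copy_or_forward(*slot, to);
    }
}
```

Processing two root slots that refer to the same object copies it once; the second slot finds the forwarding pointer and is redirected to the same new address.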

The process above visits the objects directly reachable from the GC Roots and the old generation and moves them to To space (or promotes them to the old generation). These objects may contain reference fields and may thus indirectly reach other objects. Serial GC maintains a save_mark pointer in addition to the top pointer of the allocated space. Objects in the region from the bottom of To space to save_mark have been scanned together with their fields; objects in the region from save_mark to the top of the space have been scanned themselves, but their fields have not.

The task of FastEvacuateFollowersClosure is to scan the objects between save_mark and the top of the space, traverse their fields, and copy the objects reachable through those fields. It then advances save_mark past the objects just processed, so that they join the region between the bottom of the space and save_mark, and repeats until save_mark equals the top of the space, at which point the scan is complete.
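The save_mark and top pointers form a work queue inside To space itself. A minimal sketch of that scan loop, under the assumption of fixed-size toy objects with two reference fields (Node, ToSpace, evacuate, and evacuate_followers are hypothetical names, and promotion is left out):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy object with two reference fields and a forwarding pointer.
struct Node {
    Node* left;
    Node* right;
    Node* forwardee;
};

// Hypothetical To space with a bump pointer (top) and a scan pointer (save_mark).
struct ToSpace {
    Node storage[16];        // assumed large enough for this sketch
    std::size_t save_mark;   // objects below this index have had their fields scanned
    std::size_t top;         // next free index
};

// Copy an object to To space unless it has already been copied.
Node* evacuate(Node* obj, ToSpace& to) {
    if (obj == nullptr) return nullptr;
    if (obj->forwardee != nullptr) return obj->forwardee;
    Node* copy = &to.storage[to.top++];
    *copy = *obj;            // fields still point at old locations for now
    obj->forwardee = copy;
    return copy;
}

// Drain the region [save_mark, top): fixing up an object's fields may copy more
// objects and push top further; stop when save_mark catches up with top.
void evacuate_followers(ToSpace& to) {
    while (to.save_mark < to.top) {
        Node* scanned = &to.storage[to.save_mark];
        scanned->left = evacuate(scanned->left, to);
        scanned->right = evacuate(scanned->right, to);
        ++to.save_mark;
    }
}
```

No separate marking stack is needed: objects waiting to have their fields scanned are exactly those lying between save_mark and top.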

The same logic applies to the old generation, since young-generation objects may be either copied to To space or promoted to the old generation.

Full GC

For historical reasons, the FGC implementation lives in serial/genMarkSweep. Although its name suggests that Serial GC's FGC is based on a mark-sweep algorithm, FGC is actually based on a mark-compact algorithm, as shown in Figure 10-4.

Figure 10-4 Serial GC

The mark-compact algorithm used by FGC is based on the Lisp2 algorithm proposed by Donald E. Knuth: first mark the live objects, then move all of them to one end of the space. FGC starts in TenuredGeneration::collect, which logs before and after the collection; the log can be printed with -Xlog:gc*, as shown in Code Listing 10-11:

Code Listing 10-11 FGC log

GC(1) Phase 1: Mark live objects
GC(1) Phase 2: Compute new object addresses
GC(1) Phase 3: Adjust pointers
GC(1) Phase 4: Move objects

The log shows that FGC is divided into four phases, as shown in Figure 10-5.

Figure 10-5 Four phases of FGC in Serial GC

1. Mark Live Objects

In the first phase, the virtual machine traverses all types of GC Roots and uses XX::oops_do(root_closure) to mark all live objects reachable from each GC Root. XX denotes the GC Root type, and root_closure is the closure that marks live objects. root_closure is the MarkSweep::FollowRootClosure closure: given an object, it marks the object, iteratively marks the object's members, and marks all objects on the marking stack along with their members, as shown in Code Listing 10-12:

Code Listing 10-12 Marking live objects

template <class T> inline void MarkSweep::follow_root(T* p) {
  // If the reference points to an object that is non-null and not yet marked
  T heap_oop = RawAccess<>::oop_load(p);
  if (!CompressedOops::is_null(heap_oop)) {
    oop obj = CompressedOops::decode_not_null(heap_oop);
    if (!obj->mark_raw()->is_marked()) {
      mark_object(obj);    // Mark the object
      follow_object(obj);  // Mark the object's members
    }
  }
  follow_stack();          // Mark everything on the marking stack
}
// If the object is an object array, mark the array; otherwise mark its members
inline void MarkSweep::follow_object(oop obj) {
  if (obj->is_objArray()) {
    ...
  } else {
    ...
  }
}
void MarkSweep::follow_stack() { // Mark everything on the marking stack
  do {
    // While the marking stack is not empty, mark its objects one by one
    while (!_marking_stack.is_empty()) {
      oop obj = _marking_stack.pop();
      ...
    }
    // If the object array stack is not empty, mark one array chunk
    if (!_objarray_stack.is_empty()) {
      ObjArrayTask task = _objarray_stack.pop();
      follow_array_chunk(objArrayOop(task.obj()), task.index());
    }
  } while (...);
}
// Mark the array's class and the array's elements. For example, with
// String[] p = new String[2], marking p also marks java.lang.Class, p[0], and p[1]
inline void MarkSweep::follow_array(objArrayOop array) {
  ...
  if (array->length() > 0) {
    MarkSweep::push_objarray(array, 0);
  }
}
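The mark phase described above boils down to marking with an explicit stack rather than recursion. The following sketch shows that pattern on a toy object graph; MObj, ToyMarker, and mark_and_push are hypothetical stand-ins, and the separate object array stack is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy object: a mark bit plus outgoing references.
struct MObj {
    bool marked;
    std::vector<MObj*> refs;
    MObj() : marked(false) {}
};

// Hypothetical marker using an explicit stack instead of recursion.
struct ToyMarker {
    std::vector<MObj*> marking_stack;

    // Mark an unmarked object and queue it so its fields get visited later.
    void mark_and_push(MObj* obj) {
        if (obj != nullptr && !obj->marked) {
            obj->marked = true;
            marking_stack.push_back(obj);
        }
    }

    // Pop objects until the stack is empty, marking everything they reference.
    void follow_stack() {
        while (!marking_stack.empty()) {
            MObj* obj = marking_stack.back();
            marking_stack.pop_back();
            for (std::size_t i = 0; i < obj->refs.size(); ++i) {
                mark_and_push(obj->refs[i]);
            }
        }
    }

    // Entry point for one root slot.
    void follow_root(MObj* root) {
        mark_and_push(root);
        follow_stack();
    }
};
```

Because an object is marked before it is pushed, cycles terminate, and objects that merely point into the live graph without being reachable themselves stay unmarked.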

2. Compute New Object Addresses

After marking all live objects, Serial GC computes a new address for each of them and stores it in the object header, in preparation for the later compaction. The idea is to first point cur_obj and compact_top at the bottom of the space and then scan upward from the bottom. If cur_obj reaches a live object, the object's new address is set to compact_top and compact_top advances by the object's size; scanning then continues, and this repeats until cur_obj reaches the top of the space.
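A worked example makes the address arithmetic easier to follow. In this sketch the space is modeled as a list of cells in address order; Cell and compute_new_addresses are hypothetical, and addresses are plain word offsets rather than real pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy heap cell: objects are laid out back to back in address order.
struct Cell {
    std::size_t size;      // object size in words
    bool live;             // result of the mark phase
    std::size_t new_addr;  // forwarding address, meaningful only for live objects
};

// Walk the space from bottom to top. Each live object is assigned the current
// compact_top, which then advances by that object's size.
// Returns the final compact_top, i.e. the top of the space after compaction.
std::size_t compute_new_addresses(std::vector<Cell>& space, std::size_t bottom) {
    std::size_t compact_top = bottom;
    for (std::size_t i = 0; i < space.size(); ++i) {   // i plays the role of cur_obj
        Cell& cur_obj = space[i];
        if (cur_obj.live) {
            cur_obj.new_addr = compact_top;
            compact_top += cur_obj.size;
        }
    }
    return compact_top;
}
```

For live objects of sizes 4, 3, and 1 separated by dead objects, the new addresses come out as 0, 4, and 7, and the compacted space ends at 8.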

3. Adjust Pointers

Although the new object addresses have been computed, the GC Roots still point to the old objects, and object members still refer to old object addresses. This phase adjusts the pointers to fix these relationships: GC Roots are updated to point to the new object addresses, and the references held in object members are updated to the new addresses accordingly.

4. Move Objects

At this point everything is ready: each object has been assigned a new address and the references have been updated, but the new addresses do not yet contain valid data. The object data is therefore copied from each old address to the corresponding new address, one object at a time. Once FGC completes, Serial GC resets the GC-related data structures and logs the GC information.
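Phases 3 and 4 can be sketched together on a toy heap in which every object occupies one slot and references are slot indices. Slot, adjust_pointers, and move_objects are hypothetical names, and the live flags and new_index values are assumed to have been filled in by the two earlier phases.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical one-slot objects; references are indices into the heap vector.
struct Slot {
    int payload;
    int ref;        // index of the referenced object, or -1 for null
    bool live;      // set by phase 1
    int new_index;  // set by phase 2 for live objects
};

// Phase 3: rewrite roots and fields so they name the targets' new locations.
void adjust_pointers(std::vector<Slot>& heap, std::vector<int>& roots) {
    for (std::size_t i = 0; i < roots.size(); ++i) {
        if (roots[i] >= 0) roots[i] = heap[roots[i]].new_index;
    }
    for (std::size_t i = 0; i < heap.size(); ++i) {
        if (heap[i].live && heap[i].ref >= 0) {
            heap[i].ref = heap[heap[i].ref].new_index;
        }
    }
}

// Phase 4: slide live objects down in address order. Because new_index <= i,
// a move never overwrites a live object that has not been moved yet.
// Returns the new top of the heap; slots at or above it are stale.
std::size_t move_objects(std::vector<Slot>& heap) {
    std::size_t new_top = 0;
    for (std::size_t i = 0; i < heap.size(); ++i) {
        if (heap[i].live) {
            Slot moved = heap[i];
            heap[moved.new_index] = moved;
            new_top = static_cast<std::size_t>(moved.new_index) + 1;
        }
    }
    return new_top;
}
```

The order matters: pointers are adjusted while every object is still at its old address, so the forwarding information in each target is intact when it is read, and only then is the data moved.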

Stop the World

As mentioned in section 10.1.2, Stop The World (STW) is the situation in which all Mutator threads are paused. Both YGC and FGC in Serial GC run on a single thread, so all Mutator threads must be paused while the GC is working. The larger the Java heap, the more noticeable the STW pause. Long pauses are unacceptable for GUI programs and other programs that need soft real-time, fast responses. STW is therefore one of the most criticized aspects of garbage collection: on the one hand, it takes time for all Mutator threads to reach a safepoint; on the other hand, the garbage collection itself takes a lot of time once the world is stopped. Can the multiple cores of modern processors be used to parallelize some of the garbage collection work during STW? Parallel GC gives a satisfactory answer to this question.