Java magic class: Unsafe application analysis

Posted by crucialnet on Sun, 20 Feb 2022 01:38:04 +0100

Contents

  • Preface
  • Basic introduction
  • Function introduction
  • Memory operation
  • CAS related
  • Thread scheduling
  • Class related
  • Object operation
  • Array related
  • Memory barrier
  • System related
  • Epilogue

Preface

Unsafe is a class in the sun.misc package. It mainly provides methods for low-level, unsafe operations such as direct access to system memory and manual memory management. These methods play a major role in improving Java's runtime efficiency and in strengthening the language's ability to work with low-level resources. However, Unsafe lets Java manipulate memory much like C pointers do, which inevitably increases the risk of pointer-related bugs. Excessive or incorrect use of Unsafe raises the probability of program errors and makes Java, a safe language, no longer "safe". Unsafe must therefore be used with caution.

Note: this article introduces the public API of sun.misc.Unsafe and its application scenarios.

Basic introduction

As the Unsafe source code below shows, the Unsafe class is a singleton and provides the static method getUnsafe to obtain the instance. The call is legal if and only if the class calling getUnsafe was loaded by the bootstrap class loader; otherwise a SecurityException is thrown.

public final class Unsafe {
  // Singleton object
  private static final Unsafe theUnsafe;

  private Unsafe() {
  }
  @CallerSensitive
  public static Unsafe getUnsafe() {
    Class var0 = Reflection.getCallerClass();
    // Legal only if the calling class was loaded by the bootstrap class loader
    if(!VM.isSystemDomainLoader(var0.getClassLoader())) {    
      throw new SecurityException("Unsafe");
    } else {
      return theUnsafe;
    }
  }
}

So how do you obtain an instance if you want to use this class? There are two feasible approaches.

First, work within the restriction of getUnsafe: use the Java command-line option -Xbootclasspath/a to append the JAR path of class A, the class that calls the Unsafe methods, to the default bootstrap path. A is then loaded by the bootstrap class loader and can safely obtain the Unsafe instance through Unsafe.getUnsafe.

java -Xbootclasspath/a:${path}   // where path is the JAR path of the class that calls the Unsafe methods

Second, obtain the singleton object theUnsafe through reflection.

private static Unsafe reflectGetUnsafe() {
    try {
      Field field = Unsafe.class.getDeclaredField("theUnsafe");
      field.setAccessible(true);
      return (Unsafe) field.get(null);
    } catch (Exception e) {
      log.error(e.getMessage(), e);
      return null;
    }
}
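
For readers who want to try the reflection approach directly, a self-contained variant of the method above might look like the following. This is only a sketch: the class name UnsafeAccess is hypothetical, and it throws an exception instead of logging because the log object used above is not shown in the article. On recent JDKs the compiler warns that sun.misc.Unsafe is an internal API.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical helper class, not part of the JDK.
final class UnsafeAccess {
    private UnsafeAccess() {
    }

    // Reads the private static singleton field "theUnsafe" through reflection.
    static Unsafe getUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    public static void main(String[] args) {
        Unsafe unsafe = getUnsafe();
        // Any harmless call works as a smoke test; addressSize is covered later.
        System.out.println("address size: " + unsafe.addressSize());
    }
}
```

Because theUnsafe is a singleton, repeated calls return the same instance, so callers would typically cache the result in a static final field.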

Function introduction

As shown in the figure above, the APIs provided by Unsafe can be roughly divided into memory operations, CAS, class-related operations, object operations, thread scheduling, system information, memory barriers, and array operations. The relevant methods and their application scenarios are introduced in detail below.

Memory operation

This part mainly covers allocating, copying, and freeing off-heap memory, and reading or writing the value at a given address.

// Allocate memory, comparable to C's malloc
public native long allocateMemory(long bytes);
// Resize previously allocated memory
public native long reallocateMemory(long address, long bytes);
// Free memory
public native void freeMemory(long address);
// Set every byte in the given memory block to a value
public native void setMemory(Object o, long offset, long bytes, byte value);
// Copy memory
public native void copyMemory(Object srcBase, long srcOffset, Object destBase, long destOffset, long bytes);
// Get the value at the given offset in an object, ignoring access modifiers. Similar methods: getInt, getDouble, getLong, getChar, etc.
public native Object getObject(Object o, long offset);
// Set the value at the given offset in an object, ignoring access modifiers. Similar methods: putInt, putDouble, putLong, putChar, etc.
public native void putObject(Object o, long offset, Object x);
// Get the byte at the given address (the result is defined only if the address was obtained from allocateMemory)
public native byte getByte(long address);
// Set the byte at the given address (the result is defined only if the address was obtained from allocateMemory)
public native void putByte(long address, byte x);

The objects we normally create in Java live in heap memory, which is process memory controlled by the JVM and subject to its memory management: the JVM manages the heap uniformly through garbage collection. Off-heap memory, by contrast, lies in memory regions outside the JVM's control. Java's operations on off-heap memory rely on the native methods that Unsafe provides for this purpose.
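
The allocate, initialize, use, and free cycle for off-heap memory can be sketched as follows. The class OffHeapDemo and its method sumOffHeap are hypothetical names chosen for illustration, and the example assumes sun.misc.Unsafe is reachable through reflection as described earlier. Note that recent JDK releases deprecate these memory-access methods, so this is a sketch for experimentation rather than a recommended pattern.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical demo class, not part of the JDK.
final class OffHeapDemo {
    private static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Writes the bytes 0..count-1 to off-heap memory and returns their sum.
    // Keep count <= 128 so that every value fits in a non-negative byte.
    static long sumOffHeap(int count) {
        long address = UNSAFE.allocateMemory(count);   // like malloc
        try {
            UNSAFE.setMemory(address, count, (byte) 0); // zero the block
            for (int i = 0; i < count; i++) {
                UNSAFE.putByte(address + i, (byte) i);
            }
            long sum = 0;
            for (int i = 0; i < count; i++) {
                sum += UNSAFE.getByte(address + i);
            }
            return sum;
        } finally {
            UNSAFE.freeMemory(address);                 // never reclaimed by GC
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOffHeap(10));
    }
}
```

The try/finally block matters here: the garbage collector does not track this memory, so a missed freeMemory call is a native memory leak.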

Reasons for using off-heap memory

  • Shorter garbage collection pauses. Because off-heap memory is managed directly by the operating system rather than by the JVM, using it lets us keep the heap small, which reduces the impact of GC pauses on the application.
  • Better I/O performance. I/O communication usually involves copying data from heap memory to off-heap memory, so short-lived temporary data that would otherwise be copied frequently between the two is best stored off-heap.

Typical application

DirectByteBuffer is an important class that Java uses to implement off-heap memory. It is typically used as a buffer pool in communication code and is widely used in NIO frameworks such as Netty and MINA. Its logic for creating, using, and destroying off-heap memory is built on the off-heap memory API provided by Unsafe.

The following figure shows the DirectByteBuffer constructor. When a DirectByteBuffer is created, unsafe.allocateMemory allocates the memory and unsafe.setMemory initializes it. The constructor then builds a Cleaner object that tracks garbage collection of the DirectByteBuffer, so that the allocated off-heap memory is released when the DirectByteBuffer is garbage collected.

So how does the garbage collection tracking object, Cleaner, free the off-heap memory?

Cleaner extends PhantomReference, one of Java's four reference types. (The referent can never be retrieved through a phantom reference, and an object that is only phantom-reachable can be reclaimed by any GC.) PhantomReference is usually used together with a ReferenceQueue to provide notification and resource cleanup when the referent is garbage collected. As shown in the following figure, when an object referenced by a Cleaner is about to be reclaimed, the garbage collector places the reference on the pending list in Reference, where it waits for the ReferenceHandler. ReferenceHandler is a daemon thread with the highest priority; it continuously processes the references on the pending list and calls the clean method of each Cleaner to do the cleanup work.

Therefore, when a DirectByteBuffer is referenced only by its Cleaner (that is, by a phantom reference), it can be reclaimed in any GC cycle. When the DirectByteBuffer instance is reclaimed, the ReferenceHandler thread calls the Cleaner's clean method, which releases the off-heap memory using the Deallocator passed in when the Cleaner was created.
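
The owner-plus-Deallocator arrangement can be approximated in user code. The sketch below deliberately swaps in the public java.lang.ref.Cleaner API (available since Java 9) instead of the internal Cleaner that DirectByteBuffer uses, so it illustrates the pattern rather than the JDK's actual implementation. OffHeapBuffer, Deallocator, and release are hypothetical names.

```java
import java.lang.ref.Cleaner;
import java.lang.reflect.Field;
import java.util.concurrent.atomic.AtomicBoolean;
import sun.misc.Unsafe;

// Hypothetical class that owns a block of off-heap memory.
final class OffHeapBuffer {
    private static final Unsafe UNSAFE = loadUnsafe();
    private static final Cleaner CLEANER = Cleaner.create();

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Cleanup action. It must not hold a reference to the OffHeapBuffer,
    // otherwise the buffer could never become phantom-reachable.
    static final class Deallocator implements Runnable {
        private final long address;
        final AtomicBoolean freed = new AtomicBoolean(false);

        Deallocator(long address) {
            this.address = address;
        }

        @Override
        public void run() {
            if (freed.compareAndSet(false, true)) {
                UNSAFE.freeMemory(address);
            }
        }
    }

    final Deallocator deallocator;
    private final Cleaner.Cleanable cleanable;

    OffHeapBuffer(long bytes) {
        long address = UNSAFE.allocateMemory(bytes);
        UNSAFE.setMemory(address, bytes, (byte) 0);
        this.deallocator = new Deallocator(address);
        // The cleaner runs the deallocator once this buffer is unreachable.
        this.cleanable = CLEANER.register(this, deallocator);
    }

    // Explicit release; the cleaning action runs at most once.
    void release() {
        cleanable.clean();
    }
}
```

Calling release is optional in this sketch: if the caller forgets, the Cleaner's background thread runs the Deallocator after the buffer is garbage collected, which mirrors the behavior described above.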

CAS related

This part covers the CAS-related methods, shown in the source code below.

/**
 * CAS
 * @param o         the object containing the field to modify
 * @param offset    the offset of the field within the object
 * @param expected  the expected value
 * @param update    the new value
 * @return          true if the field was updated, false otherwise
 */
public final native boolean compareAndSwapObject(Object o, long offset, Object expected, Object update);

public final native boolean compareAndSwapInt(Object o, long offset, int expected, int update);

public final native boolean compareAndSwapLong(Object o, long offset, long expected, long update);

What is CAS? It stands for compare-and-swap, a technique commonly used to implement concurrent algorithms. A CAS operation takes three operands: a memory location, the expected original value, and a new value. It compares the value at the memory location with the expected value; if they match, the processor atomically updates the location to the new value, and otherwise it does nothing. CAS is an atomic CPU instruction (cmpxchg on x86), so it does not cause data inconsistency. The CAS methods provided by Unsafe (compareAndSwapXXX) are ultimately implemented with the cmpxchg instruction.

Typical application

CAS is widely used in the java.util.concurrent.atomic classes, in AQS, in ConcurrentHashMap, and in other implementations. As shown in the figure below, in the implementation of AtomicInteger the static field valueOffset holds the memory offset of the field value. When the AtomicInteger class is initialized, a static block obtains valueOffset through Unsafe's objectFieldOffset method. The thread-safe methods of AtomicInteger then use valueOffset to locate value inside the AtomicInteger object and update it atomically with CAS.

The following figure shows the memory layout of an AtomicInteger object before and after an increment. The object's base address is baseAddress = "0x110000", and the address of value, obtained as baseAddress + valueOffset, is valueAddress = "0x11000c". The update is then performed atomically with CAS: if it succeeds the method returns, and otherwise it retries until the update succeeds.
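
The offset lookup and retry loop described above can be sketched with a small counter modeled loosely on AtomicInteger. CasCounter and countConcurrently are hypothetical names, and this is an illustration of the technique, not the JDK's source.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical counter that mimics the structure described for AtomicInteger.
final class CasCounter {
    private static final Unsafe UNSAFE = loadUnsafe();
    private static final long VALUE_OFFSET;

    static {
        try {
            // Offset of the "value" field inside a CasCounter object.
            VALUE_OFFSET = UNSAFE.objectFieldOffset(
                    CasCounter.class.getDeclaredField("value"));
        } catch (NoSuchFieldException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private volatile int value;

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Retries until the CAS succeeds, then returns the new value.
    int incrementAndGet() {
        for (;;) {
            int current = value;
            int next = current + 1;
            if (UNSAFE.compareAndSwapInt(this, VALUE_OFFSET, current, next)) {
                return next;
            }
        }
    }

    int get() {
        return value;
    }

    // Increments one shared counter from several threads and returns the total.
    static int countConcurrently(int threads, int perThread) {
        CasCounter counter = new CasCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.incrementAndGet();
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }
}
```

No increment is lost even without locks, because a failed CAS simply means another thread won the race and the loop re-reads the current value before trying again.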

Thread scheduling

This part covers the methods for suspending and resuming threads and for locking.

//Unblock thread
public native void unpark(Object thread);
//Blocking thread
public native void park(boolean isAbsolute, long time);
//Obtain object lock (reentrant lock)
@Deprecated
public native void monitorEnter(Object o);
//Release object lock
@Deprecated
public native void monitorExit(Object o);
//Attempt to acquire object lock
@Deprecated
public native boolean tryMonitorEnter(Object o);

In the source code above, park and unpark suspend and resume threads. A thread is suspended by calling park, after which it stays blocked until it is unparked, the timeout expires, or it is interrupted. unpark wakes a suspended thread and returns it to normal execution.

Typical application

AbstractQueuedSynchronizer, the core class of Java's lock and synchronizer framework, blocks and wakes threads by calling LockSupport.park() and LockSupport.unpark(), and LockSupport's park and unpark methods are in turn implemented by calling Unsafe's park and unpark.
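
The blocking and wake-up behavior can be tried through the public LockSupport wrapper rather than through Unsafe directly. In this minimal sketch, ParkDemo and handOff are hypothetical names; the waiting thread re-checks its condition in a loop because park is allowed to return without a matching unpark.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical demo class, not part of the JDK.
final class ParkDemo {
    // Starts a thread that parks until a flag is set, wakes it, and reports
    // whether the thread observed the flag before finishing.
    static boolean handOff() {
        AtomicBoolean ready = new AtomicBoolean(false);
        AtomicBoolean observed = new AtomicBoolean(false);

        Thread waiter = new Thread(() -> {
            while (!ready.get()) {
                LockSupport.park(); // may return spuriously, so loop
            }
            observed.set(true);
        });
        waiter.start();

        ready.set(true);
        // If the waiter has not parked yet, the permit makes its next park
        // return immediately, so the wake-up cannot be lost.
        LockSupport.unpark(waiter);

        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observed.get();
    }

    public static void main(String[] args) {
        System.out.println("waiter finished: " + handOff());
    }
}
```

Setting the flag before calling unpark is the important ordering: the permit only wakes the thread, while the flag tells it why it was woken.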

Class related

This part provides methods for working with a Class and its static fields, including locating static fields in memory, defining classes, defining anonymous classes, and checking or ensuring initialization.

// Get the memory offset of the given static field; the offset is unique and fixed for a given field
public native long staticFieldOffset(Field f);
// Get the base object through which the given static field is accessed
public native Object staticFieldBase(Field f);
// Check whether the given class needs to be initialized. Typically used before reading a class's static fields (an uninitialized class has uninitialized static fields). Returns false only if a call to ensureClassInitialized would have no effect.
public native boolean shouldBeInitialized(Class<?> c);
// Ensure that the given class has been initialized. Typically used before reading a class's static fields (an uninitialized class has uninitialized static fields).
public native void ensureClassInitialized(Class<?> c);
// Define a class, skipping all JVM security checks. By default, the ClassLoader and ProtectionDomain come from the caller
public native Class<?> defineClass(String name, byte[] b, int off, int len, ClassLoader loader, ProtectionDomain protectionDomain);
// Define an anonymous class
public native Class<?> defineAnonymousClass(Class<?> hostClass, byte[] data, Object[] cpPatches);
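
As a small illustration of the static-field methods, the sketch below reads and writes a static int through staticFieldBase and staticFieldOffset. StaticFieldDemo and its members are hypothetical names. The field lives in the demo class itself, so the class is already initialized when the methods run; the class-definition methods are left out because their availability differs between JDK versions.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical demo class, not part of the JDK.
final class StaticFieldDemo {
    private static final Unsafe UNSAFE = loadUnsafe();

    // Non-final on purpose, so that writes through Unsafe are observable.
    static int counter = 7;

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private static Field counterField() {
        try {
            return StaticFieldDemo.class.getDeclaredField("counter");
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the static field through its base object and offset.
    static int readCounter() {
        Field f = counterField();
        Object base = UNSAFE.staticFieldBase(f);
        long offset = UNSAFE.staticFieldOffset(f);
        return UNSAFE.getInt(base, offset);
    }

    // Writes the static field through its base object and offset.
    static void writeCounter(int newValue) {
        Field f = counterField();
        UNSAFE.putInt(UNSAFE.staticFieldBase(f), UNSAFE.staticFieldOffset(f), newValue);
    }
}
```

The base object and the offset only make sense as a pair: the offset returned for a static field should be used with the object returned by staticFieldBase for that same field.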

Typical application

Starting with Java 8, the JDK uses invokedynamic and VM anonymous classes to implement lambda expressions at the Java language level.

  • invokedynamic: a virtual machine instruction introduced in Java 7 to support dynamic languages on the JVM. It resolves the method referenced by the call site specifier dynamically at run time and then executes that method. The dispatch logic of an invokedynamic instruction is determined by a bootstrap method supplied by the user.
  • VM anonymous class: this can be regarded as a template mechanism. When a program dynamically generates many classes that share the same structure and differ only in a few constants, it can first create a template class containing constant placeholders and then, when defining a concrete class with Unsafe.defineAnonymousClass, fill in the placeholders to produce a concrete anonymous class. The generated class is not explicitly attached to any ClassLoader. As soon as no instance of the class exists and no strong reference to its Class object remains, the class can be reclaimed by GC. Compared with anonymous inner classes at the Java language level, a VM anonymous class therefore does not need to be loaded through a ClassLoader and is easier to reclaim.

In the implementation of lambda expressions, the invokedynamic instruction calls the bootstrap method to produce the call site. During this process the bytecode is generated dynamically with ASM, Unsafe's defineAnonymousClass method defines an anonymous class that implements the corresponding functional interface, the anonymous class is instantiated, and a call site bound to the method handle of the functional method in that class is returned. The logic defined by the lambda expression can then be invoked through this call site. The Test class shown in the following figure serves as an example.

Figure 1 below shows the decompiled class file of the Test class (parts irrelevant to this discussion are omitted). It contains the instructions of the main method, the bootstrap methods called by the invokedynamic instruction, and the static method lambda$main$0, which implements the string-printing logic of the lambda expression. While the bootstrap method runs, Unsafe.defineAnonymousClass generates the anonymous class shown in Figure 2 below, which implements the Consumer interface. Its accept method carries out the logic defined in the lambda expression by calling the static method lambda$main$0 of the Test class. Executing the statement consumer.accept("lambda") therefore actually calls the accept method of the anonymous class shown in Figure 2.
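
The effect of this mechanism can be observed from ordinary code without calling Unsafe. The sketch below uses a hypothetical class LambdaClassDemo (not the article's Test class) to inspect the runtime class behind a lambda and the synthetic method that javac generates for the lambda body. The exact class name is JVM-specific, and newer JDKs define these classes as hidden classes instead of using defineAnonymousClass, so treat the printed names as illustrative.

```java
import java.lang.reflect.Method;
import java.util.function.Consumer;

// Hypothetical demo class, not part of the JDK.
final class LambdaClassDemo {
    // The lambda body is compiled into a private static synthetic method.
    static Consumer<String> printer() {
        return s -> System.out.println(s);
    }

    // Name of the runtime-generated class that implements Consumer.
    static String lambdaClassName() {
        return printer().getClass().getName();
    }

    // True if the compiler emitted a synthetic lambda$... method in this class.
    static boolean hasSyntheticLambdaMethod() {
        for (Method m : LambdaClassDemo.class.getDeclaredMethods()) {
            if (m.isSynthetic() && m.getName().startsWith("lambda$")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        printer().accept("lambda");
        System.out.println("runtime class: " + lambdaClassName());
        System.out.println("synthetic body method present: " + hasSyntheticLambdaMethod());
    }
}
```

Running main shows that no separate class file exists on disk for the lambda: the implementing class is created at run time, while only the body method lives in the compiled class.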

Object operation

This part covers operations on object fields and an unconventional way to instantiate objects.

// Return the offset of an instance field relative to the object's memory address
public native long objectFieldOffset(Field f);
// Get the value at the given offset in the given object. Similar methods: getInt, getDouble, getLong, getChar, etc.
public native Object getObject(Object o, long offset);
// Set the value at the given offset in the given object. Similar methods: putInt, putDouble, putLong, putChar, etc.
public native void putObject(Object o, long offset, Object x);
// Read a reference from the given offset in the object, with volatile load semantics
public native Object getObjectVolatile(Object o, long offset);
// Store a reference at the given offset in the object, with volatile store semantics
public native void putObjectVolatile(Object o, long offset, Object x);
// Ordered, lazy version of putObjectVolatile: the store is not guaranteed to become visible to other threads immediately. Generally only useful when the field is declared volatile
public native void putOrderedObject(Object o, long offset, Object x);
// Create an object while bypassing constructors and initialization code
public native Object allocateInstance(Class<?> cls) throws InstantiationException;

Typical application

  • Conventional instantiation: objects are normally created through the new mechanism. With new, however, a class that provides only constructors with parameters and declares no no-argument constructor explicitly can only be instantiated through a constructor with parameters, and the matching arguments must be passed to complete instantiation.
  • Unconventional instantiation: Unsafe provides the allocateInstance method, which creates an instance from the Class object alone, without invoking any constructor, initialization code, or JVM security checks. It also bypasses access modifier checks, so a class can be instantiated this way even if its constructor is private. Because of this, allocateInstance is used in java.lang.invoke, in Objenesis (which provides object creation that bypasses class constructors), and in Gson (during deserialization).

As shown in the figure below, when Gson deserializes a class that has a default constructor, it creates the instance by invoking that constructor through reflection. Otherwise it constructs the instance through UnsafeAllocator, which instantiates the object by calling Unsafe's allocateInstance, so that deserialization is unaffected when the target class has no default constructor.
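
The difference between the two instantiation paths is easy to see in a small experiment. In this sketch, AllocateInstanceDemo and Guarded are hypothetical names; Guarded has only a private constructor with a parameter, plus a field initializer that would normally run as part of construction.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical demo class, not part of the JDK.
final class AllocateInstanceDemo {
    private static final Unsafe UNSAFE = loadUnsafe();

    // No no-argument constructor, and the only constructor is private.
    static final class Guarded {
        int value = 42; // field initializers run as part of the constructor

        private Guarded(String required) {
            if (required == null) {
                throw new IllegalArgumentException("required must not be null");
            }
        }
    }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Creates a Guarded instance without running any constructor code.
    static Guarded newWithoutConstructor() {
        try {
            return (Guarded) UNSAFE.allocateInstance(Guarded.class);
        } catch (InstantiationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Guarded g = newWithoutConstructor();
        System.out.println("value without constructor: " + g.value);
    }
}
```

Because the constructor never runs, the field keeps its default value instead of 42. Libraries that rely on this technique, such as the deserializers mentioned above, are expected to populate the fields themselves afterwards.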

Array related

This part covers the arrayBaseOffset and arrayIndexScale methods for array operations. Used together, they locate each element of an array in memory.

// Return the offset of the first element in the array
public native int arrayBaseOffset(Class<?> arrayClass);
// Return the size of one element in the array
public native int arrayIndexScale(Class<?> arrayClass);

Typical application

These two array methods are typically used in AtomicIntegerArray in the java.util.concurrent.atomic package, which provides atomic operations on the individual elements of an int array. As shown in the AtomicIntegerArray source code in the figure below, Unsafe's arrayBaseOffset and arrayIndexScale yield base, the offset of the first array element, and scale, the size of a single element. Subsequent atomic operations rely on these two values to locate elements in the array. The getAndAdd method shown in Figure 2 below obtains the offset of an array element through the checkedByteOffset method and then performs the atomic operation with CAS.
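
The base-plus-scaled-index calculation can be sketched as follows. ArrayOffsetDemo is a hypothetical class that follows the approach described for AtomicIntegerArray, with the element size converted to a shift; it is an approximation for illustration, not a copy of the JDK source.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical demo class, not part of the JDK.
final class ArrayOffsetDemo {
    private static final Unsafe UNSAFE = loadUnsafe();
    // Offset of element 0 from the start of an int[] object.
    private static final long BASE = UNSAFE.arrayBaseOffset(int[].class);
    // log2 of the element size, so that index << SHIFT == index * scale.
    private static final int SHIFT;

    static {
        int scale = UNSAFE.arrayIndexScale(int[].class);
        if ((scale & (scale - 1)) != 0) {
            throw new Error("element size is not a power of two");
        }
        SHIFT = 31 - Integer.numberOfLeadingZeros(scale);
    }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Byte offset of element i, after a bounds check.
    static long checkedByteOffset(int[] array, int i) {
        if (i < 0 || i >= array.length) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        return ((long) i << SHIFT) + BASE;
    }

    // Atomically adds delta to element i and returns the previous value.
    static int getAndAdd(int[] array, int i, int delta) {
        long offset = checkedByteOffset(array, i);
        for (;;) {
            int current = UNSAFE.getIntVolatile(array, offset);
            if (UNSAFE.compareAndSwapInt(array, offset, current, current + delta)) {
                return current;
            }
        }
    }
}
```

The bounds check is not optional: Unsafe performs no range checking of its own, so an out-of-range offset would read or corrupt memory outside the array.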

Memory barrier

These methods were introduced in Java 8 to define memory barriers (also called memory fences or barrier instructions), a kind of synchronization barrier instruction. A memory barrier is a synchronization point in the CPU's or compiler's memory accesses: all reads and writes before the point must complete before operations after it may begin, which prevents reordering across the barrier.

// Load barrier: loads before the barrier are not reordered with loads or stores after the barrier
public native void loadFence();
// Store barrier: stores before the barrier are not reordered with loads or stores after the barrier
public native void storeFence();
// Full barrier: loads and stores before the barrier are not reordered with loads or stores after the barrier
public native void fullFence();

Typical application

Java 8 introduced a new lock, StampedLock, which can be regarded as an improved read-write lock. StampedLock provides an optimistic read lock that behaves much like a lock-free read: it never blocks a writer from acquiring the write lock, which alleviates writer starvation in read-heavy workloads. Because the optimistic read lock does not block writers from acquiring the write lock, the data may become inconsistent while a thread loads shared variables from main memory into its working memory. Code that uses StampedLock's optimistic read lock must therefore follow the pattern in the use case below to keep the data consistent.
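
Since the original use case is not reproduced here, the pattern can be sketched with a Point class modeled on the example in the StampedLock documentation. The circled step numbers in the comments are an assumption: they simply follow the order in which the text describes the steps.

```java
import java.util.concurrent.locks.StampedLock;

// Sketch of a coordinate class guarded by a StampedLock.
final class Point {
    private double x;
    private double y;
    private final StampedLock lock = new StampedLock();

    // Moves the point under the exclusive write lock.
    void move(double deltaX, double deltaY) {
        long stamp = lock.writeLock();
        try {
            x += deltaX;
            y += deltaY;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    // Reads optimistically first and falls back to a read lock if a
    // write happened in between.
    double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead();   // ① obtain an optimistic read stamp
        double currentX = x;                     // ② copy the fields into locals
        double currentY = y;
        if (!lock.validate(stamp)) {             // ③ check whether a write intervened
            stamp = lock.readLock();             // pessimistic fallback
            try {
                currentX = x;
                currentY = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.sqrt(currentX * currentX + currentY * currentY);
    }
}
```

The local copies are the key design choice: the calculation must only use values that were read between obtaining the stamp and validating it, never the fields themselves.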

As shown in the use case above, the Point class represents a coordinate and has a move method that moves the point and a distanceFromOrigin method that computes the distance from the point to the origin. distanceFromOrigin first obtains an optimistic read stamp through tryOptimisticRead, then loads the coordinates (x, y) from main memory, and then checks the lock state with StampedLock's validate method to determine whether another thread modified the values in main memory through move while (x, y) was being loaded into the thread's working memory. If validate returns true, (x, y) was not modified and can be used in the subsequent calculation. Otherwise the method acquires the pessimistic read lock, loads the latest (x, y) from main memory again, and then computes the distance. Validating the lock state is essential: it determines whether the lock state has changed, and therefore whether the values copied to the thread's working memory are inconsistent with main memory.

The following figure shows the source code of StampedLock.validate, which checks the lock state by comparing the stamp with the relevant constants using bit operations. Before this check, a load barrier is inserted through Unsafe's loadFence method. It prevents step ② of the use case above from being reordered with the lock state check in StampedLock.validate, which would make the check inaccurate.

System related

This part contains two methods for obtaining system information.

// Return the size of a native pointer in bytes: 4 (32-bit system) or 8 (64-bit system).
public native int addressSize();
// Return the size of a memory page in bytes; the value is a power of 2.
public native int pageSize();

Typical application

The code fragment shown in the figure below is a static method of Bits, a utility class in java.nio, that computes the number of memory pages needed for a requested amount of memory. It relies on Unsafe's pageSize method to obtain the system's memory page size for the calculation.
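
The page calculation amounts to a rounded-up division by the page size. The sketch below shows that arithmetic in a hypothetical PageMath class; it mirrors the idea described above rather than reproducing the code of Bits.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical demo class, not part of the JDK.
final class PageMath {
    private static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Size of a memory page in bytes, as reported by the JVM.
    static int pageSize() {
        return UNSAFE.pageSize();
    }

    // Size of a native pointer in bytes.
    static int addressSize() {
        return UNSAFE.addressSize();
    }

    // Number of pages needed to hold `size` bytes (division rounded up).
    static long pageCount(long size) {
        return (size + pageSize() - 1L) / pageSize();
    }

    public static void main(String[] args) {
        System.out.println("page size: " + pageSize());
        System.out.println("address size: " + addressSize());
        System.out.println("pages for 10000 bytes: " + pageCount(10_000));
    }
}
```

Adding pageSize() - 1 before dividing is the usual integer trick for rounding up, so a request of exactly one page needs one page and one extra byte needs two.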

Epilogue

This article introduced the usage and application scenarios of sun.misc.Unsafe in Java. Unsafe clearly offers many convenient and interesting API methods. Even so, because it contains a large number of methods for managing memory manually, improper use can lead to failures that are hard to control. It must therefore be used with caution.

Topics: Java jvm cas