Virtual machine class loading mechanism - class loading process

Posted by tblade on Thu, 27 Jan 2022 08:24:16 +0100

The Java virtual machine loads the data describing a class from the Class file into memory, then verifies, converts, resolves, and initializes that data, finally forming a Java type that the virtual machine can use directly. This process is called the class loading mechanism of the virtual machine.

Class loading timing

A type's life cycle, from being loaded into the virtual machine's memory to being unloaded from it, goes through seven stages: Loading, Verification, Preparation, Resolution, Initialization, Using, and Unloading. Verification, Preparation, and Resolution are collectively referred to as Linking. The sequence of these seven stages is shown in the figure.

 

The order of the five stages Loading, Verification, Preparation, Initialization, and Unloading is fixed, and the class loading process must begin these stages in that order. The Resolution stage is the exception: in some cases it can begin after Initialization, which supports the runtime binding feature of the Java language (also known as dynamic binding or late binding).

Class loading process

Loading

In the loading phase, the Java virtual machine needs to do the following three things:

  1. Get the binary byte stream that defines a class by its fully qualified name. (there is no limit on how to obtain, which can be network acquisition, disk reading, dynamic generation, etc.)

  2. The static storage structure represented by this byte stream is transformed into the runtime data structure of the method area.

  3. Generate a java.lang.Class object representing this class in memory, as the access entry to the various data of this class in the method area.

Compared with the other stages of the class loading process, the loading stage of non-array types (more precisely, the action of obtaining the class's binary byte stream during loading) is the stage developers can control most. The loading phase can be completed by the bootstrap class loader built into the Java virtual machine or by a user-defined class loader. Developers can control how the byte stream is obtained by defining their own class loader (overriding the findClass() or loadClass() method of a class loader), which gives the application the flexibility to obtain its running code in whatever way it chooses.
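As a rough illustration of the three loading steps, here is a minimal sketch of a user-defined class loader that overrides findClass(). The names ByteStreamLoaderDemo, ByteStreamLoader, Payload, and loadIsolated are hypothetical and exist only for this example. It assumes the compiled .class files are reachable as resources on the class path; a real loader could just as well fetch the bytes from the network or generate them.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ByteStreamLoaderDemo {

    /** Hypothetical payload type; it exists only so the demo has something to load. */
    public static class Payload {
    }

    /** Hypothetical loader that fetches the byte stream itself instead of delegating. */
    public static class ByteStreamLoader extends ClassLoader {

        public ByteStreamLoader() {
            // Parent = null means "bootstrap loader only", so application classes
            // such as Payload are not found by delegation and reach findClass().
            super(null);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            // Step 1: obtain the binary byte stream (assumption: it is on the class path).
            try (InputStream in = ByteStreamLoaderDemo.class.getClassLoader()
                    .getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                byte[] bytes = out.toByteArray();
                // Steps 2 and 3: the JVM builds its runtime structures and
                // returns the java.lang.Class object for the new type.
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads Payload a second time through a fresh ByteStreamLoader. */
    public static Class<?> loadIsolated() throws ClassNotFoundException {
        return new ByteStreamLoader().loadClass(Payload.class.getName());
    }

    public static void main(String[] args) throws Exception {
        Class<?> isolated = loadIsolated();
        System.out.println(isolated.getName());
        // Same bytes, different defining loader, so a different runtime type.
        System.out.println(isolated == Payload.class);
        System.out.println(isolated.getClassLoader() instanceof ByteStreamLoader);
    }
}
```

Overriding findClass() rather than loadClass() keeps the default delegation logic intact; the sketch passes null as the parent only to force the request down to findClass().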

Linking

Linking includes three stages: Verification, Preparation, and Resolution.

Verification

Verification is the first step of the linking phase. Its purpose is to ensure that the information contained in the byte stream of the Class file meets all the constraints of the Java Virtual Machine Specification, and that this information will not endanger the safety of the virtual machine when it is run as code.

The Java language itself is a relatively safe programming language (at least compared with C/C++). Pure Java code cannot do things such as access data outside array bounds, cast an object to a type it does not implement, or jump to a nonexistent line of code. If you try, the compiler will mercilessly throw an exception or refuse to compile. However, a Class file is not necessarily compiled from Java source code. It can be produced in any way, including typing out the zeros and ones of the Class file by hand in a binary editor. What Java code cannot do can be expressed at the bytecode level, at least semantically. If the Java virtual machine did not check the input byte stream and trusted it completely, loading a byte stream that contains errors or malicious intent could well lead to the whole system being attacked or even crashing. Verifying the bytecode is therefore a necessary measure for the Java virtual machine to protect itself.

The verification phase will generally complete the following four stages of inspection actions:

  • File format validation

  • Metadata validation

  • Bytecode verification

  • Symbolic reference verification

Preparation

The preparation stage formally allocates memory for the variables defined in the class (that is, class variables, those modified by static) and sets their initial values. Conceptually, the memory used by these variables should be allocated in the method area, but note that the method area itself is a logical area. Before JDK 7, when HotSpot used the permanent generation to implement the method area, the implementation matched this logical concept exactly. In JDK 8 and later, class variables are stored in the Java heap together with the Class object, so "class variables are in the method area" is purely a statement about the logical concept.

Two points about the preparation stage are easy to confuse. First, memory is allocated at this time only for class variables, not for instance variables; instance variables are allocated in the Java heap along with the object when the object is instantiated. Second, the initial value mentioned here is "usually" the zero value of the data type. Suppose a class variable is defined as:

public static int value=123;

After the preparation stage, the initial value of value is 0 rather than 123, because no Java method has been executed yet. The putstatic instruction that assigns 123 to value is compiled into the class constructor method <clinit>(), so the assignment does not happen until the initialization stage of the class. If the declaration above is also marked final, the field carries a ConstantValue attribute and is already 123 after the preparation stage (this value is placed in the runtime constant pool in the method area). Note that a local variable has no preparation stage, so it must be explicitly assigned before it is used; otherwise the code does not compile.
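The zero value left by the preparation stage can be observed from ordinary Java code. The following is a minimal sketch; the class name PreparationDemo and its members are made up for this illustration. A static initializer that runs earlier in <clinit>() reads value through a method call before the assignment of 123 has executed.

```java
public class PreparationDemo {

    // Runs first inside <clinit>(): peek() reads `value` before `value = 123` has executed,
    // so it sees the zero value assigned during preparation.
    static int observedEarly = peek();

    // Same idea for the constant: the read happens before the declaration below is reached.
    static int observedConstantEarly = peekConstant();

    // Plain class variable: 0 after preparation, 123 only once <clinit>() reaches this line.
    static int value = 123;

    // Compile-time constant: backed by a ConstantValue attribute, no <clinit>() assignment.
    static final int CONSTANT = 123;

    static int peek() {
        return value;
    }

    static int peekConstant() {
        return CONSTANT;
    }

    public static void main(String[] args) {
        System.out.println(observedEarly);          // 0
        System.out.println(value);                  // 123
        System.out.println(observedConstantEarly);  // 123
    }
}
```

Reading value through a method is what makes this compile; a direct read of a later-declared class variable inside an initializer is rejected as an illegal forward reference.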

Resolution

The resolution phase is the process in which the Java virtual machine replaces the symbolic references in the constant pool with direct references.

  • Symbolic References: a symbolic reference describes the referenced target with a group of symbols. The symbols can be literals of any form, as long as they locate the target unambiguously. A symbolic reference has nothing to do with the memory layout implemented by the virtual machine, and the referenced target does not need to have been loaded into the virtual machine's memory yet. The memory layouts of different virtual machines can differ, but the symbolic references they accept must be the same, because the literal form of symbolic references is clearly defined in the Class file format of the Java Virtual Machine Specification.

    For example, disassemble the following code with javap to inspect its constant pool and bytecode instructions:

    public class TestClzz {
        private static final int fInt = 99;

        private static double sDouble = 8;

        private float aFloat = 77;

        public static void main(String[] args) {

        }

        public static String getMSg() {
            return "Hello";
        }

        private boolean isTrue() {
            return false;
        }
    }

    Run:

    javap -verbose .\TestClzz.class

    Output:

    Classfile /D:/javaproject/myexampleproject/MyWallet/target/classes/com/kmning/wallet/jvm/TestClzz.class
      Last modified 2022 January 25; size 760 bytes
      SHA-256 checksum 2a0a77f9a1d83a89dbd3c1f1c0a36081b607cd32f835f3a8f08773299ed13d93
      Compiled from "TestClzz.java"                                                    
    public class com.kmning.wallet.jvm.TestClzz
      minor version: 0
      major version: 52
      flags: (0x0021) ACC_PUBLIC, ACC_SUPER
      this_class: #8                          // com/kmning/wallet/jvm/TestClzz
      super_class: #9                         // java/lang/Object
      interfaces: 0, fields: 3, methods: 5, attributes: 1
    Constant pool:
       #1 = Methodref          #9.#36         // java/lang/Object."<init>":()V
       #2 = Float              77.0f
       #3 = Fieldref           #8.#37         // com/kmning/wallet/jvm/TestClzz.aFloat:F
       #4 = String             #38            // Hello
       #5 = Double             8.0d
       #7 = Fieldref           #8.#39         // com/kmning/wallet/jvm/TestClzz.sDouble:D
       #8 = Class              #40            // com/kmning/wallet/jvm/TestClzz
       #9 = Class              #41            // java/lang/Object
      #10 = Utf8               fInt
      #11 = Utf8               I
      #12 = Utf8               ConstantValue
      #13 = Integer            99
      #14 = Utf8               sDouble
      #15 = Utf8               D
      #16 = Utf8               aFloat
      #17 = Utf8               F
      #18 = Utf8               <init>
      #19 = Utf8               ()V
      #20 = Utf8               Code
      #21 = Utf8               LineNumberTable
      #22 = Utf8               LocalVariableTable
      #23 = Utf8               this
      #24 = Utf8               Lcom/kmning/wallet/jvm/TestClzz;
      #25 = Utf8               main
      #26 = Utf8               ([Ljava/lang/String;)V
      #27 = Utf8               args
      #28 = Utf8               [Ljava/lang/String;
      #29 = Utf8               getMSg
      #30 = Utf8               ()Ljava/lang/String;
      #31 = Utf8               isTrue
      #32 = Utf8               ()Z
      #33 = Utf8               <clinit>
      #34 = Utf8               SourceFile
      #35 = Utf8               TestClzz.java
      #36 = NameAndType        #18:#19        // "<init>":()V
      #37 = NameAndType        #16:#17        // aFloat:F
      #38 = Utf8               Hello
      #39 = NameAndType        #14:#15        // sDouble:D
      #40 = Utf8               com/kmning/wallet/jvm/TestClzz
      #41 = Utf8               java/lang/Object
    {
      public com.kmning.wallet.jvm.TestClzz();
        descriptor: ()V
        flags: (0x0001) ACC_PUBLIC
        Code:
          stack=2, locals=1, args_size=1
             0: aload_0
             1: invokespecial #1                  // Method java/lang/Object."<init>":()V
             4: aload_0
             5: ldc           #2                  // float 77.0f
             7: putfield      #3                  // Field aFloat:F
            10: return
          LineNumberTable:
            line 8: 0
            line 13: 4
          LocalVariableTable:
            Start  Length  Slot  Name   Signature
                0      11     0  this   Lcom/kmning/wallet/jvm/TestClzz;

      public static void main(java.lang.String[]);
        descriptor: ([Ljava/lang/String;)V
        flags: (0x0009) ACC_PUBLIC, ACC_STATIC
        Code:
          stack=0, locals=1, args_size=1
             0: return
          LineNumberTable:
            line 17: 0
          LocalVariableTable:
            Start  Length  Slot  Name   Signature
                0       1     0  args   [Ljava/lang/String;

      public static java.lang.String getMSg();
        descriptor: ()Ljava/lang/String;
        Code:
          stack=2, locals=0, args_size=0
             0: ldc2_w        #5                  // double 8.0d
             3: putstatic     #7                  // Field sDouble:D
             6: return
          LineNumberTable:
            line 11: 0
    }
    SourceFile: "TestClzz.java"

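    In the constant pool above, entries such as Methodref (#1), Fieldref (#3, #7), and Class (#8, #9) are symbolic references. They ultimately point to the many Utf8 entries, which hold the class names, member names, and descriptors as plain text.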

  • Direct References: a direct reference is a pointer that can directly point to the target, a relative offset, or a handle that can indirectly locate the target. Direct reference is directly related to the memory layout of the virtual machine implementation. The direct reference translated by the same symbolic reference on different virtual machine instances will generally not be the same. If there is a direct reference, the target of the reference must already exist in the memory of the virtual machine.

The resolution action mainly applies to seven kinds of symbolic references: class or interface, field, class method, interface method, method type, method handle, and call site specifier. These correspond to eight constant types in the constant pool: CONSTANT_Class_info, CONSTANT_Fieldref_info, CONSTANT_Methodref_info, CONSTANT_InterfaceMethodref_info, CONSTANT_MethodType_info, CONSTANT_MethodHandle_info, CONSTANT_Dynamic_info, and CONSTANT_InvokeDynamic_info.

Initialization

The initialization phase is the last step of the class loading process. Of the class loading actions described earlier, the user application can take part only in the loading phase, by customizing the class loader; all the other actions are directed and controlled entirely by the Java virtual machine. Only in the initialization phase does the Java virtual machine actually start executing the Java program code written in the class and hand control over to the application.

During the preparation phase, class variables were already assigned the zero value required by the system. In the initialization phase, class variables and other resources are initialized according to the plan the programmer expressed in code. Put more directly: the initialization stage is the process of executing the class constructor method <clinit>(). <clinit>() is not a method that programmers write directly in Java code; it is generated automatically by the javac compiler.

  • The <clinit>() method is produced by the compiler, which automatically collects the assignments to all class variables and the statements in static statement blocks (static {} blocks) and merges them. The order of collection is the order in which the statements appear in the source file. A static statement block can only read variables defined before it. Variables defined after it can be assigned in the earlier static block, but cannot be read there.

  • The <clinit>() method is different from the constructor of the class (that is, the instance constructor <init>() method from the virtual machine's perspective). It does not need to call the parent class's constructor explicitly. The Java virtual machine guarantees that the <clinit>() method of the parent class has finished before the <clinit>() method of the subclass is executed. Therefore, the first <clinit>() method executed in the Java virtual machine must belong to java.lang.Object.

  • Because the <clinit>() method of the parent class is executed first, the static statement blocks defined in the parent class run before the variable assignments of the subclass.

  • The <clinit>() method is not mandatory for a class or interface. If a class has no static statement block and no assignment to class variables, the compiler need not generate a <clinit>() method for it.

  • Static statement blocks cannot be used in an interface, but an interface can still have assignments that initialize variables, so an interface gets a <clinit>() method just as a class does. The difference is that executing the <clinit>() method of an interface does not require executing the <clinit>() method of its parent interface first, because a parent interface is initialized only when a variable defined in it is used. Likewise, the implementation class of an interface does not execute the interface's <clinit>() method when it is initialized.

  • The Java virtual machine must ensure that the <clinit>() method of a class is correctly locked and synchronized in a multithreaded environment. If multiple threads try to initialize a class at the same time, only one thread executes the <clinit>() method of that class, and the other threads block and wait until the active thread finishes. If the <clinit>() method of a class contains a time-consuming operation, it can block multiple threads, and in practice this kind of blocking is often well hidden.
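The ordering rules in the list above can be checked with a small sketch. The class names ClinitOrderDemo, ForwardAssign, Parent, and Sub are made up for this illustration.

```java
public class ClinitOrderDemo {

    static class ForwardAssign {
        static {
            i = 0;                     // assigning to a variable defined later is allowed
            // System.out.println(i);  // reading it here is rejected: illegal forward reference
        }

        static int i = 1;              // collected after the block, so i ends up as 1
    }

    static class Parent {
        static int a = 1;

        static {
            a = 2;                     // runs as part of Parent.<clinit>()
        }
    }

    static class Sub extends Parent {
        static int b = a;              // Parent.<clinit>() has already finished, so a is 2
    }

    public static void main(String[] args) {
        System.out.println(ForwardAssign.i);  // 1
        System.out.println(Sub.b);            // 2
    }
}
```

Both results follow from source order: within one class the statements are collected top to bottom, and across classes the parent's <clinit>() completes before the subclass's begins.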

In the following code, the <clinit>() method never completes. When multiple threads access the class, only one thread executes <clinit>() while the others block. In the normal case where <clinit>() does finish, the blocked threads do not enter <clinit>() again after waking up: under the same class loader, a type is initialized only once.

public class DeadLoopTestCase {

    static class DeadLoopClass {
        static {
            // If this if statement is not added, the compiler will prompt "Initializer does not complete normally" and refuse to compile
            if (true) {
                System.out.println(Thread.currentThread() + "init DeadLoopClass");
                while (true) {
                }
            }
        }
    }

    public static void main(String[] args) {
        Runnable script = new Runnable() {
            public void run() {
                System.out.println(Thread.currentThread() + "start");
                DeadLoopClass dlc = new DeadLoopClass();
                System.out.println(Thread.currentThread() + " run over");
            }
        };

        Thread thread1 = new Thread(script);
        Thread thread2 = new Thread(script);
        thread1.start();
        thread2.start();
    }

}

Execution result (the program blocks and never terminates):

Thread[Thread-1,5,main]start
Thread[Thread-0,5,main]start
Thread[Thread-0,5,main]init DeadLoopClass

It can be seen that only one thread has entered the <clinit>() method.
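For a variant that terminates, the following minimal sketch counts how many times a static block runs when several threads trigger initialization at about the same time. The names InitOnceDemo, Lazy, CLINIT_RUNS, and runThreads are hypothetical, and the thread count is an arbitrary choice.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class InitOnceDemo {

    // Kept in the outer class so that initializing Lazy does not reset the counter.
    static final AtomicInteger CLINIT_RUNS = new AtomicInteger();

    static class Lazy {
        static {
            CLINIT_RUNS.incrementAndGet();  // executed as part of Lazy.<clinit>()
        }

        static void touch() {
            // Calling a static method is an active use that triggers initialization.
        }
    }

    /** Starts threadCount threads that all touch Lazy, then reports how often <clinit>() ran. */
    public static int runThreads(int threadCount) throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                try {
                    start.await();          // release all threads at roughly the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                Lazy.touch();
            });
            threads[t].start();
        }
        start.countDown();
        for (Thread thread : threads) {
            thread.join();
        }
        return CLINIT_RUNS.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runThreads(8));  // 1
    }
}
```

However the threads interleave, the counter ends at 1, because the virtual machine lets only one thread run the static block and the others simply wait for it to finish.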

Topics: Java jvm