This article walks through the bytes of a simple class file. The lookup tables used during the analysis (constant types, access flags, opcodes, and so on) are not reproduced here.
The code and analysis follow the third edition of In-Depth Understanding of Java Virtual Machine. The book uses JDK 6, but the walkthrough is still a useful reference.
1. Code
    package org.fenixsoft.clazz;
    public class TestClass {
        private int m;
        public int inc() {
            return m + 1;
        }
    }
The class file analyzed first is the one from the book, compiled with the old JDK 6.
After compiling the source into a class file and opening it with a hex editor, you can see the value of each byte and the corresponding ASCII characters.
Conventions used below: a value that starts with 0x is an offset (address) in the file, a value that ends with H is a hexadecimal number, and plain numbers are decimal.
Header information
0x00-0x03  CAFEBABE. This is the magic number that identifies a class file.
0x04-0x05  Minor version number. It is two bytes, so its maximum is 65535.
0x06-0x07  Major version number. 0032H = 50, which the version table maps to JDK 6.
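As a minimal sketch of how these header fields can be read, the Java snippet below pulls the magic number and the two version numbers out of the first eight bytes. The class name ClassHeader and the sample byte array are illustrative assumptions, and the minor version is assumed to be 0 because the text does not give its value.

```java
import java.nio.ByteBuffer;

class ClassHeader {
    /** Reads magic (u4), minor_version (u2) and major_version (u2) from offsets 0x00-0x07. */
    static long[] readHeader(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes);   // big-endian by default, like the class format
        long magic = buf.getInt() & 0xFFFFFFFFL;        // treat the u4 as unsigned
        int minor = buf.getShort() & 0xFFFF;            // treat each u2 as unsigned
        int major = buf.getShort() & 0xFFFF;
        return new long[] {magic, minor, major};
    }

    public static void main(String[] args) {
        // Assumed sample: the magic number, minor version 0, major version 0032H
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x32};
        long[] h = readHeader(header);
        System.out.printf("magic=%X minor=%d major=%d%n", h[0], h[1], h[2]);
    }
}
```

For the sample bytes this prints magic=CAFEBABE minor=0 major=50.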
2. Constant pool
The constant pool holds literals and symbolic references. Literals can be understood as constants, such as strings and final constant values. Symbolic references include class names, field and method names, descriptors, dynamic constants, and so on.
The data types in a class file are unsigned numbers and tables. Unsigned numbers come in the byte lengths u1, u2, u4, and u8. Tables are composed of unsigned numbers or other tables.
The first two bytes of the constant pool (a u2) give the constant pool count, followed by the constants themselves. There are 17 constant types, each of them a table, that is, each contains several items of different types.
In every constant table, the first byte is a tag that gives the constant type; the rest is interpreted according to the structure defined for that type.
0x08-0x09  0016H = 22. The count starts at 1, so 22 - 1 = 21 constants follow.
This is followed by the first constant:
0x0A  07H = 7. The table lookup shows this is a CONSTANT_Class_info table. Its structure is a u1 tag and a u2 name_index. The tag is this byte (tag = 7), so the next two bytes are name_index.
0x0B-0x0C  0002H = 2. It points to the second constant in the pool, which is the next constant.
The second constant is analyzed the same way:
0x0D  01H = 1. The table lookup shows this is a CONSTANT_Utf8_info constant, which may hold a variable name, a method name, and so on. Its structure is a u1 tag (this byte, tag = 1), a u2 length, and then length bytes of u1 data called bytes. Note that bytes uses the abbreviated UTF-8 encoding: one byte per character where possible, otherwise two bytes, otherwise three. Because CONSTANT_Utf8_info is what stores names, a Java variable or method name cannot exceed the maximum value of length, that is, 65535 bytes once encoded.
0x0E-0x0F  001DH = 29, so length = 29.
0x10-0x2C  29 bytes in abbreviated UTF-8: bytes = "org/fenixsoft/clazz/TestClass"
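The two constants just analyzed can be parsed with a few lines of Java. This is only a sketch under stated assumptions: the class name FirstTwoConstants and its helper methods are hypothetical, only tags 7 and 1 are handled, and the sample bytes are rebuilt from the values given in the text rather than read from the real file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class FirstTwoConstants {
    /** Parses one constant at the buffer's current position; handles only tags 7 and 1. */
    static String readConstant(ByteBuffer buf) {
        int tag = buf.get() & 0xFF;                        // u1 tag
        switch (tag) {
            case 7:                                        // CONSTANT_Class_info: u2 name_index
                return "Class name_index=" + (buf.getShort() & 0xFFFF);
            case 1: {                                      // CONSTANT_Utf8_info: u2 length, then bytes
                int length = buf.getShort() & 0xFFFF;
                byte[] bytes = new byte[length];
                buf.get(bytes);
                // ASCII-only names decode the same way in UTF-8 and in the abbreviated encoding
                return "Utf8 length=" + length + " \"" + new String(bytes, StandardCharsets.UTF_8) + "\"";
            }
            default:
                throw new IllegalArgumentException("tag not handled in this sketch: " + tag);
        }
    }

    /** Rebuilds the bytes from 0x0A onward as described in the text. */
    static byte[] sampleBytes() {
        byte[] name = "org/fenixsoft/clazz/TestClass".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.allocate(3 + 3 + name.length);
        buf.put((byte) 0x07).putShort((short) 2);                       // Class, name_index = 2
        buf.put((byte) 0x01).putShort((short) name.length).put(name);   // Utf8, length = 29
        return buf.array();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(sampleBytes());
        System.out.println(readConstant(buf));
        System.out.println(readConstant(buf));
    }
}
```

A full parser would need one case per constant type, since each type has its own length; the tag is the only way to know how many bytes to consume next.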
Abbreviated UTF-8 encoding (the Java Virtual Machine Specification calls it modified UTF-8):
Characters from '\u0001' to '\u007f' (equivalent to ASCII codes 1 to 127) are encoded in one byte, characters from '\u0080' to '\u07ff' in two bytes, and characters from '\u0800' to '\uffff' in three bytes, following the ordinary UTF-8 encoding rules.
The remaining 19 of the 21 constants are decoded the same way. From here on the analysis is done by software: the javap tool that ships with the JDK can do it.
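The byte counts above can be checked from Java itself, because DataOutputStream.writeUTF writes a two-byte length followed by the same abbreviated (modified) UTF-8 bytes. The sketch below is illustrative only; the class name ModifiedUtf8Length and the sample characters are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ModifiedUtf8Length {
    /** Number of encoded bytes for s, not counting the two-byte length prefix. */
    static int encodedLength(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream data = new DataOutputStream(out)) {
            data.writeUTF(s);                  // u2 length, then the encoded bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size() - 2;
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("m"));        // one-byte range
        System.out.println(encodedLength("\u00e9"));   // two-byte range
        System.out.println(encodedLength("\u4e2d"));   // three-byte range
        System.out.println(encodedLength("\0"));       // the NUL character takes two bytes here
    }
}
```

The last line shows one of the ways this encoding departs from standard UTF-8: the NUL character is never written as a single zero byte.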
3. Parsing using javap
Next, I use javap to analyze the bytecode generated from the same code. I use JDK 12, so the source code has not changed but the compiler is different.
javap -verbose TestClass
output
    ......
    public class org.fenixsoft.clazz.TestClass
      minor version: 0
      major version: 56
      flags: (0x0021) ACC_PUBLIC, ACC_SUPER
      this_class: #3                          // org/fenixsoft/clazz/TestClass
      super_class: #4                         // java/lang/Object
      interfaces: 0, fields: 1, methods: 2, attributes: 1
    Constant pool:
       #1 = Methodref          #4.#15         // java/lang/Object."<init>":()V
       #2 = Fieldref           #3.#16         // org/fenixsoft/clazz/TestClass.m:I
       #3 = Class              #17            // org/fenixsoft/clazz/TestClass
       #4 = Class              #18            // java/lang/Object
       #5 = Utf8               m
       #6 = Utf8               I
       #7 = Utf8               <init>
       #8 = Utf8               ()V
       #9 = Utf8               Code
      #10 = Utf8               LineNumberTable
      #11 = Utf8               inc
      #12 = Utf8               ()I
      #13 = Utf8               SourceFile
      #14 = Utf8               TestClass.java
      #15 = NameAndType        #7:#8          // "<init>":()V
      #16 = NameAndType        #5:#6          // m:I
      #17 = Utf8               org/fenixsoft/clazz/TestClass
      #18 = Utf8               java/lang/Object
    {
      public org.fenixsoft.clazz.TestClass();
        descriptor: ()V
        flags: (0x0001) ACC_PUBLIC
        Code:
          stack=1, locals=1, args_size=1
             0: aload_0
             1: invokespecial #1                  // Method java/lang/Object."<init>":()V
             4: return
          LineNumberTable:
            line 2: 0

      public int inc();
        descriptor: ()I
        flags: (0x0001) ACC_PUBLIC
        Code:
          stack=2, locals=1, args_size=1
             0: aload_0
             1: getfield      #2                  // Field m:I
             4: iconst_1
             5: iadd
             6: ireturn
          LineNumberTable:
            line 5: 0
    }
    SourceFile: "TestClass.java"
The constant pool contains constants that are not defined anywhere in the source code, such as I, ()V, <init>, Code, and LineNumberTable. They are generated by the compiler.
These constants are referenced by the field_info, method_info, and attribute_info tables discussed later. They describe content that is inconvenient to express with a fixed number of bytes, such as the return value of a method, how many parameters it has, and the type of each parameter.
4. Remainder
Go back to my bytecode file and continue with the rest. The rest can be divided as follows. You can compare this breakdown with the javap output above; they are consistent.
    class
        Class name
        Parent class name
        Number of interfaces (0)
        Number of variables (1)
            Variable (class or instance)
        Number of methods (2)
            Method 1
                Method parameters, return value type, etc.
                Method body: Code
                    The bytecode, step by step
                    LineNumberTable
                        line_number_info: which bytecode offset maps to which source line
            Method 2
                Method parameters, return value type, etc.
                Method body: Code
                    The bytecode, step by step
                    LineNumberTable (I don't know its use for the time being)
                        line_number_info: which bytecode offset maps to which source line
        Other information
            Source file name
The nesting is quite deep: a method contains Code, Code contains a LineNumberTable, and the LineNumberTable contains line_number_info entries, each of which is one row of information. This structure mirrors the javap output above.
The corresponding bytes are analyzed below in this order: the class, the member variables, the member methods, and the additional information.
1. Class
00 21 00 03 00 04 00 00
Access flag
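After the constant pool comes the access flag, which uses 2 bytes (16 bits). Nine of those bits are currently defined; they indicate, for example, whether the class is public, whether it is final (cannot be inherited), whether it is an interface, and whether it is abstract.
0021H  ACC_PUBLIC (0001H) and ACC_SUPER (0020H) are set, which matches the flags line in the javap output. No other flag is set.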
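Decoding the flag word is a matter of testing each defined bit. The sketch below uses the nine class-level flag values from the Java Virtual Machine Specification; the class name ClassAccessFlags and the decode method are illustrative, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

class ClassAccessFlags {
    // Class-level flags and their bit masks, in ascending order
    static final int[] MASKS = {0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000, 0x8000};
    static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE", "ACC_ABSTRACT",
        "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM", "ACC_MODULE"
    };

    /** Returns the names of all flags whose bit is set in the given u2 value. */
    static List<String> decode(int flags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((flags & MASKS[i]) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021));
    }
}
```

For 0021H this prints [ACC_PUBLIC, ACC_SUPER], the same two flags javap reports.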
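Class index, parent class index, and interface index collection
The class index and the parent class index are each a single u2 value.
The interface index collection is a u2 count followed by that many u2 indexes.
00 03  Class index, pointing to constant 3
00 04  Parent class index, pointing to constant 4
00 00  Interface count is 0, so no interface indexes follow
Back in the constant pool, constants 3 and 4 are both of type Class. Their name indexes point to constants 17 and 18, namely org/fenixsoft/clazz/TestClass and java/lang/Object.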
2. Variables: field table collection
00 01 00 02 00 05 00 06 00 00
The field table collection contains class variables and instance variables, but not the local variables declared inside methods.
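Three concepts
Fully qualified name: the class name including its package, with / as the separator, for example org/fenixsoft/clazz/TestClass.
Simple name: a method name or variable name without type or parameter decoration, such as inc and m.
Descriptor: describes the type of a field, or the parameters and return value of a method. Each primitive type is represented by one letter (for example I for int), an object type by L followed by the fully qualified name and a semicolon, and each array dimension by a leading [. A method descriptor lists the parameters in parentheses first, then the return value.
For example, the descriptors of the following types are:
java.lang.String[][] -> [[Ljava/lang/String;
int[] -> [I
int indexOf(char[] a, int b, int c, char[] d, int e, int f, int g) -> ([CII[CIII)I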
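The descriptor rules are mechanical enough to write down in a few lines. The following Java sketch builds field and method descriptors from Class objects; the class name Descriptors and its two methods are hypothetical helpers written for this article, not a JDK API.

```java
class Descriptors {
    /** Descriptor of a single type: one letter for primitives, [ per array dimension, L...; for objects. */
    static String of(Class<?> type) {
        if (type.isArray()) return "[" + of(type.getComponentType());
        if (type == void.class) return "V";
        if (type == boolean.class) return "Z";
        if (type == byte.class) return "B";
        if (type == char.class) return "C";
        if (type == short.class) return "S";
        if (type == int.class) return "I";
        if (type == long.class) return "J";
        if (type == float.class) return "F";
        if (type == double.class) return "D";
        return "L" + type.getName().replace('.', '/') + ";";
    }

    /** Method descriptor: parameter descriptors in parentheses, then the return descriptor. */
    static String ofMethod(Class<?> returnType, Class<?>... parameterTypes) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : parameterTypes) {
            sb.append(of(p));
        }
        return sb.append(')').append(of(returnType)).toString();
    }

    public static void main(String[] args) {
        System.out.println(of(String[][].class));
        System.out.println(of(int[].class));
        System.out.println(ofMethod(int.class, char[].class, int.class, int.class,
                char[].class, int.class, int.class, int.class));
    }
}
```

The three printed lines are [[Ljava/lang/String; then [I then ([CII[CIII)I, matching the examples above. Note that parameter names never appear in a descriptor, only types.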
00 01  Number of entries in the field table: 1
00 02  Access flags: private
00 05  Name index. Constant 5 in the pool above is m
00 06  Descriptor index. Constant 6 in the pool above is I, which restores the declaration private int m
00 00  Attribute count is 0, so there is no additional information. If the field were a compile-time constant, for example private static final int m = 2;, a ConstantValue attribute would follow here.
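The field table layout (a u2 count, then for each field a u2 access_flags, u2 name_index, u2 descriptor_index, and u2 attributes_count) can be read with the sketch below. The class name FieldTable is an assumption, and the sketch deliberately rejects fields that carry attributes to stay short.

```java
import java.nio.ByteBuffer;

class FieldTable {
    /** Summarizes a field table whose fields carry no attributes. */
    static String describe(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int count = buf.getShort() & 0xFFFF;                 // u2 fields_count
        StringBuilder sb = new StringBuilder("fields_count=" + count);
        for (int i = 0; i < count; i++) {
            int access = buf.getShort() & 0xFFFF;            // u2 access_flags
            int name = buf.getShort() & 0xFFFF;              // u2 name_index
            int descriptor = buf.getShort() & 0xFFFF;        // u2 descriptor_index
            int attributes = buf.getShort() & 0xFFFF;        // u2 attributes_count
            if (attributes != 0) {
                throw new IllegalArgumentException("attributes are not handled in this sketch");
            }
            sb.append(String.format(" [access=0x%04X name=#%d descriptor=#%d]", access, name, descriptor));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One private field: name_index 5 (m) comes before descriptor_index 6 (I)
        byte[] fields = {0x00, 0x01, 0x00, 0x02, 0x00, 0x05, 0x00, 0x06, 0x00, 0x00};
        System.out.println(describe(fields));
    }
}
```

For the sample bytes this prints fields_count=1 [access=0x0002 name=#5 descriptor=#6].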
Fields inherited from a parent class or parent interface are not listed in the field table collection, but fields that do not exist in the original Java code may appear in it. For example, in an inner class, the compiler automatically adds a field that points to the outer class instance so that the inner class can still access the outer class. In addition, fields cannot be overloaded in the Java language: two fields must use different names regardless of whether their data types and modifiers are the same. For the Class file format, however, two fields with the same name are legal as long as their descriptors are not exactly the same.
3. Methods: method table collection
Like the field table, each entry consists of access_flags, name_index, descriptor_index, and attributes,
but the attributes are more complex.
00 02  Number of methods: 2
There are two methods. Take the first one, which is the part below:
00 01  Access flags: public
00 07  Name index, pointing to <init>, which means this method is a constructor
00 08  Descriptor index, pointing to ()V: no parameters, no return value
00 01  Number of attributes: 1
00 09  Attribute name index, pointing to Code. What follows is the Code attribute, which is analyzed later.
In the Java language, to overload a method, the new method must not only have the same simple name as the original method but also a feature signature that differs from it. The feature signature is the collection of the field symbolic references in the constant pool for each parameter of the method. Because the return value is not part of the feature signature, the Java language cannot overload an existing method by return value alone. In the Class file format, however, the scope of the feature signature is wider: two methods can coexist as long as their descriptors are not exactly the same. That is, if two methods have the same name and the same feature signature but different return values, they can legally coexist in the same Class file.
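One place where this coexistence can be observed without writing bytecode by hand is the bridge method that javac generates for generic interfaces. The sketch below is an illustration under that assumption (it relies on javac's usual behavior, and the class names are made up for this example): the source declares only String get(), yet the compiled class also contains an Object get().

```java
import java.lang.reflect.Method;
import java.util.function.Supplier;

class SameNameDifferentReturn {
    /** The source declares a single get() method that returns String. */
    static class Greeting implements Supplier<String> {
        @Override
        public String get() {
            return "hello";
        }
    }

    /** Counts the parameterless methods named get that the compiled class declares. */
    static int countGetMethods() {
        int count = 0;
        for (Method m : Greeting.class.getDeclaredMethods()) {
            if (m.getName().equals("get") && m.getParameterCount() == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (Method m : Greeting.class.getDeclaredMethods()) {
            // Shows the return type of each declared method and whether the compiler added it as a bridge
            System.out.println(m.getReturnType().getSimpleName() + " " + m.getName()
                    + "() bridge=" + m.isBridge());
        }
        System.out.println("methods named get: " + countGetMethods());
    }
}
```

Both methods have the descriptor prefix () and differ only in the return part, ()Ljava/lang/String; versus ()Ljava/lang/Object;, which is exactly the case the Java language forbids in source but the Class file format allows.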
Attribute table collection
Class files, field tables, and method tables can all carry their own attribute table collections.
JDK 12 predefines 29 attributes. At the class level, attributes can record the source file name, module information, the main class, and so on.
Method: remainder
Continue with the analysis above. The first attribute is Code.
00 09  Attribute name index, pointing to Code. What follows describes the Code attribute
00 00 00 1D  attribute_length: the body of the Code attribute is 1DH = 29 bytes long
00 01  max_stack
00 01  max_locals
00 00 00 05  code_length
2A B7 00 01 B1  The bytecode itself. The instructions can be recovered by looking up the opcode table
00 00  exception_table_length: there are no exception table entries
00 01  Number of attributes
00 0A  Attribute name index, pointing to LineNumberTable
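To make the opcode lookup concrete, here is a tiny disassembler sketch in Java. It knows only the seven opcodes that occur in TestClass, and the class name TinyDisassembler is hypothetical. The opcode values come from the Java Virtual Machine Specification.

```java
import java.util.ArrayList;
import java.util.List;

class TinyDisassembler {
    /** Decodes only the opcodes that appear in TestClass; anything else is rejected. */
    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            String name;
            boolean hasIndex = false;          // true when a u2 constant pool index follows
            switch (opcode) {
                case 0x04: name = "iconst_1"; break;
                case 0x2A: name = "aload_0"; break;
                case 0x60: name = "iadd"; break;
                case 0xAC: name = "ireturn"; break;
                case 0xB1: name = "return"; break;
                case 0xB4: name = "getfield"; hasIndex = true; break;
                case 0xB7: name = "invokespecial"; hasIndex = true; break;
                default:
                    throw new IllegalArgumentException(String.format("opcode %02X not handled", opcode));
            }
            if (hasIndex) {
                int index = ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF);
                lines.add(pc + ": " + name + " #" + index);
                pc += 3;                       // opcode plus two operand bytes
            } else {
                lines.add(pc + ": " + name);
                pc += 1;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] constructor = {0x2A, (byte) 0xB7, 0x00, 0x01, (byte) 0xB1};
        disassemble(constructor).forEach(System.out::println);
    }
}
```

For the five constructor bytes this prints 0: aload_0, 1: invokespecial #1, and 4: return, the same offsets and instructions javap shows for the constructor.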
max_locals is the storage needed for the local variable table, measured in variable slots. A type of 32 bits or fewer occupies one slot, while the 64-bit types long and double occupy two. The local variable table holds the method parameters (including this for instance methods), the local variables declared in the method body, and the exception parameters of catch clauses.
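The slot rule can be turned into a small calculation. The sketch below counts only the slots taken by this and the parameters, which is a lower bound for max_locals; locals declared in the body add to it, and the compiler may reuse slots. The class and method names are illustrative assumptions.

```java
class LocalSlots {
    /** Slots occupied by this (for instance methods) plus the parameters. */
    static int parameterSlots(boolean isStatic, Class<?>... parameterTypes) {
        int slots = isStatic ? 0 : 1;                     // instance methods keep this in slot 0
        for (Class<?> type : parameterTypes) {
            boolean wide = type == long.class || type == double.class;
            slots += wide ? 2 : 1;                        // long and double take two slots
        }
        return slots;
    }

    public static void main(String[] args) {
        // inc() is an instance method without parameters
        System.out.println(parameterSlots(false));
        // a hypothetical instance method taking (long, int, double)
        System.out.println(parameterSlots(false, long.class, int.class, double.class));
    }
}
```

The first call yields 1, which agrees with the locals=1 that javap reports for inc(); the second yields 6.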
code_length: although the field is a u4, the Java Virtual Machine Specification requires the bytecode of one method to be shorter than 65536 bytes, so the maximum is 65535.
Next, look at the LineNumberTable attribute.
00 0A  Attribute name index, pointing to LineNumberTable
00 00 00 06  attribute_length is 6
00 01  There is one line_number_info entry
The line_number_info table consists of two u2 items, start_pc and line_number. The former is the bytecode offset and the latter is the Java source line number.
00 00  start_pc: bytecode offset 0
00 02  line_number: source line 2
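Put together, the whole LineNumberTable attribute is twelve bytes, and a sketch like the following can read it. The class name LineNumberTableReader is hypothetical, and the output format imitates javap's "line N: offset" style for easy comparison.

```java
import java.nio.ByteBuffer;

class LineNumberTableReader {
    /** Reads a LineNumberTable attribute, starting at its attribute name index. */
    static String read(byte[] attribute) {
        ByteBuffer buf = ByteBuffer.wrap(attribute);
        int nameIndex = buf.getShort() & 0xFFFF;      // u2 attribute_name_index
        int length = buf.getInt();                    // u4 attribute_length
        int entries = buf.getShort() & 0xFFFF;        // u2 line_number_table_length
        StringBuilder sb = new StringBuilder("name=#" + nameIndex + " length=" + length);
        for (int i = 0; i < entries; i++) {
            int startPc = buf.getShort() & 0xFFFF;    // u2 start_pc, a bytecode offset
            int line = buf.getShort() & 0xFFFF;       // u2 line_number, a source line
            sb.append(" line ").append(line).append(": ").append(startPc);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] attribute = {0x00, 0x0A, 0x00, 0x00, 0x00, 0x06, 0x00, 0x01, 0x00, 0x00, 0x00, 0x02};
        System.out.println(read(attribute));
    }
}
```

This prints name=#10 length=6 line 2: 0. The length of 6 is the entry count (2 bytes) plus one entry (4 bytes).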
So far, the first method has been parsed. The second method is similar to the first. It can be seen that the first method is an instance constructor and the second is an inc() function defined by ourselves.
4. Additional information
The source file name.
00 01 00 0D 00 00 00 02 00 0E
00 01  Number of attributes: 1
00 0D  Attribute name index, pointing to SourceFile
00 00 00 02  attribute_length is 2
00 0E  sourcefile_index, pointing to "TestClass.java"
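Every attribute, whatever its kind, starts with the same six bytes: a u2 name index and a u4 length. That makes it possible to read, or skip, an attribute without understanding its content. The generic reader below is a sketch of that idea; AttributeReader is a made-up name, and the sample bytes are the SourceFile attribute without the leading attribute count.

```java
import java.nio.ByteBuffer;

class AttributeReader {
    /** Reads one attribute_info generically and returns its raw body as hex. */
    static String readAttribute(ByteBuffer buf) {
        int nameIndex = buf.getShort() & 0xFFFF;      // u2 attribute_name_index
        int length = buf.getInt();                    // u4 attribute_length
        byte[] info = new byte[length];
        buf.get(info);                                // the body, whose meaning depends on the name
        StringBuilder hex = new StringBuilder();
        for (byte b : info) {
            hex.append(String.format("%02X", b));
        }
        return "name=#" + nameIndex + " length=" + length + " info=" + hex;
    }

    public static void main(String[] args) {
        byte[] sourceFile = {0x00, 0x0D, 0x00, 0x00, 0x00, 0x02, 0x00, 0x0E};
        System.out.println(readAttribute(ByteBuffer.wrap(sourceFile)));
    }
}
```

This prints name=#13 length=2 info=000E. A reader that does not recognize the name at constant 13 can still advance past the attribute, because the length tells it where the next structure starts.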