Ali's interviewer asked me to talk about volatile. I started with HotSpot and won the interview with a set of combination punches

Posted by aspbyte on Wed, 29 Dec 2021 11:44:56 +0100

Hello, I'm Xiao Huang, a Java development engineer in Unicorn enterprise. Thank you for meeting us in the vast sea of people. As the saying goes: when your talent and ability are not enough to support your dream, please calm down and learn. I hope you can learn and work together with me to realize your dream.

1, Introduction

For Java developers, we generally use the underlying knowledge as a black box without opening it.

However, with the development of the programmer industry, it is necessary to open this black box to explore the mystery.

This series of articles will take you to explore the mystery of the underlying black box.

Before reading this article, it is recommended to download openJDK for better results

openJDK download address: openJDK

If you feel slower, suggest paying attention to the official account: love knocking the code of Xiao Huang, sending: openJDK can get Baidu SkyDrive link.

Can you give me some attention~

2, Operating system

1. Out of order execution of CPU

The CPU executes instructions while reading and waiting, which is the root of CPU disorder. Instead of disorder, it improves efficiency

Let's look at the following program:

x = 0;
y = 0;
a = 0;
b = 0;
Thread one = new Thread(new Runnable() {
    public void run() {
        //Since thread one starts first, let it wait for thread two The waiting time can be adjusted according to the actual performance of your computer
        //shortWait(100000);
        a = 1;
        x = b;
    }
});

Thread other = new Thread(new Runnable() {
    public void run() {
        b = 1;
        y = a;
    }
});
one.start();
other.start();
one.join();
other.join();
String result = "The first" + i + "second (" + x + "," + y + ")";
if (x == 0 && y == 0) {
    System.err.println(result);
    break;
}

We can see that if our CPU does not execute out of order, then a = 1 must be in front of x = b, and b = 1 must be in front of y = a

What conclusion can we draw, that is, x and y must not be 0 at the same time (here readers can think about why they can't be 0 at the same time)

We run the following program and get the following results:

We got the result when running 2728842 times, which verified our conclusion.

2.1 possible problems caused by disorder

Common example: why should volatile be added to DCL?

Let's take the following example:

class T{
	int m = 8;
}
T t = new T();

Decompile foreign exchange code:

0 new #2 <T>
3 dup
4 invokespeecial # 3 <T.<init>>
7 astore_1
8 return

We analyze the assembly code step by step:

New #2 < T >: create an object with m = 0, and there is a reference to the object in the stack frame
dup: copy a reference in our stack frame
Invokespecial #3 < T. < init > >: pop up a value in the stack frame and instantiate its construction method
astore_1: Assign the reference of our stack frame to t, where 1 refers to the first bit in our local variable table

So, let's think about one thing. We have proved that the CPU is out of order, so what harm will it do to our above operations?

When our astore_1. Before our invokespecial # 3 < T. < init > > is executed, it will cause us to assign t to the object we have not instantiated, as shown below:

Therefore, in order to avoid this phenomenon, we should add volatile to DCL

The question is, how do we ensure the order of volatile?

2.2 how to prohibit instruction reordering

We discuss the prohibition of instruction reordering from the following three aspects:

Code level
Bytecode level
JVM level
CPU level

2.2.1 Java code level

Add a volatile keyword directly

public class TestVolatile {
    public static volatile int counter = 1;

    public static void main(String[] args) {
        counter = 2;
        System.out.println(counter);
    }
}

2.2. 1 byte code level

At the bytecode level, after decompiling volatile, we can see VCC_volatile

We decompile the above code to get its bytecode

Through javac testvolatile Java compiles the class into a class file, and then passes javap - V testvolatile Class Command decompile view bytecode file

Here we only show the bytecode of this Code: public static volatile int counter = 1;

	public static volatile int counter;
    descriptor: I
    flags: ACC_PUBLIC, ACC_STATIC, ACC_VOLATILE
    // The following is the bytecode when initializing the counter
    0: iconst_2
    1: putstatic     #2                  // Field counter:I
    4: getstatic     #3                  // Field

descriptor: represents method parameters and return values
flags: ACC_PUBLIC, ACC_STATIC, ACC_VOLATILE: Flag
putstatic: operate on static attributes

Our subsequent operations can be through ACC_VOLATILE this flag to know that the variable has been modified by volatile

2.2. 3. Hotspot source code level

How does our JVM implement variables decorated with volatile?

We usually see these four words on my website: StoreStore, StoreLoad, LoadStore, LoadLoad

Our JVM is implemented in this way. Let's take a look at the specific implementation.

In Java, static attributes belong to classes. The static attribute is operated, and the corresponding instruction is putstatic

We use the bytecodeinterpreter. Under the root path jdk\src\hotspot\share\interpreter\zero of openjdk8 Code for processing putstatic instruction in cpp file:

CASE(_putstatic):
    {
          // ....  Omit several lines 
          // Now store the result
          // ConstantPoolCacheEntry* cache;     -- cache is a constant pool cache instance
          // cache->is_ Volatile () -- determines whether there is a volatile access flag modifier
          int field_offset = cache->f2_as_index();
          // ****Key judgment logic**** 
          if (cache->is_volatile()) { 
            // Assignment logic of volatile variable
            if (tos_type == itos) {
              obj->release_int_field_put(field_offset, STACK_INT(-1));
            } else if (tos_type == atos) {// Object type assignment
              VERIFY_OOP(STACK_OBJECT(-1));
              obj->release_obj_field_put(field_offset, STACK_OBJECT(-1));
              OrderAccess::release_store(&BYTE_MAP_BASE[(uintptr_t)obj >> CardTableModRefBS::card_shift], 0);
            } else if (tos_type == btos) {// byte type assignment
              obj->release_byte_field_put(field_offset, STACK_INT(-1));
            } else if (tos_type == ltos) {// long type assignment
              obj->release_long_field_put(field_offset, STACK_LONG(-1));
            } else if (tos_type == ctos) {// char type assignment
              obj->release_char_field_put(field_offset, STACK_INT(-1));
            } else if (tos_type == stos) {// short type assignment
              obj->release_short_field_put(field_offset, STACK_INT(-1));
            } else if (tos_type == ftos) {// float type assignment
              obj->release_float_field_put(field_offset, STACK_FLOAT(-1));
            } else {// double type assignment
              obj->release_double_field_put(field_offset, STACK_DOUBLE(-1));
            }
            // ***storeload barrier after writing value***
            OrderAccess::storeload();
          } else {
            // Assignment logic of non volatile variables
          }       
  }

Paste cache - > is here_ Source code of volatile(), path: JDK \ SRC \ hotspot \ share \ utilities \ accessflags hpp

  // Java access flags
  bool is_public      () const         { return (_flags & JVM_ACC_PUBLIC      ) != 0; }
  bool is_private     () const         { return (_flags & JVM_ACC_PRIVATE     ) != 0; }
  bool is_protected   () const         { return (_flags & JVM_ACC_PROTECTED   ) != 0; }
  bool is_static      () const         { return (_flags & JVM_ACC_STATIC      ) != 0; }
  bool is_final       () const         { return (_flags & JVM_ACC_FINAL       ) != 0; }
  bool is_synchronized() const         { return (_flags & JVM_ACC_SYNCHRONIZED) != 0; }
  bool is_super       () const         { return (_flags & JVM_ACC_SUPER       ) != 0; }
  bool is_volatile    () const         { return (_flags & JVM_ACC_VOLATILE    ) != 0; }
  bool is_transient   () const         { return (_flags & JVM_ACC_TRANSIENT   ) != 0; }
  bool is_native      () const         { return (_flags & JVM_ACC_NATIVE      ) != 0; }
  bool is_interface   () const         { return (_flags & JVM_ACC_INTERFACE   ) != 0; }
  bool is_abstract    () const         { return (_flags & JVM_ACC_ABSTRACT    ) != 0; }

Let's take a look at the assignment obj - > release_ long_ field_ Source code of put (field_offset, stack_long (- 1)): JDK \ SRC \ hotspot \ share \ OOP \ OOP inline. hpp

jlong oopDesc::long_field_acquire(int offset) const                   { return Atomic::load_acquire(field_addr<jlong>(offset)); }
void oopDesc::release_long_field_put(int offset, jlong value)         { Atomic::release_store(field_addr<jlong>(offset), value); }

Let's go to JDK \ SRC \ hotspot \ share \ runtime \ atomic HPP take a look at Atomic::release_store method

inline T Atomic::load_acquire(const volatile T* p) {
  return LoadImpl<T, PlatformOrderedLoad<sizeof(T), X_ACQUIRE> >()(p);
}
template <typename D, typename T>
inline void Atomic::release_store(volatile D* p, T v) {
  StoreImpl<D, T, PlatformOrderedStore<sizeof(D), RELEASE_X> >()(p, v);
}

We can clearly see that const volatile T* p and volatile D* p directly use the volatile keyword of C/C + + when calling

Let's continue to look. After we finish assigning parameters, we will have this operation: OrderAccess::storeload();

Let's look at orderaccess.jdk\src\hotspot\share\runtime HPP file, found such a piece of code

// barriers
  static void     loadload();
  static void     storestore();
  static void     loadstore();
  static void     storeload();

  static void     acquire();
  static void     release();
  static void     fence();

We can clearly see that this is the read-write barrier of the JVM we see on major websites

Of course, we have to look at its application in linux_x86 implementation mode, in JDK \ SRC \ hotspot \ OS_ cpu\linux_ Orderaccess for x86_ linux_x86. Under HPP

// A compiler barrier, forcing the C++ compiler to invalidate all memory assumptions
static inline void compiler_barrier() {
  __asm__ volatile ("" : : : "memory");
}

inline void OrderAccess::loadload()   { compiler_barrier(); }
inline void OrderAccess::storestore() { compiler_barrier(); }
inline void OrderAccess::loadstore()  { compiler_barrier(); }
inline void OrderAccess::storeload()  { fence();            }

inline void OrderAccess::acquire()    { compiler_barrier(); }
inline void OrderAccess::release()    { compiler_barrier(); }

2.2.4 CPU level

Intel primitive instructions: mfence memory barrier, ifence read barrier, sfence write barrier

We can see that the key is this line of code:__ asm__ volatile ("" : : : "memory");

__ asm__ : Used to instruct the compiler to insert assembly statements here
volatile: tell the compiler that it is strictly forbidden to recombine and optimize the assembly statements here with other statements. That is: the compilation here is processed as it is.
("":: "memory"): memory forces the gcc compiler to assume that all RAM memory units are modified by the assembly instruction, so that the data in registers in cpu and cached memory units in cache will be invalidated. The cpu will have to re read the data in memory when needed. This prevents the cpu from using the data in registers and cache to optimize instructions and avoid accessing memory.

Simple summary: tell our CPU not to optimize it for me blindly, and I will execute it serially.

In this way, we can see that these instructions maintain order by changing the CPU registers and cache

The basic interview is almost here. We can beat 80% of the interviewers and interviewers, but our article is not enough!

When we observe these methods, we will find a method called fence(). Let's observe this method:

inline void OrderAccess::fence() {
   // always use locked addl since mfence is sometimes expensive
   // 
#ifdef AMD64
  __asm__ volatile ("lock; addl $0,0(%%rsp)" : : : "cc", "memory");
#else
  __asm__ volatile ("lock; addl $0,0(%%esp)" : : : "cc", "memory");
#endif
  compiler_barrier();
}

We can see that our method does not recommend that we use our primitive instruction mfence (memory barrier), because mfence consumes more resources than locked

Directly determine whether AMD64 processes its different registers rsp\esp

"lock; addl "Lock; ADDL $0,0 (%% rsp)",0(%%rsp)": adding a 0 to the rsp register is a Full Barrier. During execution, the memory subsystem will be locked to ensure the execution order, even across multiple CPU s

Here, our volatile is almost the same. We should be able to beat 90% of the interviewers

2.3 hamppens before principle

In short, the JVM specifies the rules that reordering must follow (just know)

Program Order Rule
Tube side locking rules
volatile
Thread start rule
Thread termination rule
Thread interrupt rule
Object termination rule
Transitivity

2.4 as if serial

No matter how reordered, the result of single thread execution will not change

3, Summary

This article has been written for about a week. The most difficult thing is that I can't find a process from shallow to deep, so I don't know how to write it

Finally, it was successfully completed, which further improved my understanding of volatile

At least after reading this article, I'm not afraid of any interviewer on volatile questions

The next step is to talk about merge writing, process, thread, fiber or algorithm

I am a Java development engineer of Unicorn enterprise. I hope you can pay attention to the smart, lovely and kind-hearted. If you have any questions, you can leave a message or private letter. See you next time!

Topics: Java Interview

Programmer Think