[J2SE] Implementation of Object.hashCode()

Posted by insub2 on Wed, 19 Jan 2022 21:58:01 +0100

Contents of this article

  • 1, Basic concepts
  • 2, Implementation of hotspot
    • Object.hashCode()
      • mark word
      • get_next_hash(thread, obj)
    • System.identityHashCode(obj)
  • 3, Test verification
    • 1. After GC, the memory address of the object changes, but the hash value remains unchanged
    • 2. The hash value is saved in the mark word of the object header

1, Basic concepts

API notes: Object (Java SE 15 & JDK 15) (oracle.com)

Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by HashMap.

The general contract of hashCode is:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

The hashCode method returns a hash value of type int. The principles for overriding the hashCode and equals methods come up again in the J2SE collection framework:

  • The hash value computed by hashCode() should not change at runtime. For example, suppose your overridden hashCode computes its value from the object's name + age + other fields. If you put the object into a hash-based collection and then modify name or age, you will no longer be able to find it in the collection, because the hash value returned by hashCode has changed and now maps to a different slot of the table[] array. So the first principle for overriding hashCode is to keep its value unchanged at runtime, which in practice means not modifying the fields it depends on while the object is stored in such a collection;
  • If two objects are equal according to the equals method, the hash values returned by their hashCode methods must also be equal;
  • The converse is not required: two objects with equal hash values need not be equal according to equals. However, unequal objects should return different hash values where possible, for better performance in the collection framework;
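The first principle can be illustrated with a minimal sketch. The Person class and its fields are hypothetical and exist only for this example; the point is that a HashSet can no longer find an element whose hash-relevant field was changed after insertion.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class MutableKeyDemo {
    // Hypothetical class whose hashCode depends on mutable fields.
    static final class Person {
        String name;
        int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person p = (Person) o;
            return age == p.age && Objects.equals(name, p.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, age);
        }
    }

    // Returns {found before mutation, found after mutation}.
    static boolean[] containsBeforeAndAfterMutation() {
        Set<Person> people = new HashSet<>();
        Person p = new Person("Alice", 30);
        people.add(p);
        boolean before = people.contains(p);
        p.age = 31; // hashCode() changes while p is still stored in the set
        boolean after = people.contains(p);
        return new boolean[] { before, after };
    }

    public static void main(String[] args) {
        boolean[] result = containsBeforeAndAfterMutation();
        System.out.println(result[0]); // true
        System.out.println(result[1]); // false: the set looks the object up under its new hash
    }
}
```

The object is still physically in the set, but the lookup uses the new hash value and so never reaches the stored entry.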

Object.hashCode() is a native method. Its documentation notes differ between versions of the API documentation:

JDK8 API documentation:

This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.

JDK9~12 API documentation:

The hashCode may or may not be implemented as some function of an object's memory address at some point in time.

JDK15 API documentation: the note has been removed.

The notes on this method differ between versions as follows:

  • Up to JDK8, the typical implementation of the hash method converts the internal address of the object into an int value, but the Java programming language does not require this;
  • In JDK9~12, the implementation of the hash method may or may not depend on the memory address of the object;
  • From JDK13 on, the note is removed entirely, for fear of misleading Java programmers.

The wording was ambiguous to begin with.

For everyday Java programming it is enough to know the principles for overriding these two methods, but here we want to know more than that.

Questions:

  1. If the Java hash method depends on the memory address of the object, and the object is moved during GC so that its memory address changes, how does Object.hashCode() keep the hash value unchanged?
  2. If the Java hash method depends on the memory address of the object, and object B is moved during GC to the address that object A used to occupy, will B's hashCode() return the same hash value as A's?
  3. If it does not depend on the memory address of the object, how is Object.hashCode() implemented?
  4. And how is System.identityHashCode(Object) implemented?

Next, let's take a look at the HotSpot source code.

2, Implementation of hotspot

1.Object.hashCode()

In earlier versions (such as JDK8), the Object.hashCode() method was declared in the Object.c file; in JDK15 the declaration has moved to the jvm.h header file. This article is based on the JDK15 source code.

jdk/jvm.h at master · openjdk/jdk · GitHub

/*************************************************************************
 PART 1: Functions for Native Libraries
 ************************************************************************/
/*
 * java.lang.Object
 */
JNIEXPORT jint JNICALL
JVM_IHashCode(JNIEnv *env, jobject obj);

JNIEXPORT void JNICALL
JVM_MonitorWait(JNIEnv *env, jobject obj, jlong ms);

JNIEXPORT void JNICALL
JVM_MonitorNotify(JNIEnv *env, jobject obj);

JNIEXPORT void JNICALL
JVM_MonitorNotifyAll(JNIEnv *env, jobject obj);

JNIEXPORT jobject JNICALL
JVM_Clone(JNIEnv *env, jobject obj);

The Object.hashCode() method is implemented in jvm.cpp:

jdk/jvm.cpp at master · openjdk/jdk · GitHub

JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))
  // as implemented in the classic virtual machine; return 0 if object is NULL
  return handle == NULL ? 0 : ObjectSynchronizer::FastHashCode (THREAD, JNIHandles::resolve_non_null(handle)) ;
JVM_END

The ObjectSynchronizer class is declared in synchronizer.hpp and implemented in synchronizer.cpp:

 jdk/synchronizer.cpp at master · openjdk/jdk · GitHub

intptr_t ObjectSynchronizer::FastHashCode(Thread* current, oop obj) {

  while (true) {
    ObjectMonitor* monitor = NULL;
    markWord temp, test;
    intptr_t hash;
    markWord mark = read_stable_mark(obj);

    if (mark.is_neutral()) {               // if this is a normal header
      hash = mark.hash();
      if (hash != 0) {                     // if it has a hash, just return it
        return hash;
      }
      hash = get_next_hash(current, obj);  // get a new hash
      temp = mark.copy_set_hash(hash);     // merge the hash into header
                                           // try to install the hash
      test = obj->cas_set_mark(temp, mark);
      if (test == mark) {                  // if the hash was installed, return it
        return hash;
      }
      // Failed to install the hash. It could be that another thread
      // installed the hash just before our attempt or inflation has
      // occurred or... so we fall thru to inflate the monitor for
      // stability and then install the hash.
    } else if (mark.has_monitor()) {
      // ......
    } else if (current->is_lock_owned((address)mark.locker())) {
      // ......
    }
    // For brevity and clarity, we omit the case of objects as monitors and locks
    // Look directly at the process of calculating and saving hash values for ordinary objects
    // ......
    return hash;
  }
}

The ObjectSynchronizer::FastHashCode method handles ordinary objects, objects with a monitor, and locked objects. We study only the case where the hash value of an ordinary object is computed and saved. The logic is as follows:

  1. First read the mark word in the object header of the object on which hashCode() was called;
  2. If the object is an ordinary object:
    1. If a hash value is already saved in the mark word, return it;
    2. If the mark word holds no hash value, call get_next_hash(thread, obj) to compute the hash value of the object;
    3. Save the computed hash value into the corresponding bits of the mark word in the object header and return it;
  3. The monitor and locking cases are omitted;
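The steps above can be modelled in plain Java. This is only a sketch of the idea, not HotSpot code: an AtomicLong stands in for the mark word, the class and method names are hypothetical, and where HotSpot falls through to inflating the monitor after a failed CAS, this model simply retries.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

public class LazyHashModel {
    // 64-bit layout quoted in the article: the 31 hash bits sit above the low 8 bits.
    static final int HASH_SHIFT = 8;
    static final long HASH_MASK = 0x7FFFFFFFL;

    // Stand-in for the mark word; 0x1 models an unlocked header with no hash yet.
    final AtomicLong markWord = new AtomicLong(0x1L);

    int fastHashCode() {
        while (true) {
            long mark = markWord.get();
            long hash = (mark >>> HASH_SHIFT) & HASH_MASK;
            if (hash != 0) {
                return (int) hash;              // already installed: just return it
            }
            hash = nextHash();                  // compute a new hash
            long updated = mark | (hash << HASH_SHIFT);
            if (markWord.compareAndSet(mark, updated)) {
                return (int) hash;              // installed by this thread
            }
            // The CAS lost a race; this model retries (HotSpot inflates the monitor instead).
        }
    }

    // Hypothetical generator: any non-zero 31-bit value is good enough for the model.
    static long nextHash() {
        long value = ThreadLocalRandom.current().nextInt() & HASH_MASK;
        return value == 0 ? 0xBAD : value;
    }

    public static void main(String[] args) {
        LazyHashModel obj = new LazyHashModel();
        System.out.printf("header before: 0x%016x%n", obj.markWord.get());
        int first = obj.fastHashCode();
        int second = obj.fastHashCode();
        System.out.printf("header after:  0x%016x%n", obj.markWord.get());
        System.out.println(first == second);    // true: the second call reads the stored value
    }
}
```

Because the value is written once and only read afterwards, the result no longer depends on how it was generated.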

There are two key points in HotSpot's implementation of the hashCode() method: the mark word and the get_next_hash(thread, obj) method.

A.mark word

The source code and structure of the mark word were analyzed in section 4.1 (on the object header) of the article "Alignment rules for Java objects". If you are not familiar with it, read that part first and then continue here.

On a 32-bit machine, 25 bits of the mark word are used to save the hash value; on a 64-bit machine, 31 bits are used. The structure given in the source code comment is as follows:

//  32 bits:
//  --------
//             hash:25 ------------>| age:4    biased_lock:1 lock:2 (normal object)
//             JavaThread*:23 epoch:2 age:4    biased_lock:1 lock:2 (biased object)
//             size:32 ------------------------------------------>| (CMS free block)
//             PromotedObject*:29 ---------->| promo_bits:3 ----->| (CMS promoted object)
//
//  64 bits:
//  --------
//  unused:25 hash:31 -->| unused:1   age:4    biased_lock:1 lock:2 (normal object)
//  JavaThread*:54 epoch:2 unused:1   age:4    biased_lock:1 lock:2 (biased object)
//  PromotedObject*:61 --------------------->| promo_bits:3 ----->| (CMS promoted object)
//  size:64 ----------------------------------------------------->| (CMS free block)
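As a worked example of the 64-bit layout for a normal object, the sketch below extracts the fields with shifts and masks. The class name and helper methods are hypothetical; the sample value 0x0000007e2d773b01 is the mark word printed by JOL in the test in section 3 of this article.

```java
public class MarkWordDecoder {
    // Field widths from the 64-bit "normal object" line, counted from the low end:
    // lock:2, biased_lock:1, age:4, unused:1, hash:31, unused:25
    static final int AGE_SHIFT = 3;   // above lock (2) + biased_lock (1)
    static final int HASH_SHIFT = 8;  // above lock (2) + biased_lock (1) + age (4) + unused (1)
    static final long HASH_MASK = (1L << 31) - 1;

    static int lock(long mark) {
        return (int) (mark & 0x3);
    }

    static int biasedLock(long mark) {
        return (int) ((mark >>> 2) & 0x1);
    }

    static int age(long mark) {
        return (int) ((mark >>> AGE_SHIFT) & 0xF);
    }

    static int hash(long mark) {
        return (int) ((mark >>> HASH_SHIFT) & HASH_MASK);
    }

    public static void main(String[] args) {
        long mark = 0x0000007e2d773b01L;
        System.out.println(hash(mark));                      // 2116908859
        System.out.println(Integer.toHexString(hash(mark))); // 7e2d773b
        System.out.println(age(mark));                       // 0
        System.out.println(biasedLock(mark));                // 0
        System.out.println(lock(mark));                      // 1
        System.out.println(hash(0x0000000000000001L));       // 0: no hash installed yet
    }
}
```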

This answers the first question above, namely how Object.hashCode() keeps the hash value unchanged:

  • Until the hashCode() method has been called on an object, the corresponding bits in the mark word of its object header are 0;
  • The first time hashCode() is called, get_next_hash(thread, obj) computes the hash value of the object and saves it into the corresponding bits of the mark word in the object header;
  • On subsequent calls, the hash value saved in the mark word of the object header is returned directly;

So for now it does not matter whether get_next_hash(thread, obj) computes the hash value from the object's memory address. Even if it does, and the object is moved by GC afterwards, the hash value saved in the mark word of its object header is still the one obtained on the first call to hashCode().

In section 3 below we observe the whole process with JOL (Java Object Layout), an open-source OpenJDK tool for viewing the memory layout of Java objects. It is also introduced in the article "Alignment rules for Java objects".

B.get_next_hash(thread, obj)

static inline intptr_t get_next_hash(Thread* current, oop obj) {
  intptr_t value = 0;
  if (hashCode == 0) {
    // This form uses global Park-Miller RNG.
    // On MP system we'll have lots of RW access to a global, so the
    // mechanism induces lots of coherency traffic.
    value = os::random();
  } else if (hashCode == 1) {
    // This variation has the property of being stable (idempotent)
    // between STW operations.  This can be useful in some of the 1-0
    // synchronization schemes.
    intptr_t addr_bits = cast_from_oop<intptr_t>(obj) >> 3;
    value = addr_bits ^ (addr_bits >> 5) ^ GVars.stw_random;
  } else if (hashCode == 2) {
    value = 1;            // for sensitivity testing
  } else if (hashCode == 3) {
    value = ++GVars.hc_sequence;
  } else if (hashCode == 4) {
    value = cast_from_oop<intptr_t>(obj);
  } else {
    // Marsaglia's xor-shift scheme with thread-specific state
    // This is probably the best overall implementation -- we'll
    // likely make this the default in future releases.
    unsigned t = current->_hashStateX;
    t ^= (t << 11);
    current->_hashStateX = current->_hashStateY;
    current->_hashStateY = current->_hashStateZ;
    current->_hashStateZ = current->_hashStateW;
    unsigned v = current->_hashStateW;
    v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
    current->_hashStateW = v;
    value = v;
  }

  value &= markWord::hash_mask;
  if (value == 0) value = 0xBAD;
  assert(value != markWord::no_hash, "invariant");
  return value;
}

The get_next_hash(thread, obj) method contains six ways of computing the hash value:

  • 0. Random number
  • 1. Function of the object's memory address
  • 2. The constant 1 (for sensitivity testing)
  • 3. Incrementing sequence
  • 4. The object's memory address cast to an integer
  • 5. Marsaglia's xor-shift scheme with thread-specific state

Which method is used depends on the hashCode parameter, which is configured in globals.hpp.

  • In OpenJDK 8 and later, Marsaglia's xor-shift scheme is used

Therefore, across multiple versions of OpenJDK, the default implementation of Object.hashCode() does not use the object's memory address in its calculation.
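To make the xor-shift branch concrete, here is a minimal Java port of that branch of get_next_hash. It is a sketch for illustration only: the class name is hypothetical, the seed values in main are arbitrary rather than the ones HotSpot uses, and the 31-bit mask assumes the 64-bit mark word layout. Java's int with the unsigned shift >>> plays the role of the C++ unsigned type.

```java
public class XorShiftHash {
    static final int HASH_MASK = 0x7FFFFFFF; // 31 hash bits, as on a 64-bit JVM

    // Per-generator state; in HotSpot the equivalent fields live in each Thread.
    private int x, y, z, w;

    XorShiftHash(int seedX, int seedY, int seedZ, int seedW) {
        x = seedX;
        y = seedY;
        z = seedZ;
        w = seedW;
    }

    int next() {
        int t = x;
        t ^= (t << 11);
        x = y;
        y = z;
        z = w;
        int v = w;
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8));
        w = v;
        int value = v & HASH_MASK;
        return value == 0 ? 0xBAD : value;   // 0 is reserved for "no hash yet"
    }

    public static void main(String[] args) {
        // Arbitrary illustrative seeds, not HotSpot's.
        XorShiftHash generator = new XorShiftHash(12345, 67890, 13579, 24680);
        for (int i = 0; i < 3; i++) {
            System.out.println(generator.next());
        }
    }
}
```

Nothing in this computation reads the object's address, which is why the value says nothing about where the object lives in memory.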

So there is no need to worry about the first and second questions.

2.System.identityHashCode(obj)

In HotSpot, java.lang.System.identityHashCode(Object) calls JVM_IHashCode directly; that is, System.identityHashCode uses the same HotSpot implementation as Object.hashCode().

jdk/System.c at master · openjdk/jdk · GitHub

JNIEXPORT jint JNICALL
Java_java_lang_System_identityHashCode(JNIEnv *env, jobject this, jobject x)
{
    return JVM_IHashCode(env, x);
}

Therefore, there are several cases:

  • If you do not override the Object.hashCode() method, the default hash logic is used;
  • If you override the Object.hashCode() method, every call follows the hashCode() logic you wrote;
    • If your overriding method calls super.hashCode(), the default hash logic is used: the default hash value is computed on the first call and stored in the mark word of the object header, and later calls read it from the mark word and return it;
  • System.identityHashCode(Object) calls the implementation of Object.hashCode() directly, so it also saves the computed default hash value into the mark word of the object header, and returns the saved value if there already is one;
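These cases can be checked with a short sketch. The Fixed class and its defaultHash() helper are hypothetical names introduced for this example; System.identityHashCode is documented to return what the default hashCode() would return, and 0 for null.

```java
public class IdentityHashDemo {
    // Hypothetical class that overrides hashCode() with a constant.
    static final class Fixed {
        @Override
        public int hashCode() {
            return 42;
        }

        // Exposes the default Object.hashCode() logic via super.
        int defaultHash() {
            return super.hashCode();
        }
    }

    public static void main(String[] args) {
        Fixed f = new Fixed();
        System.out.println(f.hashCode());                                   // 42: the override is used
        System.out.println(System.identityHashCode(f) == f.defaultHash());  // true: both use the default hash

        Object o = new Object();
        System.out.println(o.hashCode() == System.identityHashCode(o));     // true: no override involved

        System.out.println(System.identityHashCode(null));                  // 0
    }
}
```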

3, Test verification

This section uses JOL (Java Object Layout), the open-source OpenJDK tool for viewing the memory layout of Java objects. The tool was introduced in the article "Alignment rules for Java objects"; if you are not familiar with it, read that article first.

1. After GC, the memory address of the object changes, but the hash value remains unchanged

The Dummy class creates placeholder objects, each holding a 4M byte array, to trigger GC.

/**
 * Placeholder resource, 4M byte array, used to trigger GC operation for test
 * @author Da Chui Wang
 * @date 2021 July 18
 */
public class Dummy {
	@SuppressWarnings("unused")
	private byte[] dummy = new byte[4 * 1024 * 1024];
}

JVM startup parameters:

-Xms20m -Xmx20m -XX:+PrintGCDetails

Test code, which uses the org.openjdk.jol.vm.VM class from JOL:

	public static void main(String[] args) {
		Object object = new Object();
		System.out.println("Before GC:");
		addressOf(object);
		new Dummy();
		new Dummy();
		new Dummy();
		System.gc();
		System.out.println("After GC:");
		addressOf(object);
	}
	
	private static <T> void addressOf(T t) {
		long address = VM.current().addressOf(t);
		System.out.println(t + " hashCode is: " + t.hashCode() +", address is: " + address);
	}

Test results (for readability, the GC information printed by -XX:+PrintGCDetails is omitted):

Before GC:
java.lang.Object@33f88ab hashCode is: 54495403, address is: 34359268032

After GC:
java.lang.Object@33f88ab hashCode is: 54495403, address is: 34200357760

The default hash value of the object is 54495403 both before and after GC, while its memory address is 34359268032 before GC and 34200357760 after GC.

From the above analysis, we can see that the default hash value is saved in the mark word of the object header. See the next test.

2. The hash value is saved in the mark word of the object header

Test code, which uses the org.openjdk.jol.info.ClassLayout class from JOL:

		Object object = new Object();
		System.out.println(ClassLayout.parseInstance(object).toPrintable());
		System.out.println(object.hashCode());
		System.out.println(ClassLayout.parseInstance(object).toPrintable());

Test results:

java.lang.Object object internals:
OFF  SZ   TYPE DESCRIPTION               VALUE
  0   8        (object header: mark)     0x0000000000000001 (non-biasable; age: 0)
  8   4        (object header: class)    0x00002080
 12   4        (object alignment gap)    
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

2116908859
java.lang.Object object internals:
OFF  SZ   TYPE DESCRIPTION               VALUE
  0   8        (object header: mark)     0x0000007e2d773b01 (hash: 0x7e2d773b; age: 0)
  8   4        (object header: class)    0x00002080
 12   4        (object alignment gap)    
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

Result analysis:

The test environment is 64-bit Windows with compressed pointers enabled, so the first 8 bytes of the object header are the mark word, the next 4 bytes are the class metadata pointer _compressed_klass, and the last 4 bytes are padding for the object's default 8-byte alignment.

Before the hashCode method is called, the hash bits in the mark word are 0 (mark word 0x0000000000000001). After the call, the computed default hash value 2116908859 (0x7e2d773b) is filled into those bits (mark word 0x0000007e2d773b01).

Note: please correct any mistakes or omissions in this article!

Topics: Java object hashcode J2SE