Initialization and implementation of HashMap capacity

Posted by Fritz.fx on Tue, 14 Dec 2021 23:16:18 +0100

A HashMap can be created with a specified initial capacity through the HashMap(int initialCapacity) constructor.

By default, the capacity of a HashMap is 16. However, if the user specifies a capacity through the constructor, HashMap will select the first power of 2 greater than or equal to that number as the capacity. For example, if 3 is specified, the capacity is 4; if 7 is specified, the capacity is 8; if 9 is specified, the capacity is 16.
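
For reference, here is roughly how that rounding is done. The sketch below is adapted from the JDK 8 HashMap source (the tableSizeFor helper), so details may vary between JDK versions; calling it with 3, 7, and 9 returns 4, 8, and 16 respectively.

static final int MAXIMUM_CAPACITY = 1 << 30;  // upper bound used by the JDK

// Rounds cap up to the nearest power of 2 (adapted from JDK 8's HashMap.tableSizeFor).
static final int tableSizeFor(int cap) {
    int n = cap - 1;      // so that an exact power of 2 maps to itself
    n |= n >>> 1;         // smear the highest set bit into all lower bits
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}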

Why set the initial capacity of HashMap
Let's understand why this is recommended by looking at concrete code.

Let's write a piece of code on JDK 1.7 to compare the performance with and without a specified initial capacity.

public static void main(String[] args) {
    int tenMillion = 10000000;

    // No initial capacity specified (default capacity of 16)
    Map<Integer, Integer> map = new HashMap<>();
    long s1 = System.currentTimeMillis();
    for (int i = 0; i < tenMillion; i++) {
        map.put(i, i);
    }
    long s2 = System.currentTimeMillis();
    System.out.println("No initial capacity, time consumed: " + (s2 - s1)); // 14322

    // Initial capacity of 5,000,000 (half the number of elements)
    Map<Integer, Integer> map1 = new HashMap<>(tenMillion / 2);
    long s3 = System.currentTimeMillis();
    for (int i = 0; i < tenMillion; i++) {
        map1.put(i, i);
    }
    long s4 = System.currentTimeMillis();
    System.out.println("Initial capacity 5000000, time consumed: " + (s4 - s3)); // 11819

    // Initial capacity of 10,000,000 (the number of elements)
    Map<Integer, Integer> map2 = new HashMap<>(tenMillion);
    long s5 = System.currentTimeMillis();
    for (int i = 0; i < tenMillion; i++) {
        map2.put(i, i);
    }
    long s6 = System.currentTimeMillis();
    System.out.println("Initial capacity 10000000, time consumed: " + (s6 - s5)); // 7978
}

It is not difficult to see from the code above: we created three HashMaps that use, respectively, the default capacity (16), half the number of elements (5 million), and the full number of elements (10 million) as the initial capacity, and then put 10 million key-value pairs into each of them.

A preliminary conclusion can be drawn from the printed results: when the number of key-value pairs a HashMap will hold is known, setting a reasonable initial capacity can effectively improve performance. Let's briefly analyze why.

As we know, HashMap has a resize (capacity expansion) mechanism: when the resize condition is met, HashMap automatically grows its capacity. The condition is that the number of elements (size) in the HashMap exceeds the threshold:

threshold = loadFactor * capacity

Once the number of elements exceeds this threshold, HashMap resizes, and it keeps resizing as more elements are added. Each resize rebuilds the hash table, rehashing every existing entry into the new array, which is expensive and significantly affects performance. Therefore, if we do not set an initial capacity, a HashMap may resize over and over, degrading the performance of the program.
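
To get a feel for how many resizes the default-capacity map in the benchmark above goes through, here is a rough back-of-the-envelope sketch. It only illustrates the doubling behavior under the documented defaults (capacity 16, load factor 0.75); it is not JDK code.

// Rough illustration: count how many times a default HashMap would have to
// double its table before it can hold ten million entries without exceeding
// threshold = capacity * loadFactor.
public static void main(String[] args) {
    int entries = 10_000_000;
    float loadFactor = 0.75f;
    int capacity = 16;        // default initial capacity
    int resizes = 0;
    while (capacity * loadFactor < entries) {
        capacity <<= 1;       // each resize doubles the table and rehashes every entry
        resizes++;
    }
    System.out.println("final capacity = " + capacity + ", resizes = " + resizes);
    // Under these assumptions: final capacity = 16777216, resizes = 20
}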

In addition, the code above shows that even when an initial capacity is set, different values still affect performance. So when we know the number of key-value pairs a HashMap will hold, how to choose the capacity becomes a question of its own.

Initialization of capacity in HashMap
As mentioned at the beginning, when we set the initial capacity of a HashMap, it actually uses the first power of 2 greater than or equal to that value as the initial capacity.

import java.lang.reflect.Method;

public static void main(String[] args) throws Exception {
    Map<String, String> map = new HashMap<>(1);
    map.put("huangq", "yanggb");

    // capacity() is package-private, so reflection is needed to read it
    Class<?> mapType = map.getClass();
    Method capacity = mapType.getDeclaredMethod("capacity");
    capacity.setAccessible(true);
    System.out.println("capacity : " + capacity.invoke(map)); // 2
}

When the initial capacity is set to 1, the capacity obtained through reflection is 2. In JDK 1.8, if the initial capacity we pass in is 1, the capacity is actually set to 1; the reason the code above prints 2 is that the put into the map triggers a resize, growing the capacity from 1 to 2. In fact, JDK 1.7 and JDK 1.8 differ in when the HashMap capacity is initialized. In JDK 1.8, the capacity is set when the HashMap constructor is called. In JDK 1.7, this does not happen until the first put operation.
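
As a rough sketch of the JDK 1.8 side of this difference (paraphrased from the JDK 8 source, with argument checks omitted), the constructor computes the rounded capacity immediately and stashes it in the threshold field, while the backing table itself is only allocated on the first put:

// Paraphrased from the JDK 8 HashMap(int, float) constructor; argument checks omitted.
public HashMap(int initialCapacity, float loadFactor) {
    this.loadFactor = loadFactor;
    // The rounded capacity is stored in `threshold` here; the table array
    // itself is created lazily, when the first entry is put.
    this.threshold = tableSizeFor(initialCapacity);
}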

Therefore, when we set the initial capacity through HashMap(int initialCapacity), HashMap does not necessarily adopt the value we pass in directly; instead, it computes a new value in order to improve hashing efficiency. For example: 1 -> 1, 3 -> 4, 7 -> 8, 9 -> 16.

Reasonable value of initial capacity in HashMap

Through the above analysis, we know that when we use HashMap(int initialCapacity) to set the capacity, the JDK computes a relatively reasonable value for us as the initial capacity. So, do we just need to pass the expected number of elements directly as initialCapacity? Not quite: if we pass exactly the expected number of elements, the resulting threshold (capacity * loadFactor) may still be smaller than that number, so a resize can still be triggered. A safer value is:

initialCapacity = (Number of elements to store / Load factor) + 1

The load factor here is loadFactor, whose default value is 0.75, so with the defaults this becomes:

initialCapacity = expectedSize / 0.75F + 1.0F
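
As a quick worked example of why the formula helps (assuming the JDK 8 behavior described above):

// Illustrative sketch: suppose we expect to store 7 entries.
int expectedSize = 7;

// Passing 7 directly: the capacity is rounded to 8, threshold = 8 * 0.75 = 6,
// so inserting the 7th entry already triggers a resize.
Map<String, String> tight = new HashMap<>(expectedSize);

// Using the formula: 7 / 0.75 + 1 = 10.33 -> 10, which is rounded up to a
// capacity of 16; threshold = 16 * 0.75 = 12 >= 7, so no resize is needed.
Map<String, String> roomy = new HashMap<>((int) (expectedSize / 0.75F + 1.0F));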

Summary

When we create a HashMap in code and already know the number of elements the Map will hold, setting the initial capacity of the HashMap can improve efficiency to a certain extent.

However, the JDK does not take the number passed in by the user directly as the capacity; it performs some calculation and ultimately arrives at a power of 2. To avoid the performance cost of resizing as much as possible, it is generally recommended to set the initial capacity to expectedSize / 0.75F + 1.0F.

In daily development, you can use a method provided by Guava to create a HashMap, letting Guava do this calculation for us.

Map<String, String> map = Maps.newHashMapWithExpectedSize(10);
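
The method above comes from Guava's com.google.common.collect.Maps. If Guava is not on the classpath, a minimal equivalent helper can be written by hand using the same formula; the sketch below is that idea, not Guava's actual implementation:

// Hand-rolled equivalent of the Guava helper, using the formula recommended above.
public static <K, V> HashMap<K, V> newHashMapWithExpectedSize(int expectedSize) {
    return new HashMap<>((int) ((float) expectedSize / 0.75F + 1.0F));
}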

Finally, this approach is essentially a way of trading memory for performance. In real application scenarios, the memory cost should also be taken into account.
