On the advantages and implementation of FastThreadLocal in Netty

Posted by wizhippo on Sat, 30 Oct 2021 13:00:48 +0200

Contents

Why is FastThreadLocal faster than ThreadLocal

ThreadLocal

ThreadLocalMap

Linear probing

FastThreadLocal

FastThreadLocalThread

InternalThreadLocalMap

FastThreadLocal.set method

FastThreadLocal.get method  

 

Using FastThreadLocal on a FastThreadLocalThread is more efficient than using the JDK's ThreadLocal on an ordinary JDK thread.

Why is FastThreadLocal faster than ThreadLocal

The internal storage structure of FastThreadLocal differs from that of ThreadLocal. FastThreadLocal is backed by a plain array that is indexed directly. Compared with ThreadLocal it takes up more space, but lookups are faster. ThreadLocal is backed by a hash table. When hash collisions occur, it has to probe linearly through the table, which is clearly slower than a direct array index lookup.

ThreadLocal

First, let's review ThreadLocal. The JDK provides ThreadLocal to store thread-private data. To avoid locking, the implementation starts from the Thread: each Thread maintains its own Map that records the mapping between each ThreadLocal and its value. Because the Map is only ever accessed by its owning thread, it needs no locking.
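As a quick reminder of what "thread-private" means in practice, here is a minimal sketch using only the JDK. The class name ThreadLocalIsolationDemo and the helper readOnWorker are made up for this illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

// Shows that a value stored through a ThreadLocal is private to the storing thread.
class ThreadLocalIsolationDemo {
    static final ThreadLocal<String> NAME = new ThreadLocal<>();

    // Stores a value on a fresh worker thread and returns what that worker reads back.
    static String readOnWorker(String valueForWorker) {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            NAME.set(valueForWorker);   // lands in the worker's own map
            seen.set(NAME.get());
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        NAME.set("main");
        System.out.println(readOnWorker("worker")); // the worker sees its own value
        System.out.println(NAME.get());             // the main thread's value is unchanged
    }
}
```

Both threads use the same NAME object, yet neither can see the other's value, and no lock is involved.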

ThreadLocalMap

The internal storage of ThreadLocal is based on ThreadLocalMap. ThreadLocalMap is similar in structure to HashMap: it is a hash table that resolves collisions by linear probing, and it is backed by an array. ThreadLocalMap initializes an Entry array with a length of 16, and each Entry object stores one key-value pair. Unlike HashMap, the key of an Entry is the ThreadLocal object itself (held through a weak reference), and the value is the value the user wants to store.

When ThreadLocal.set() is called to add an Entry object, how are hash collisions resolved? The answer is linear probing.

Linear probing

Each ThreadLocal is assigned a hash value, threadLocalHashCode, when it is created. Each time a new ThreadLocal is created, the hash value is advanced by the magic number HASH_INCREMENT = 0x61c88647. Why 0x61c88647? Experiments show that threadLocalHashCode values generated by repeatedly adding 0x61c88647, when reduced modulo a power of 2, are spread evenly over an array whose length is a power of 2. ThreadLocal.set() uses threadLocalHashCode to locate an array slot. If that slot is already occupied by a different key, it moves on to the next slot, and so on, until it finds the key or an empty slot.
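The even spread can be checked with a few lines of plain Java. This is a small sketch, not the JDK source: the class MagicHashSpread and its methods are invented for the illustration, and only the constant 0x61c88647 and the "hash & (length - 1)" slot calculation come from ThreadLocal.

```java
import java.util.Arrays;

class MagicHashSpread {
    static final int HASH_INCREMENT = 0x61c88647;

    // Slots hit by the first `count` hash codes in a table of `tableLength`
    // entries. tableLength must be a power of two.
    static int[] slots(int count, int tableLength) {
        int[] result = new int[count];
        int hash = 0;                       // the first hash code handed out is 0
        for (int i = 0; i < count; i++) {
            result[i] = hash & (tableLength - 1);
            hash += HASH_INCREMENT;         // int overflow is harmless here
        }
        return result;
    }

    static long distinctSlots(int count, int tableLength) {
        return Arrays.stream(slots(count, tableLength)).distinct().count();
    }

    public static void main(String[] args) {
        // prints [0, 7, 14, 5, 12, 3, 10, 1, 8, 15, 6, 13, 4, 11, 2, 9]
        System.out.println(Arrays.toString(slots(16, 16)));
        System.out.println(distinctSlots(16, 16)); // prints 16
    }
}
```

Sixteen consecutive hash codes land in sixteen different slots of a 16-entry table. Collisions still occur in real programs, because the ThreadLocal instances stored in one thread's map are rarely consecutive ones.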

ThreadLocal.get() works in a similar way. It locates the array slot from threadLocalHashCode and then checks whether the key of the Entry in that slot is the ThreadLocal being looked up. If it is not, it keeps probing the following slots. As you can see, ThreadLocal.set() and get() are prone to hash collisions when the table is densely populated. In the worst case, resolving a collision takes O(n) time, which is inefficient.
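To make the cost of probing concrete, here is a toy open-addressing table. It is a simplified model, not ThreadLocalMap: the class LinearProbingTable, its fixed size, and the lastProbes counter are assumptions made for the illustration, and resizing, removal, and stale-entry cleanup are left out.

```java
// Toy hash table that resolves collisions by linear probing.
class LinearProbingTable {
    private final Object[] keys;
    private final Object[] values;
    int lastProbes;   // slots examined by the most recent put/get

    LinearProbingTable(int powerOfTwoLength) {
        keys = new Object[powerOfTwoLength];
        values = new Object[powerOfTwoLength];
    }

    // Walks from the home slot until it finds the key or an empty slot.
    private int findSlot(Object key, int hash) {
        int mask = keys.length - 1;
        int i = hash & mask;
        lastProbes = 1;
        while (keys[i] != null && keys[i] != key) {
            if (lastProbes == keys.length) {
                throw new IllegalStateException("table full");
            }
            i = (i + 1) & mask;   // try the next slot, wrapping around
            lastProbes++;
        }
        return i;
    }

    void put(Object key, int hash, Object value) {
        int i = findSlot(key, hash);
        keys[i] = key;
        values[i] = value;
    }

    Object get(Object key, int hash) {
        int i = findSlot(key, hash);
        return keys[i] == key ? values[i] : null;
    }
}
```

If three keys share the same home slot, reading the third one back examines three slots, while reading the first examines only one. The more crowded the table, the longer these walks become.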

FastThreadLocal

The implementation of FastThreadLocal is very similar to ThreadLocal. Netty has tailored two important classes for FastThreadLocal, FastThreadLocalThread and InternalThreadLocalMap.

FastThreadLocalThread

FastThreadLocalThread is a thin wrapper around the Thread class, and each such thread holds its own InternalThreadLocalMap instance. The performance advantage of FastThreadLocal only shows when FastThreadLocal and FastThreadLocalThread are used together. The main thing FastThreadLocalThread adds to Thread is an InternalThreadLocalMap field, so we can guess that it stores its data in the InternalThreadLocalMap instead of in the ThreadLocalMap of Thread.

InternalThreadLocalMap

Looking at its internal implementation, InternalThreadLocalMap uses array storage just like ThreadLocalMap. However, InternalThreadLocalMap does not use linear probing to resolve hash collisions. Instead, each FastThreadLocal is assigned an array index when it is created. The index is taken from an AtomicInteger that is incremented on every allocation, by calling InternalThreadLocalMap.nextVariableIndex(). When reading or writing data, the slot of a FastThreadLocal is located directly through this array index, so the time complexity is O(1). The price is that if the index grows very large, the array grows with it. In other words, FastThreadLocal improves read and write performance by trading space for time.
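The idea can be modeled in a few dozen lines of plain Java. This is a deliberately simplified sketch, not Netty's code: SlotThread stands in for FastThreadLocalThread, IndexedLocal for FastThreadLocal, and the bare Object[] for InternalThreadLocalMap. All three names and the growth policy are assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// A thread that carries its own slot array.
class SlotThread extends Thread {
    Object[] slots = new Object[4];
    SlotThread(Runnable task) { super(task); }
}

// A thread-local variable addressed by a fixed array index.
class IndexedLocal<T> {
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();
    final int index = NEXT_INDEX.getAndIncrement(); // assigned once, at construction

    void set(T value) {
        // This sketch only works on a SlotThread; Netty has a fallback for other threads.
        SlotThread t = (SlotThread) Thread.currentThread();
        if (index >= t.slots.length) {
            t.slots = Arrays.copyOf(t.slots, Math.max(index + 1, t.slots.length * 2));
        }
        t.slots[index] = value;   // direct index write, no hashing, no probing
    }

    @SuppressWarnings("unchecked")
    T get() {
        SlotThread t = (SlotThread) Thread.currentThread();
        return index < t.slots.length ? (T) t.slots[index] : null;
    }
}

class IndexedLocalDemo {
    // Runs two variables on one SlotThread and reports what was read back.
    static String runDemo() {
        IndexedLocal<String> a = new IndexedLocal<>();
        IndexedLocal<String> b = new IndexedLocal<>();
        String[] out = new String[1];
        SlotThread t = new SlotThread(() -> {
            a.set("alpha");
            b.set("beta");
            out[0] = a.get() + "," + b.get() + "," + (b.index - a.index);
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out[0];
    }
}
```

Every variable owns one slot in every thread's array, whether or not that thread ever uses it. That is where the extra space goes.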

FastThreadLocal.set method

public final void set(V value) {
    if (value != InternalThreadLocalMap.UNSET) { // 1. Check whether value is the default (UNSET) object
        InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get(); // 2. Get the InternalThreadLocalMap of the current thread
        setKnownNotUnset(threadLocalMap, value); // 3. Replace the data in InternalThreadLocalMap with the new value
    } else {
        remove();
    }
}

Although the entry point of FastThreadLocal.set() is only a few lines long, its internal logic is fairly involved. Let's first get hold of the main flow and then work through it step by step. The set() process has three main steps:

  1. Check whether value is the default value. If it is, call remove() directly. The relationship between the default value and remove() is not obvious at this point, so we leave remove() aside for now.

  2. If value is not equal to the default value, the InternalThreadLocalMap of the current thread will be obtained next.

  3. Then replace the corresponding data in the InternalThreadLocalMap with a new value.  

First, let's look at the InternalThreadLocalMap.get() method:

  1. If the current thread is a FastThreadLocalThread, its threadLocalMap property is obtained directly through the fastGet() method. If no InternalThreadLocalMap exists yet, a new one is created, stored on the thread, and returned.
  2. If the current thread is not a FastThreadLocalThread, it has no InternalThreadLocalMap field. Netty keeps a JDK native ThreadLocal in UnpaddedInternalThreadLocalMap, and the InternalThreadLocalMap is stored in that ThreadLocal. In this case, obtaining the InternalThreadLocalMap degenerates into a lookup in the JDK native ThreadLocal.

public static InternalThreadLocalMap get() {
    Thread thread = Thread.currentThread();
    if (thread instanceof FastThreadLocalThread) { // Is the current thread of type FastThreadLocalThread
        return fastGet((FastThreadLocalThread) thread);
    } else {
        return slowGet();
    }
}

private static InternalThreadLocalMap fastGet(FastThreadLocalThread thread) {
    InternalThreadLocalMap threadLocalMap = thread.threadLocalMap(); // Gets the threadLocalMap property of FastThreadLocalThread
    if (threadLocalMap == null) {
        thread.setThreadLocalMap(threadLocalMap = new InternalThreadLocalMap());
    }
    return threadLocalMap;
}

private static InternalThreadLocalMap slowGet() {
    ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = UnpaddedInternalThreadLocalMap.slowThreadLocalMap;
    InternalThreadLocalMap ret = slowThreadLocalMap.get(); // Get InternalThreadLocalMap from JDK native ThreadLocal
    if (ret == null) {
        ret = new InternalThreadLocalMap();
        slowThreadLocalMap.set(ret);
    }
    return ret;
}

The following figure describes how the InternalThreadLocalMap is obtained.

Next, let's look at how setKnownNotUnset() adds data to the InternalThreadLocalMap.

setKnownNotUnset() mainly does two things:

  1. Find the index position of the array subscript and set a new value.

  2. Save the FastThreadLocal object to the Set to be cleaned up.

private void setKnownNotUnset(InternalThreadLocalMap threadLocalMap, V value) {
    // 1. Find the index position of the array subscript and set a new value
    if (threadLocalMap.setIndexedVariable(index, value)) {
        // 2. Save the FastThreadLocal object to the Set to be cleaned up
        addToVariablesToRemove(threadLocalMap, this);
    }
}

First, let's look at the source code of threadLocalMap.setIndexedVariable():

public boolean setIndexedVariable(int index, Object value) {
    Object[] lookup = indexedVariables;
    if (index < lookup.length) {
        Object oldValue = lookup[index]; 
        lookup[index] = value; // Directly set the array index position to value, and the time complexity is O(1)
        return oldValue == UNSET;
    } else {
        expandIndexedVariableTableAndSet(index, value); // The capacity is insufficient. Expand the capacity first and then set the value
        return true;
    }
}

indexedVariables is the array in which InternalThreadLocalMap stores its data. If the array length is greater than the index of the FastThreadLocal, the slot at that index is located directly and set to the new value, with a time complexity of O(1). Before the new value is written, the element previously at that index is read. If the old element is still the default object UNSET, the method returns true, meaning that this slot has been set for the first time.
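The return value only makes sense because "never set" is represented by a dedicated sentinel object rather than by null. Below is a minimal sketch of that idea, assuming a fixed-size array and a made-up class name SentinelSlots. It mirrors the non-expanding branch above and is not Netty's class.

```java
import java.util.Arrays;

class SentinelSlots {
    // A unique object that no caller can store by accident.
    static final Object UNSET = new Object();
    private final Object[] slots = new Object[8];

    SentinelSlots() {
        Arrays.fill(slots, UNSET);   // every slot starts out as "never set"
    }

    // Returns true only when the slot had never been set before.
    boolean set(int index, Object value) {
        Object old = slots[index];
        slots[index] = value;
        return old == UNSET;
    }

    Object get(int index) {
        return slots[index];
    }
}
```

With this layout, storing null counts as a real first write, and a later overwrite of the same slot returns false. A null check alone could not tell those two cases apart.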

What if the array capacity is not enough? InternalThreadLocalMap expands automatically and then sets the value. Next, let's look at the expansion logic in expandIndexedVariableTableAndSet().

If you have read the HashMap source code, you will find that the expansion logic of InternalThreadLocalMap is almost exactly the same as HashMap's. InternalThreadLocalMap expands based on index and rounds the new capacity up to a power of 2. It then copies the contents of the original array into the new array, fills the unused part with the default object UNSET, and finally assigns the new array to indexedVariables. Why does InternalThreadLocalMap expand based on index rather than on the original array length? Suppose 70 FastThreadLocal instances have been created, but none of them has called set() yet. At this point the array still has its default length of 32. When the FastThreadLocal with index = 70 calls set(), doubling the original array from 32 to 64 would still leave no slot for index 70. Using index as the basis for expansion solves this problem, but if there are very many FastThreadLocal instances, the array becomes correspondingly large.

private void expandIndexedVariableTableAndSet(int index, Object value) {
    Object[] oldArray = indexedVariables;
    final int oldCapacity = oldArray.length;
    int newCapacity = index;
    newCapacity |= newCapacity >>>  1;
    newCapacity |= newCapacity >>>  2;
    newCapacity |= newCapacity >>>  4;
    newCapacity |= newCapacity >>>  8;
    newCapacity |= newCapacity >>> 16;
    newCapacity ++;
    Object[] newArray = Arrays.copyOf(oldArray, newCapacity);
    Arrays.fill(newArray, oldCapacity, newArray.length, UNSET);
    newArray[index] = value;
    indexedVariables = newArray;
}
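The bit-shifting steps are easier to follow when run in isolation. The sketch below copies only the rounding arithmetic into a standalone helper. The class CapacityRounding and the method newCapacityFor are names invented for this example.

```java
class CapacityRounding {
    // Same bit-smearing steps as in expandIndexedVariableTableAndSet():
    // copy the highest set bit into every lower position, then add one.
    // For 0 <= index < 2^30 the result is the smallest power of two
    // that is strictly greater than index.
    static int newCapacityFor(int index) {
        int c = index;
        c |= c >>> 1;
        c |= c >>> 2;
        c |= c >>> 4;
        c |= c >>> 8;
        c |= c >>> 16;
        return c + 1;
    }

    public static void main(String[] args) {
        System.out.println(newCapacityFor(70)); // prints 128
        System.out.println(newCapacityFor(32)); // prints 64
    }
}
```

For index = 70 the new length is 128, so slot 70 exists after a single expansion. Doubling the old length of 32 would have produced only 64.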

After adding data to the InternalThreadLocalMap, the next step is to save the FastThreadLocal object to the Set to be cleaned up. Let's continue to see how addToVariablesToRemove() is implemented.  

private static void addToVariablesToRemove(InternalThreadLocalMap threadLocalMap, FastThreadLocal<?> variable) {
    // Gets the element with array subscript 0
    Object v = threadLocalMap.indexedVariable(variablesToRemoveIndex); 
    Set<FastThreadLocal<?>> variablesToRemove;
    if (v == InternalThreadLocalMap.UNSET || v == null) {
        // Create a Set of FastThreadLocal objects
        variablesToRemove = Collections.newSetFromMap(new IdentityHashMap<FastThreadLocal<?>, Boolean>());
        // Store the Set at array subscript 0
        threadLocalMap.setIndexedVariable(variablesToRemoveIndex, variablesToRemove);
    } else {
        // If it is not UNSET, the Set already exists, so cast and reuse it
        variablesToRemove = (Set<FastThreadLocal<?>>) v;
    }
    // Add FastThreadLocal to the Set collection
    variablesToRemove.add(variable); 
}

variablesToRemoveIndex is a static final variable. When the FastThreadLocal class is initialized, variablesToRemoveIndex is assigned the value 0. InternalThreadLocalMap first looks up the element at array subscript 0. If that element is the default object UNSET or does not exist, a Set of FastThreadLocal objects is created and stored at array subscript 0. If the first element of the array is not the default object UNSET, the Set has already been created, and the element can simply be cast to a Set. This explains why the value data of InternalThreadLocalMap is stored starting from subscript 1: position 0 is already occupied by the Set.
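The Set created in addToVariablesToRemove() is built with Collections.newSetFromMap over an IdentityHashMap, so membership is decided by reference identity (==) rather than by equals(). The short sketch below shows the difference using only JDK classes. IdentitySetDemo is a made-up name, and the article does not say why Netty chose an identity-based Set.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

class IdentitySetDemo {
    // Number of elements kept by an identity-based Set.
    static int identitySetSize(Object first, Object second) {
        Set<Object> set = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        set.add(first);
        set.add(second);
        return set.size();
    }

    // Number of elements kept by an ordinary equals()-based Set.
    static int hashSetSize(Object first, Object second) {
        Set<Object> set = new HashSet<>();
        set.add(first);
        set.add(second);
        return set.size();
    }

    public static void main(String[] args) {
        String x = new String("key");
        String y = new String("key");                // equal to x, but a different object
        System.out.println(identitySetSize(x, y));  // prints 2
        System.out.println(hashSetSize(x, y));      // prints 1
    }
}
```

Adding the same object twice still leaves a single entry, so a FastThreadLocal that is set repeatedly appears in the cleanup Set only once.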

FastThreadLocal.get method  

public final V get() {
    InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get();
    // Extract the element at index position from the array
    Object v = threadLocalMap.indexedVariable(index); 
    if (v != InternalThreadLocalMap.UNSET) {
        return (V) v;
    }
    // If the obtained array element is the default object, perform initialization
    return initialize(threadLocalMap); 
}

public Object indexedVariable(int index) {
    Object[] lookup = indexedVariables;
    return index < lookup.length ? lookup[index] : UNSET;
}

private V initialize(InternalThreadLocalMap threadLocalMap) {
    V v = null;
    try {
        v = initialValue();
    } catch (Exception e) {
        PlatformDependent.throwException(e);
    }
    threadLocalMap.setIndexedVariable(index, v);
    addToVariablesToRemove(threadLocalMap, this);
    return v;
}

First, the InternalThreadLocalMap is obtained, by a route that depends on whether the current thread is a FastThreadLocalThread, and the element at subscript index is read from its array. If the element at the index position is not the default object UNSET, the slot has already been filled, and the value is returned directly. If the element at the index position is the default object UNSET, initialization is required. As you can see, the initialize() method calls the initialValue() method overridden by the user to construct the object to be stored, as shown below.

private final FastThreadLocal<String> threadLocal = new FastThreadLocal<String>() {
    @Override
    protected String initialValue() {
        return "hello world";
    }
};

After constructing the user object data, it will be filled into the position of the array index, and then the current FastThreadLocal object will be saved to the set to be cleaned up. We have introduced the whole process when analyzing FastThreadLocal.set(), so we won't repeat it.  
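The "initialize on first get()" behavior can be observed without Netty, because the JDK's ThreadLocal follows the same pattern. The sketch below uses the JDK class as a stand-in and does not exercise FastThreadLocal itself. LazyInitDemo and its counter are hypothetical names added for the demonstration.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LazyInitDemo {
    // Counts how often initialValue() runs.
    static final AtomicInteger INIT_CALLS = new AtomicInteger();

    static final ThreadLocal<String> GREETING = new ThreadLocal<String>() {
        @Override
        protected String initialValue() {
            INIT_CALLS.incrementAndGet();
            return "hello world";
        }
    };

    public static void main(String[] args) {
        System.out.println(GREETING.get());   // first get() on this thread runs initialValue()
        System.out.println(GREETING.get());   // second get() returns the stored value
        System.out.println(INIT_CALLS.get()); // prints 1
    }
}
```

The initial value is built once per thread, on the first read, and stored so that later reads are plain lookups.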

Topics: Java Netty