Why is FastThreadLocal in Netty more efficient than ThreadLocal?

Posted by RunningUtes on Tue, 23 Nov 2021 05:36:59 +0100


FastThreadLocal is implemented along very similar lines to the JDK's java.lang.ThreadLocal.

Readers who understand how ThreadLocal works will know that it revolves around three key classes:

  1. Thread
  2. ThreadLocalMap
  3. ThreadLocal

In the same way, Netty created two important classes specifically for FastThreadLocal: FastThreadLocalThread and InternalThreadLocalMap. Let's look at how each of them is implemented.

PS: if you are not familiar with ThreadLocal, you can read my article "Application and principle analysis of ThreadLocal".

FastThreadLocalThread is a thin wrapper around the Thread class: each FastThreadLocalThread holds its own InternalThreadLocalMap instance. FastThreadLocal only delivers its performance advantage when it is used together with FastThreadLocalThread. First, let's look at the source definition of FastThreadLocalThread:

public class FastThreadLocalThread extends Thread {

    private InternalThreadLocalMap threadLocalMap;
    // Other code omitted
}

As the code shows, FastThreadLocalThread essentially adds an InternalThreadLocalMap field to Thread. We can infer that it stores thread-local data in this InternalThreadLocalMap instead of in the ThreadLocalMap that Thread itself holds. So to understand where FastThreadLocal's speed comes from, we need to understand the design of InternalThreadLocalMap.

InternalThreadLocalMap

public final class InternalThreadLocalMap extends UnpaddedInternalThreadLocalMap {

    private static final int DEFAULT_ARRAY_LIST_INITIAL_CAPACITY = 8;

    private static final int STRING_BUILDER_INITIAL_SIZE;

    private static final int STRING_BUILDER_MAX_SIZE;

    // Initial length of the indexedVariables array
    private static final int INDEXED_VARIABLE_TABLE_INITIAL_SIZE = 32;

    // Shared placeholder object that marks an empty slot
    public static final Object UNSET = new Object();

    private BitSet cleanerFlags;

    // indexedVariables (Object[]) and nextIndex (AtomicInteger) are inherited from UnpaddedInternalThreadLocalMap
    private InternalThreadLocalMap() {
        indexedVariables = newIndexedVariableTable();
    }

    private static Object[] newIndexedVariableTable() {
        Object[] array = new Object[INDEXED_VARIABLE_TABLE_INITIAL_SIZE];
        Arrays.fill(array, UNSET);   // every slot starts out as the UNSET placeholder
        return array;
    }

    public static int lastVariableIndex() {
        return nextIndex.get() - 1;
    }

    public static int nextVariableIndex() {
        int index = nextIndex.getAndIncrement();
        if (index < 0) {   // the int counter has overflowed
            nextIndex.decrementAndGet();
            throw new IllegalStateException("too many thread-local indexed variables");
        }
        return index;
    }
    // Other members omitted

}

As the code shows, InternalThreadLocalMap stores its data in an array, just like ThreadLocalMap.

Readers familiar with ThreadLocal will know that ThreadLocalMap also builds a hash table on top of an array, and resolves hash collisions with linear probing.
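For contrast, here is a minimal sketch of the linear-probing lookup that ThreadLocalMap relies on. It is a simplified model, not the JDK source: the class name LinearProbingMap, the fixed capacity of 16, and the absence of resizing and stale-entry cleanup are assumptions made for brevity; only the hash increment 0x61c88647 is the constant java.lang.ThreadLocal actually uses.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of ThreadLocalMap-style open addressing:
// fixed capacity, no resizing, no stale-entry cleanup.
final class LinearProbingMap {
    private static final int HASH_INCREMENT = 0x61c88647; // the increment java.lang.ThreadLocal uses
    private static final AtomicInteger nextHash = new AtomicInteger();

    /** A key whose hash is fixed at construction, like ThreadLocal.threadLocalHashCode. */
    static final class Key {
        final int hash = nextHash.getAndAdd(HASH_INCREMENT);
    }

    private final Key[] keys = new Key[16];        // capacity must be a power of two
    private final Object[] values = new Object[16];
    int probes; // number of slots inspected by the last get()

    void put(Key key, Object value) {
        int i = key.hash & (keys.length - 1);
        // Walk forward until we find the key or an empty slot (assumes the table never fills up).
        while (keys[i] != null && keys[i] != key) {
            i = (i + 1) & (keys.length - 1);
        }
        keys[i] = key;
        values[i] = value;
    }

    Object get(Key key) {
        int i = key.hash & (keys.length - 1);
        probes = 1;
        while (keys[i] != null) {
            if (keys[i] == key) {
                return values[i];
            }
            i = (i + 1) & (keys.length - 1); // collision: try the next slot
            probes++;
        }
        return null; // reached an empty slot, so the key is absent
    }
}
```

Every get() has to mask the hash and may walk several slots when keys collide; the probes field makes that cost visible.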

InternalThreadLocalMap, however, does not use linear probing to resolve hash collisions; in fact, it does no hashing at all. Instead, each FastThreadLocal is assigned an array index when it is created. The index comes from InternalThreadLocalMap.nextVariableIndex(), which increments a shared AtomicInteger, so indexes are handed out in order. When reading or writing data, FastThreadLocal goes straight to its slot by that index, which is O(1). The trade-off is that as indexes grow, the array grows with them, so FastThreadLocal buys faster reads and writes by spending more space.
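The index-based scheme needs nothing more than a shared AtomicInteger and a plain array. A minimal sketch, assuming hypothetical names (IndexedVariable, and an explicitly passed Object[] standing in for one thread's InternalThreadLocalMap); the overflow guard copies the logic of nextVariableIndex() shown earlier:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of FastThreadLocal-style index allocation (not Netty's API).
final class IndexedVariable<V> {
    private static final AtomicInteger nextIndex = new AtomicInteger();

    // Same guard as InternalThreadLocalMap.nextVariableIndex(): undo and fail on int overflow.
    static int nextVariableIndex() {
        int index = nextIndex.getAndIncrement();
        if (index < 0) {
            nextIndex.decrementAndGet();
            throw new IllegalStateException("too many thread-local indexed variables");
        }
        return index;
    }

    final int index = nextVariableIndex(); // fixed for the lifetime of this variable

    // O(1): the index is the array position, so there is no hashing or probing.
    @SuppressWarnings("unchecked")
    V get(Object[] slots) {
        return index < slots.length ? (V) slots[index] : null;
    }

    void set(Object[] slots, V value) {
        slots[index] = value; // sketch: assumes the array is already large enough
    }
}
```

Two variables created one after another get consecutive indexes, and get()/set() touch exactly one array slot.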

The following figure describes the relationship among InternalThreadLocalMap, index and FastThreadLocal.

From the structure diagram above, how does FastThreadLocal differ from ThreadLocal?

  1. FastThreadLocal uses a plain Object array instead of an array of Entry objects. Object[0] stores a Set<FastThreadLocal<?>>.
  2. Values are stored directly starting at index 1, rather than as key-value pairs as in ThreadLocal.

Suppose we have a batch of values to add to the array: value1, value2, value3 and value4. The indexes assigned to the corresponding FastThreadLocals at initialization are 1, 2, 3 and 4, as shown in the figure below.
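The layout described above can be reproduced with a plain array. In this sketch, SlotLayout and example() are hypothetical names, a HashSet of strings stands in for the Set<FastThreadLocal<?>> (Netty itself uses an identity-based set), and the length of 32 matches InternalThreadLocalMap's initial table size:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the array layout: slot 0 = bookkeeping Set, slots 1..4 = values.
final class SlotLayout {
    static final Object UNSET = new Object(); // a single shared placeholder instance

    static Object[] example() {
        Object[] table = new Object[32];
        Arrays.fill(table, UNSET);                // all 32 slots reference the same UNSET object

        Set<String> registered = new HashSet<>(); // strings stand in for FastThreadLocal objects
        table[0] = registered;                    // slot 0 is reserved for the Set

        String[] values = {"value1", "value2", "value3", "value4"};
        for (int i = 0; i < values.length; i++) {
            table[i + 1] = values[i];             // stored directly, no Entry wrapper
            registered.add("variable" + (i + 1));
        }
        return table;
    }
}
```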

We now have a basic picture of FastThreadLocal. Let's examine how it is implemented by walking through the source code.

Source code analysis of FastThreadLocal set method

Before diving into the source, let's see how a typical ThreadLocal example looks once ThreadLocal is replaced by FastThreadLocal:

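import io.netty.util.concurrent.FastThreadLocal;
import io.netty.util.concurrent.FastThreadLocalThread;

public class FastThreadLocalTest {

    // Minimal stand-in for a TradeOrder class (assumed shape: id + status) so the example compiles
    static class TradeOrder {
        final int id;
        final String status;
        TradeOrder(int id, String status) { this.id = id; this.status = status; }
        @Override public String toString() { return "TradeOrder{id=" + id + ", status='" + status + "'}"; }
    }

    private static final FastThreadLocal<String> THREAD_NAME_LOCAL = new FastThreadLocal<>();
    private static final FastThreadLocal<TradeOrder> TRADE_THREAD_LOCAL = new FastThreadLocal<>();

    public static void main(String[] args) {
        for (int i = 0; i < 2; i++) {
            int tradeId = i;
            String threadName = "thread-" + i;
            // Run on a FastThreadLocalThread so FastThreadLocal can take its fast path
            new FastThreadLocalThread(() -> {
                THREAD_NAME_LOCAL.set(threadName);
                TradeOrder tradeOrder = new TradeOrder(tradeId, tradeId % 2 == 0 ? "Paid" : "Unpaid");
                TRADE_THREAD_LOCAL.set(tradeOrder);
                System.out.println("threadName: " + THREAD_NAME_LOCAL.get());
                System.out.println("tradeOrder info: " + TRADE_THREAD_LOCAL.get());
            }, threadName).start();
        }
    }
}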

As you can see, FastThreadLocal is used almost exactly like ThreadLocal: simply replace Thread and ThreadLocal in the code with FastThreadLocalThread and FastThreadLocal. Netty has done a good job on ease of use. Next, we take an in-depth look at the FastThreadLocal.set()/get() methods used in the example.

First, take a look at the source code of FastThreadLocal.set():

public final void set(V value) {
    if (value != InternalThreadLocalMap.UNSET) {
        InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get();
        setKnownNotUnset(threadLocalMap, value);
    } else {
        remove();
    }
}

FastThreadLocal.set() is not hard to follow. Focusing on the main path, it breaks down into three steps:

  1. Check whether value is the default placeholder UNSET. If it is, call remove() directly. The connection between UNSET and remove() is not obvious yet; we will analyze remove() at the end.
  2. Otherwise, obtain the current thread's InternalThreadLocalMap.
  3. Then store the new value in the corresponding slot of the InternalThreadLocalMap.
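Why compare against a placeholder object at all? A minimal single-slot sketch, with hypothetical names (SentinelSlot, UNSET), shows the idea: a dedicated marker object lets the code tell "never set" apart from "set to null", and handing the marker itself to set() is treated as a removal, mirroring step 1:

```java
// Hypothetical single-slot model of the UNSET check at the top of FastThreadLocal.set().
final class SentinelSlot {
    static final Object UNSET = new Object(); // marker that no ordinary value is identical to
    private Object value = UNSET;

    void set(Object newValue) {
        if (newValue != UNSET) {
            value = newValue; // steps 2-3: store the value (null is a storable value in this sketch)
        } else {
            remove();         // step 1: passing the marker itself means "clear the slot"
        }
    }

    void remove() {
        value = UNSET;
    }

    boolean isSet() {
        return value != UNSET;
    }

    Object get() {
        return value == UNSET ? null : value;
    }
}
```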

InternalThreadLocalMap.get()

Let's first look at the InternalThreadLocalMap.get() method:

public static InternalThreadLocalMap get() {
    Thread thread = Thread.currentThread();
    if (thread instanceof FastThreadLocalThread) {
        return fastGet((FastThreadLocalThread) thread);
    } else {
        return slowGet();
    }
}

The logic of InternalThreadLocalMap.get() is very simple:

  1. If the current thread is a FastThreadLocalThread, fastGet() reads the threadLocalMap field of FastThreadLocalThread directly.
  2. If that InternalThreadLocalMap does not exist yet, fastGet() creates one, stores it in the thread, and returns it.

InternalThreadLocalMap's constructor, shown earlier, allocates an Object array of length 32 and fills every slot with a reference to the same default object, UNSET.

private static InternalThreadLocalMap fastGet(FastThreadLocalThread thread) {
  InternalThreadLocalMap threadLocalMap = thread.threadLocalMap();
  if (threadLocalMap == null) {
    thread.setThreadLocalMap(threadLocalMap = new InternalThreadLocalMap());
  }
  return threadLocalMap;
}

Otherwise, slowGet() is called. As the code shows, slowGet() is the fallback for threads that are not FastThreadLocalThread. Such threads have no InternalThreadLocalMap field, so Netty keeps a JDK ThreadLocal (slowThreadLocalMap) in UnpaddedInternalThreadLocalMap and stores each thread's InternalThreadLocalMap in it. In this case, obtaining the InternalThreadLocalMap degrades to an ordinary JDK ThreadLocal lookup.

private static InternalThreadLocalMap slowGet() {
  InternalThreadLocalMap ret = slowThreadLocalMap.get();
  if (ret == null) {
    ret = new InternalThreadLocalMap();
    slowThreadLocalMap.set(ret);
  }
  return ret;
}
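The fast-path/slow-path split in get(), fastGet() and slowGet() can be modeled with plain JDK classes. In this hypothetical sketch, FastThread plays the role of FastThreadLocalThread, the static ThreadLocal SLOW plays the role of slowThreadLocalMap, and an Object[] stands in for InternalThreadLocalMap:

```java
// Hypothetical model of InternalThreadLocalMap.get(): a field read on the fast path,
// a JDK ThreadLocal lookup on the slow path.
final class MapLookup {
    /** Plays the role of FastThreadLocalThread: the thread object carries its own map. */
    static final class FastThread extends Thread {
        Object[] map; // created lazily, like FastThreadLocalThread.threadLocalMap

        FastThread(Runnable task) {
            super(task);
        }
    }

    /** Plays the role of slowThreadLocalMap for ordinary threads. */
    private static final ThreadLocal<Object[]> SLOW = new ThreadLocal<>();

    static Object[] get() {
        Thread thread = Thread.currentThread();
        if (thread instanceof FastThread) {
            FastThread fast = (FastThread) thread;
            if (fast.map == null) {
                fast.map = new Object[32];
            }
            return fast.map;            // fast path: plain field access
        }
        Object[] map = SLOW.get();      // slow path: lookup in the thread's ThreadLocalMap
        if (map == null) {
            map = new Object[32];
            SLOW.set(map);
        }
        return map;
    }

    /** Runs get() twice on a FastThread and reports whether the same map came back. */
    static boolean fastPathReusesMap() {
        boolean[] same = new boolean[1];
        FastThread t = new FastThread(() -> same[0] = get() == get());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return same[0];
    }
}
```

On a FastThread the map is a field read; on any other thread, every lookup first goes through a JDK ThreadLocal, which is the extra cost slowGet() pays.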

setKnownNotUnset

That covers how the InternalThreadLocalMap is obtained. Now let's look at how setKnownNotUnset() adds data to it.

private void setKnownNotUnset(InternalThreadLocalMap threadLocalMap, V value) {
    if (threadLocalMap.setIndexedVariable(index, value)) {
        addToVariablesToRemove(threadLocalMap, this);
    }
}

setKnownNotUnset() does two things:

  1. It writes the new value into the array slot at this FastThreadLocal's index.
  2. If that slot previously held UNSET (i.e., this is the first value stored), it saves the FastThreadLocal object into the Set of variables to be cleaned up.

First, let's look at the source of threadLocalMap.setIndexedVariable():

public boolean setIndexedVariable(int index, Object value) {
    Object[] lookup = indexedVariables;
    if (index < lookup.length) {
        Object oldValue = lookup[index];
        lookup[index] = value;
        return oldValue == UNSET;
    } else {
        expandIndexedVariableTableAndSet(index, value);
        return true;
    }
}

indexedVariables is the array in which InternalThreadLocalMap stores data. If the array is longer than the FastThreadLocal's index, the value is written straight into that slot, which is O(1). Before writing, the old element at that position is read; the method returns true only if the old element was the default object UNSET, meaning the slot is being filled for the first time.
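The boolean returned by setIndexedVariable() is what keeps setKnownNotUnset() from touching the cleanup Set on every write: registration happens only when a slot goes from UNSET to a real value. A minimal sketch of that contract, using hypothetical names, a fixed 8-slot array (no expansion) and a counter in place of addToVariablesToRemove():

```java
import java.util.Arrays;

// Hypothetical model of setIndexedVariable()/setKnownNotUnset() without array growth.
final class FirstWriteTracker {
    static final Object UNSET = new Object();
    private final Object[] slots = new Object[8];
    int registrations; // how many times a variable was "added to the Set to be cleaned up"

    FirstWriteTracker() {
        Arrays.fill(slots, UNSET);
    }

    /** Returns true only when the slot previously held UNSET, i.e. this is the first write. */
    boolean setIndexedVariable(int index, Object value) {
        Object oldValue = slots[index];
        slots[index] = value;
        return oldValue == UNSET;
    }

    void setKnownNotUnset(int index, Object value) {
        if (setIndexedVariable(index, value)) {
            registrations++; // stands in for addToVariablesToRemove(...)
        }
    }
}
```

Overwriting an existing value therefore costs a single array store and no Set operation.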

What if the array is too small? InternalThreadLocalMap expands it automatically and then sets the value. Here is the expansion logic in expandIndexedVariableTableAndSet():

private void expandIndexedVariableTableAndSet(int index, Object value) {
    Object[] oldArray = indexedVariables;
    final int oldCapacity = oldArray.length;
    int newCapacity = index;
    newCapacity |= newCapacity >>>  1;
    newCapacity |= newCapacity >>>  2;
    newCapacity |= newCapacity >>>  4;
    newCapacity |= newCapacity >>>  8;
    newCapacity |= newCapacity >>> 16;
    newCapacity ++;

    Object[] newArray = Arrays.copyOf(oldArray, newCapacity);
    Arrays.fill(newArray, oldCapacity, newArray.length, UNSET);
    newArray[index] = value;
    indexedVariables = newArray;
}

As we can see, InternalThreadLocalMap grows its array in almost the same way HashMap computes its table size (HashMap.tableSizeFor()), which shows that reading more source code pays off. InternalThreadLocalMap expands based on index, rounding the new capacity up to the smallest power of 2 greater than index. It then copies the contents of the old array into the new one, fills the remaining slots with the default object UNSET, and finally assigns the new array to indexedVariables.

Thinking about the expansion baseline

Thinking: why does the InternalThreadLocalMap expand based on the index instead of the original array length?

Suppose 70 FastThreadLocals have been created, but none of them has ever called set(), so the array still has its default length of 32. When the FastThreadLocal with index = 70 calls set(), doubling the original 32-slot array once (to 64) would still leave no room for index 70. Using index as the baseline solves this: the array grows straight to 128. The downside is that if there are many FastThreadLocals, the array becomes very long.
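The rounding arithmetic can be checked in isolation. The sketch copies the bit-smearing from expandIndexedVariableTableAndSet() into a standalone method (newCapacityFor is a hypothetical name) next to the core of HashMap.tableSizeFor() with its clamping omitted. The only difference is the starting point: Netty smears index itself, HashMap smears cap - 1, so Netty's result is the smallest power of two strictly greater than index, which guarantees that index fits.

```java
// Standalone copies of the two rounding computations discussed above.
final class Capacity {
    /** Same arithmetic as expandIndexedVariableTableAndSet(): smallest power of two > index. */
    static int newCapacityFor(int index) {
        int n = index;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16; // every bit below the highest set bit is now 1
        return n + 1;
    }

    /** Core of HashMap.tableSizeFor() (clamping omitted): smallest power of two >= cap, for cap >= 1. */
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return n + 1;
    }

    /** What plain doubling of the old length would give, for comparison. */
    static int doubled(int oldCapacity) {
        return oldCapacity << 1;
    }
}
```

For the scenario above, doubled(32) gives 64, which still cannot hold index 70, while newCapacityFor(70) returns 128 in a single step.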

Back to the main flow of setKnownNotUnset(). After the data is added to the InternalThreadLocalMap, the next step is to save the FastThreadLocal object into the Set to be cleaned up. Let's see how addToVariablesToRemove() is implemented:

addToVariablesToRemove

private static void addToVariablesToRemove(InternalThreadLocalMap threadLocalMap, FastThreadLocal<?> variable) {
    Object v = threadLocalMap.indexedVariable(variablesToRemoveIndex);
    Set<FastThreadLocal<?>> variablesToRemove;
    if (v == InternalThreadLocalMap.UNSET || v == null) {
        variablesToRemove = Collections.newSetFromMap(new IdentityHashMap<FastThreadLocal<?>, Boolean>());
        threadLocalMap.setIndexedVariable(variablesToRemoveIndex, variablesToRemove);
    } else {
        variablesToRemove = (Set<FastThreadLocal<?>>) v;
    }

    variablesToRemove.add(variable);
}

variablesToRemoveIndex is a static final field of FastThreadLocal. It is obtained from nextVariableIndex() when the FastThreadLocal class is initialized, before any FastThreadLocal instance gets its own index, so its value is 0. addToVariablesToRemove() first looks up the element at index 0 of the array:

  1. If the element is UNSET or null, it creates a Set<FastThreadLocal<?>> (backed by an IdentityHashMap) and stores it at index 0.
  2. Otherwise the Set has already been created, so the element is simply cast to a Set. This explains why InternalThreadLocalMap stores values starting at index 1: index 0 is occupied by the Set.

In either case, the current FastThreadLocal is then added to the Set.
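Steps 1 and 2 amount to lazily creating one Set per thread in slot 0 and reusing it afterwards. A sketch under simplified assumptions: CleanupRegistry is a hypothetical name, the array has a fixed 8 slots, and plain Objects stand in for FastThreadLocal instances; the Set construction mirrors the Collections.newSetFromMap(new IdentityHashMap<...>()) call in the Netty code above:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical model of addToVariablesToRemove(): slot 0 lazily holds an identity-based Set.
final class CleanupRegistry {
    static final Object UNSET = new Object();
    static final int VARIABLES_TO_REMOVE_INDEX = 0;
    final Object[] slots = new Object[8];

    CleanupRegistry() {
        Arrays.fill(slots, UNSET);
    }

    @SuppressWarnings("unchecked")
    void addToVariablesToRemove(Object variable) {
        Object v = slots[VARIABLES_TO_REMOVE_INDEX];
        Set<Object> variablesToRemove;
        if (v == UNSET || v == null) {
            // First registration: create the Set and park it in slot 0.
            variablesToRemove = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
            slots[VARIABLES_TO_REMOVE_INDEX] = variablesToRemove;
        } else {
            // Later registrations reuse the Set already stored in slot 0.
            variablesToRemove = (Set<Object>) v;
        }
        variablesToRemove.add(variable);
    }
}
```

Registering the same variable twice leaves the Set unchanged, and slot 0 keeps pointing at the same Set instance.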

Thinking about Set design

Thinking: why does InternalThreadLocalMap store a Set<FastThreadLocal<?>> at index 0 of the array? To answer this, let's go back to the remove() method.

public final void remove(InternalThreadLocalMap threadLocalMap) {
  if (threadLocalMap == null) {
    return;
  }

  Object v = threadLocalMap.removeIndexedVariable(index);
  removeFromVariablesToRemove(threadLocalMap, this);

  if (v != InternalThreadLocalMap.UNSET) {
    try {
      onRemoval((V) v);
    } catch (Exception e) {
      PlatformDependent.throwException(e);
    }
  }
}

The no-argument remove() first calls InternalThreadLocalMap.getIfSet() to obtain the current thread's InternalThreadLocalMap, then passes it to the remove(InternalThreadLocalMap) method shown above.

With the groundwork above, getIfSet() is easy to understand:

  1. If the current thread is a FastThreadLocalThread, it returns the threadLocalMap field of FastThreadLocalThread directly.
  2. If it is an ordinary Thread, it reads the map from slowThreadLocalMap, the JDK ThreadLocal.

Unlike get(), getIfSet() never creates a map; it returns null if none exists yet, which is why remove() checks for null first.

Once the InternalThreadLocalMap is found, removeIndexedVariable(index) takes the element at this FastThreadLocal's index, overwrites that slot with the default object UNSET, and returns the old element.

Next, the current FastThreadLocal has to be deregistered, and this is where the Set comes in: removeFromVariablesToRemove() takes the Set out of index 0 of the array and deletes the current FastThreadLocal from it. Keeping every FastThreadLocal a thread has used in one Set is also what allows FastThreadLocal.removeAll() to clean up all of the thread's variables at once. Finally, what does onRemoval() do? Netty only provides it as an extension point with an empty default implementation, and it is called only if the removed value was not UNSET. Users who need to do some post-processing on removal can subclass FastThreadLocal and override this method.
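Putting the pieces together shows why the Set in slot 0 matters: it is the only record of which variables a thread has touched, so it is what makes bulk cleanup possible. The sketch below is a hypothetical, single-threaded model with a fixed 8-slot array; MiniLocal and its methods are illustrative names, with remove() and onRemoval() mirroring the Netty methods discussed above and removeAll() playing the role of FastThreadLocal.removeAll():

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical single-threaded model of remove(), onRemoval() and removeAll() over slot 0.
class MiniLocal {
    static final Object UNSET = new Object();
    static final Object[] SLOTS = new Object[8]; // stands in for one thread's InternalThreadLocalMap
    private static int nextIndex = 1;            // slot 0 is reserved for the cleanup Set

    static {
        Arrays.fill(SLOTS, UNSET);
    }

    final int index = nextIndex++;

    @SuppressWarnings("unchecked")
    static Set<MiniLocal> variablesToRemove() {
        if (SLOTS[0] == UNSET) {
            SLOTS[0] = Collections.newSetFromMap(new IdentityHashMap<MiniLocal, Boolean>());
        }
        return (Set<MiniLocal>) SLOTS[0];
    }

    void set(Object value) {
        if (SLOTS[index] == UNSET) {
            variablesToRemove().add(this); // register on first write only
        }
        SLOTS[index] = value;
    }

    void remove() {
        Object v = SLOTS[index];
        SLOTS[index] = UNSET;             // like removeIndexedVariable(index)
        variablesToRemove().remove(this); // like removeFromVariablesToRemove(...)
        if (v != UNSET) {
            onRemoval(v);                 // hook runs only if a value was actually present
        }
    }

    /** Extension point, empty by default; subclasses override it for post-removal work. */
    protected void onRemoval(Object value) {
    }

    /** Removes every variable registered in slot 0, iterating over a copy of the Set. */
    static void removeAll() {
        List<MiniLocal> snapshot = new ArrayList<>(variablesToRemove());
        for (MiniLocal local : snapshot) {
            local.remove();
        }
    }
}
```

removeAll() iterates over a copy because each remove() deletes from the very Set being traversed.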

FastThreadLocal.get() source code analysis

Let's look at the source code of FastThreadLocal.get():

public final V get() {
    InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get();
    Object v = threadLocalMap.indexedVariable(index);
    if (v != InternalThreadLocalMap.UNSET) {
        return (V) v;
    }

    return initialize(threadLocalMap);
}

get() first obtains the InternalThreadLocalMap, taking the fast or slow path depending on whether the current thread is a FastThreadLocalThread, and then reads the element at this FastThreadLocal's index. If that element is not the default object UNSET, the slot already holds data, so it is returned directly.

public Object indexedVariable(int index) {
  Object[] lookup = indexedVariables;
  return index < lookup.length ? lookup[index] : UNSET;
}

If the element is UNSET, the value has to be initialized: initialize() calls the initialValue() method, which the user overrides, to construct the object to be stored.

private V initialize(InternalThreadLocalMap threadLocalMap) {
    V v = null;
    try {
        v = initialValue();
    } catch (Exception e) {
        PlatformDependent.throwException(e);
    }

    threadLocalMap.setIndexedVariable(index, v);
    addToVariablesToRemove(threadLocalMap, this);
    return v;
}

initialValue() is overridden like this:

private final FastThreadLocal<String> threadLocal = new FastThreadLocal<String>() {
  @Override
  protected String initialValue() {
    return "hello world";
  }
};

Once initialValue() has produced the object, it is stored in the slot at this FastThreadLocal's index, and the current FastThreadLocal is then saved into the Set to be cleaned up. We walked through both steps when analyzing FastThreadLocal.set(), so we won't repeat them here.
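The get()/initialize() pair is lazy initialization keyed on the UNSET placeholder. A compact sketch with hypothetical names, where a Supplier stands in for an overridden initialValue() and a fixed 4-slot array stands in for the thread's map:

```java
import java.util.Arrays;
import java.util.function.Supplier;

// Hypothetical model of FastThreadLocal.get() falling back to initialize() on an UNSET slot.
final class LazySlots {
    static final Object UNSET = new Object();
    private final Object[] slots = new Object[4];
    int initCalls; // how many times the initial-value supplier was invoked

    LazySlots() {
        Arrays.fill(slots, UNSET);
    }

    Object get(int index, Supplier<?> initialValue) {
        Object v = index < slots.length ? slots[index] : UNSET; // like indexedVariable(index)
        if (v != UNSET) {
            return v;                        // already populated: return directly
        }
        initCalls++;
        Object created = initialValue.get(); // like initialize() calling initialValue()
        slots[index] = created;              // sketch: assumes index < slots.length
        return created;
    }
}
```

The second get() on the same slot returns the stored value without calling the supplier again, so initCalls stays at 1.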

We have now analyzed the two core methods of FastThreadLocal, set() and get(). Finally, here are two questions worth thinking about more deeply:

  1. Is FastThreadLocal really faster than ThreadLocal? Not necessarily. Only threads of type FastThreadLocalThread get the speedup. On ordinary threads, FastThreadLocal first has to look up its InternalThreadLocalMap through a JDK ThreadLocal (slowGet()) before indexing into the array, so it can be slower than using ThreadLocal directly.
  2. Does FastThreadLocal waste a lot of space? FastThreadLocal does trade space for time, but its design assumes there will not be very many FastThreadLocal objects, and unused slots in the array merely hold references to the same shared UNSET object, so they do not take up much memory.

Copyright notice: unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. When reprinting, please credit "Mic to take you to learn architecture".