ThreadLocal, strong references and weak references

Posted by ashly on Mon, 22 Nov 2021 00:39:25 +0100

Start with SimpleDateFormat

Let's start with an example. We create 20 threads, and each of them does just one thing: parse a date string.

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class ThreadLocalExample {

    //Not thread-safe: this single instance is shared by all threads
    private static final SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

    public static Date parse(String strDate) throws ParseException {
        return sdf.parse(strDate);
    }

    public static void main(String[] args) {
        //20 threads all parse through the same shared sdf
        for (int i = 0; i < 20; i++) {
            new Thread(() -> {
                try {
                    System.out.println(parse("2021-11-18 21:36:17"));
                } catch (ParseException e) {
                    e.printStackTrace();
                }
            }).start();
        }
    }
}

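Run it and it fails: some threads throw exceptions (such as a NumberFormatException thrown from inside the parser) and others may print wrong dates. The exact output differs from run to run.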

Why? Because SimpleDateFormat is not thread-safe: it keeps intermediate state in a shared Calendar field, so concurrent calls overwrite each other's state. The class Javadoc says so explicitly: "Date formats are not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally."

In other words, there are two ways to solve the problem. The first is to synchronize the method:

public static synchronized Date parse(String strDate) throws ParseException {
    return sdf.parse(strDate);
}

But this makes every call wait for the lock, which hurts performance under contention. The second approach, also suggested by the Javadoc, is thread isolation: give each thread its own SimpleDateFormat instance, so no instance is ever shared and there is nothing to race on. This is where today's protagonist, ThreadLocal, comes in:

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class ThreadLocalExample {

    //Each thread gets its own SimpleDateFormat through this ThreadLocal
    private static final ThreadLocal<SimpleDateFormat> dateFormatThreadLocal = new ThreadLocal<>();

    private static SimpleDateFormat getDateFormat() {
        //Get the SimpleDateFormat stored for the current thread (null on first use)
        SimpleDateFormat dateFormat = dateFormatThreadLocal.get();
        if (dateFormat == null) {
            //First call on this thread: create an instance and store it for this thread only
            dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            dateFormatThreadLocal.set(dateFormat);
        }
        return dateFormat;
    }

    public static Date parse(String strDate) throws ParseException {
        return getDateFormat().parse(strDate);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 20; i++) {
            new Thread(() -> {
                try {
                    System.out.println(parse("2021-11-18 21:36:17"));
                } catch (ParseException e) {
                    e.printStackTrace();
                }
            }).start();
        }
    }
}

Run it again and no errors are reported.
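Since Java 8, the manual null check in getDateFormat() can be replaced by ThreadLocal.withInitial, which runs the supplier lazily the first time each thread calls get(). The sketch below is a variation on the example above, not the article's original code. The class name WithInitialExample and the countFailures helper are hypothetical; the helper parses the same string from several threads and counts wrong results or exceptions, so the thread isolation can be checked.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class name; same idea as ThreadLocalExample, written with withInitial
class WithInitialExample {

    // The supplier runs lazily, once per thread, on that thread's first get()
    private static final ThreadLocal<SimpleDateFormat> DATE_FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"));

    static Date parse(String strDate) throws ParseException {
        return DATE_FORMAT.get().parse(strDate);
    }

    // Hypothetical helper: parses from threadCount threads and counts bad results or exceptions
    static int countFailures(int threadCount, int parsesPerThread) {
        String input = "2021-11-18 21:36:17";
        AtomicInteger failures = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                try {
                    // A fresh, unshared formatter gives the reference result
                    Date expected = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse(input);
                    for (int j = 0; j < parsesPerThread; j++) {
                        if (!parse(input).equals(expected)) {
                            failures.incrementAndGet();
                        }
                    }
                } catch (ParseException | RuntimeException e) {
                    failures.incrementAndGet();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("failures: " + countFailures(20, 100)); // failures: 0
    }
}
```

Because every thread reads only its own formatter, the failure count stays at zero no matter how the threads interleave.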

There is still room for optimization. With 20 threads this is fine, but with 1000 threads each thread would get its own SimpleDateFormat copy, so 1000 SimpleDateFormat objects would be created, which wastes memory. So let's rewrite it with a thread pool:

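public static void main(String[] args) {
    //Needs imports of java.util.concurrent.ExecutorService and java.util.concurrent.Executors
    ExecutorService executorService = Executors.newFixedThreadPool(16);
    for (int i = 0; i < 1000; i++) {
        executorService.execute(() -> {
            try {
                System.out.println(parse("2021-11-18 21:36:17"));
            } catch (ParseException e) {
                e.printStackTrace();
            }
        });
    }
    //Stop accepting new tasks so the pool threads can exit once the queued tasks finish
    executorService.shutdown();
}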

Now the 1000 tasks are completed with at most 16 SimpleDateFormat objects, one per pool thread, because pool threads are reused from task to task.
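To check the "one formatter per pool thread" claim, the sketch below counts how many SimpleDateFormat instances are actually created. The names PoolFormatterCount, CREATED and runTasks are hypothetical. It relies on two documented behaviors: withInitial runs its supplier at most once per thread (until remove() is called), and a fixed pool starts a new thread for each submitted task until it reaches its core size.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical names; counts the formatters a fixed pool really creates
class PoolFormatterCount {
    static final AtomicInteger CREATED = new AtomicInteger();

    // Every supplier call is counted; it runs at most once per thread
    static final ThreadLocal<SimpleDateFormat> DATE_FORMAT = ThreadLocal.withInitial(() -> {
        CREATED.incrementAndGet();
        return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    });

    // Runs taskCount formatting tasks on a new pool of poolSize threads and
    // returns how many SimpleDateFormat instances were created
    static int runTasks(int poolSize, int taskCount) {
        CREATED.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> DATE_FORMAT.get().format(new Date()));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return CREATED.get();
    }

    public static void main(String[] args) {
        System.out.println("formatters created: " + runTasks(16, 1000)); // formatters created: 16
    }
}
```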

That is the first typical use case for ThreadLocal: giving each thread its own copy of an object that is not thread-safe.

The second scenario

The second use is as a context. Suppose that when a request arrives, service-1 works out the user information, and the methods called after it, service-2, service-3 and service-4, all need that user. The naive approach is to pass the user along as a parameter through every call, which makes the code very redundant.

One solution is to put the user information in a shared in-memory structure such as a HashMap: service-1 puts the user in, and service-2, service-3 and service-4 read it directly, so the user no longer has to be passed as a parameter.

But this introduces a thread-safety problem: what happens when several threads handle requests at the same time? We would have to protect the map with synchronized or switch to ConcurrentHashMap, and both cost performance.

The final solution is to use ThreadLocal, which gives each thread its own user information and is thread-safe by construction. Service-1 saves the user into it, and service-2, service-3 and service-4 read it back.

This is the second use: ThreadLocal acts as a context holder (for example, a UserContextHolder class), so the user no longer has to be passed as a parameter.
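The article does not show UserContextHolder itself, so here is a minimal sketch of what such a holder might look like. The User class, the holder's method names and the service methods are all hypothetical stand-ins for service-1 to service-4; only the ThreadLocal calls (set, get, remove) are real JDK API. The holder is cleared in a finally block for the reasons explained in the remove() section.

```java
// Hypothetical user type
class User {
    final String name;

    User(String name) {
        this.name = name;
    }
}

// Hypothetical holder: the current request's user, visible to the current thread only
class UserContextHolder {
    private static final ThreadLocal<User> HOLDER = new ThreadLocal<>();

    static void set(User user) {
        HOLDER.set(user);
    }

    static User get() {
        return HOLDER.get();
    }

    // Call when the request is done so the thread does not keep the user around
    static void clear() {
        HOLDER.remove();
    }
}

// Hypothetical services mirroring service-1 ... service-4 from the text
class RequestFlow {
    // service-1 works out the user and stores it in the context
    static void service1(String userName) {
        UserContextHolder.set(new User(userName));
    }

    // service-2, service-3 and service-4 read the user without a User parameter
    static String service2() {
        return "service-2 sees " + UserContextHolder.get().name;
    }

    static String service3() {
        return "service-3 sees " + UserContextHolder.get().name;
    }

    static String service4() {
        return "service-4 sees " + UserContextHolder.get().name;
    }

    static String handleRequest(String userName) {
        service1(userName);
        try {
            return service2() + ", " + service3() + ", " + service4();
        } finally {
            UserContextHolder.clear();
        }
    }

    public static void main(String[] args) {
        System.out.println(handleRequest("alice"));
        // prints: service-2 sees alice, service-3 sees alice, service-4 sees alice
    }
}
```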

Storage location of ThreadLocal

First, let's look at the storage locations of Thread, ThreadLocal and ThreadLocalMap.

The Thread class has a ThreadLocalMap field named threadLocals, as shown in the figure below. Because the map lives inside the Thread object, every thread has its own map, and that is what keeps the values private to each thread.

A ThreadLocalMap holds an array of Entry objects. The key of each Entry is a weak reference to a ThreadLocal, and the value is the value stored for that ThreadLocal.

Why can a ThreadLocalMap hold multiple entries? Because a program can define several ThreadLocal variables, and each one that a thread uses ends up as one Entry in that thread's map.
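A small sketch of this layout in action, with hypothetical names: two ThreadLocal variables are set on a worker thread (so that thread's map gets two entries), while the main thread's own map is unaffected. The sketch only observes behavior through the public API; it does not inspect the private threadLocals field.

```java
// Hypothetical names; each thread's ThreadLocalMap gets one Entry per ThreadLocal it sets
class MultipleThreadLocals {
    static final ThreadLocal<String> USER = new ThreadLocal<>();
    static final ThreadLocal<Integer> REQUEST_ID = new ThreadLocal<>();

    // Sets both ThreadLocals on a new thread and returns what that thread sees
    static String runOnNewThread(String user, int requestId) {
        String[] seen = new String[1];
        Thread worker = new Thread(() -> {
            USER.set(user);
            REQUEST_ID.set(requestId);
            seen[0] = USER.get() + "#" + REQUEST_ID.get();
        });
        worker.start();
        try {
            worker.join(); // join() also makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        USER.set("main-user");
        System.out.println(runOnNewThread("alice", 1)); // alice#1
        // The main thread's value is untouched, and it never set REQUEST_ID
        System.out.println(USER.get() + " " + REQUEST_ID.get()); // main-user null
    }
}
```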

With this big picture in mind, let's go through the source code, starting with the set method:

public void set(T value) {
    //Get the current thread; each thread has its own map, which is what gives isolation
    Thread t = Thread.currentThread();
    //Return the thread's ThreadLocalMap (t.threadLocals); it is null if not created yet
    ThreadLocalMap map = getMap(t);
    //If the map already exists, store the value with this ThreadLocal as the key
    if (map != null)
        map.set(this, value);
    else   //Otherwise, create the map with this value as its first entry
        createMap(t, value);
}

If the map is null (the thread has not used any ThreadLocal yet), createMap creates it first through the ThreadLocalMap constructor.

Initialization is simple: create a new array, compute the slot from the hash code, and put the key and value in that slot.

ThreadLocalMap(ThreadLocal<?> firstKey, Object firstValue) {
    //Array with default length of 16
    table = new Entry[INITIAL_CAPACITY];
    //Calculate array subscript
    int i = firstKey.threadLocalHashCode & (INITIAL_CAPACITY - 1);
    //Put the key and value in the position of i
    table[i] = new Entry(firstKey, firstValue);
    size = 1;
    setThreshold(INITIAL_CAPACITY);
}
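Why does firstKey.threadLocalHashCode & (INITIAL_CAPACITY - 1) spread keys well? In the JDK, each new ThreadLocal's threadLocalHashCode is the previous one plus the constant 0x61c88647 (HASH_INCREMENT). The sketch below only simulates that sequence starting from 0 (in a real JVM, JDK-internal ThreadLocals consume the first codes, so actual values differ); the class and method names are hypothetical. Because the increment is odd, any run of 2^n consecutive codes lands in 2^n distinct slots of a table of size 2^n.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.StringJoiner;

// Simulation only: mimics the threadLocalHashCode sequence and the slot formula
class ThreadLocalHashDemo {
    static final int HASH_INCREMENT = 0x61c88647;

    // Hash code of the k-th simulated ThreadLocal, starting from 0;
    // int multiplication wraps around exactly like k repeated additions would
    static int hashOf(int k) {
        return k * HASH_INCREMENT;
    }

    // Slot index in a table whose capacity is a power of two
    static int slotOf(int k, int capacity) {
        return hashOf(k) & (capacity - 1);
    }

    // Number of distinct slots used by the first n simulated ThreadLocals
    static int distinctSlots(int n, int capacity) {
        Set<Integer> slots = new HashSet<>();
        for (int k = 0; k < n; k++) {
            slots.add(slotOf(k, capacity));
        }
        return slots.size();
    }

    public static void main(String[] args) {
        StringJoiner joiner = new StringJoiner(" ");
        for (int k = 0; k < 16; k++) {
            joiner.add(String.valueOf(slotOf(k, 16)));
        }
        System.out.println(joiner); // 0 7 14 5 12 3 10 1 8 15 6 13 4 11 2 9
        System.out.println(distinctSlots(16, 16)); // 16, i.e. no collisions
    }
}
```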

Now let's look at ThreadLocalMap.set. It also computes the slot first. If the Entry in that slot has the same key, its value is replaced. If the Entry's key is null (a stale entry whose ThreadLocal has been garbage-collected), replaceStaleEntry takes over that slot. Otherwise it moves on to the next slot, and when it reaches an empty slot it creates a new Entry there.

private void set(ThreadLocal<?> key, Object value) {

    // We don't use a fast path as with get() because it is at
    // least as common to use set() to create new entries as
    // it is to replace existing ones, in which case, a fast
    // path would fail more often than not.

    Entry[] tab = table;
    int len = tab.length;
    //Calculate the array index
    int i = key.threadLocalHashCode & (len-1);

    //Linear probing: walk forward until an empty slot is found
    for (Entry e = tab[i];
         e != null;
         e = tab[i = nextIndex(i, len)]) {
        ThreadLocal<?> k = e.get();
        //This ThreadLocal already has an entry: overwrite its value
        if (k == key) {
            e.value = value;
            return;
        }
        //The key was garbage-collected (stale entry): reuse this slot via replaceStaleEntry
        if (k == null) {
            replaceStaleEntry(key, value, i);
            return;
        }
    }

    //Empty slot found: create a new Entry here
    tab[i] = new Entry(key, value);
    int sz = ++size;
    //Clean up some stale entries; if none were removed and the threshold is reached, rehash
    if (!cleanSomeSlots(i, sz) && sz >= threshold)
        rehash();
}

HashMap resolves collisions with separate chaining (each bucket holds a linked list of entries), whereas ThreadLocalMap uses linear probing: on a collision it does not chain entries into a list, but keeps looking for the next empty slot. If you are interested, see the article "Detailed introduction to the source code of ConcurrentHashMap".
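To make linear probing concrete, here is a tiny fixed-size table in the spirit of ThreadLocalMap.set and nextIndex. It is a simplified illustration, not the JDK code: there are no weak keys, no resizing and no stale-entry cleanup, and the class and method names are hypothetical. The caller passes the hash explicitly so collisions are easy to force; keys are compared with ==, as ThreadLocalMap does.

```java
// Hypothetical simplified table; illustrates linear probing only
class LinearProbingTable {
    private final Object[] keys;
    private final Object[] values;

    // capacity must be a power of two so that hash & (capacity - 1) is a valid index
    LinearProbingTable(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    // Same wrap-around as ThreadLocalMap.nextIndex: after the last slot, go back to 0
    private int nextIndex(int i) {
        return (i + 1 < keys.length) ? i + 1 : 0;
    }

    // Stores the value and returns the slot used
    int put(Object key, int hash, Object value) {
        int i = hash & (keys.length - 1);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null || keys[i] == key) { // empty slot or same key
                keys[i] = key;
                values[i] = value;
                return i;
            }
            i = nextIndex(i); // collision: try the next slot
        }
        throw new IllegalStateException("table full");
    }

    // Probes from the home slot until the key or an empty slot is found
    Object get(Object key, int hash) {
        int i = hash & (keys.length - 1);
        for (int probes = 0; probes < keys.length && keys[i] != null; probes++) {
            if (keys[i] == key) {
                return values[i];
            }
            i = nextIndex(i);
        }
        return null;
    }

    public static void main(String[] args) {
        LinearProbingTable table = new LinearProbingTable(8);
        System.out.println(table.put(new Object(), 3, "A")); // 3
        System.out.println(table.put(new Object(), 3, "B")); // 4, slot 3 was taken
    }
}
```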

Now the get method, which is also simple: it gets the ThreadLocalMap from the current thread, looks up the Entry using this (the current ThreadLocal) as the key, and returns the Entry's value.

public T get() {
    //Get current thread
    Thread t = Thread.currentThread();
    //Gets the ThreadLocalMap object in the current thread
    ThreadLocalMap map = getMap(t);
    if (map != null) {
        //Get the Entry object in ThreadLocalMap and get the Value
        ThreadLocalMap.Entry e = map.getEntry(this);
        if (e != null) {
            @SuppressWarnings("unchecked")
            T result = (T)e.value;
            return result;
        }
    }
    //No map yet, or no entry for this ThreadLocal: initialize via initialValue()
    return setInitialValue();
}

If there is no map, or no entry for this ThreadLocal, get() falls back to setInitialValue. It calls initialValue() (which returns null unless overridden) and then stores the result with the same logic as the set method above:

private T setInitialValue() {
    T value = initialValue();
    Thread t = Thread.currentThread();
    ThreadLocalMap map = getMap(t);
    if (map != null)
        map.set(this, value);
    else
        createMap(t, value);
    return value;
}
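A quick way to see this path in action is to override initialValue() and count how often it runs. In this sketch (hypothetical names), the first get() on a thread goes through setInitialValue(), later get() calls find the stored entry, and after remove() the next get() initializes again, as the ThreadLocal Javadoc describes. A plain ThreadLocal that does not override initialValue() simply returns null.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical names; shows when initialValue() is invoked
class InitialValueDemo {
    static final AtomicInteger INIT_CALLS = new AtomicInteger();

    static final ThreadLocal<String> LOCAL = new ThreadLocal<String>() {
        @Override
        protected String initialValue() {
            INIT_CALLS.incrementAndGet(); // count each initialization
            return "initial";
        }
    };

    public static void main(String[] args) {
        System.out.println(LOCAL.get() + " " + INIT_CALLS.get()); // initial 1
        System.out.println(LOCAL.get() + " " + INIT_CALLS.get()); // initial 1 (entry found)
        LOCAL.remove();
        System.out.println(LOCAL.get() + " " + INIT_CALLS.get()); // initial 2 (re-initialized)
        System.out.println(new ThreadLocal<String>().get());      // null
    }
}
```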

Strong and weak references

The title mentions strong and weak references, and, as noted above, the key of each Entry is a weak reference to a ThreadLocal. So let's look at what strong and weak references are.

First, the code for a strong reference:

public class ReferenceExample {

    static Object object = new Object();

    public static void main(String[] args) {
        Object strongRef = object;
        object = null;
        System.gc();
        System.out.println(strongRef);
    }
}

Run it: the object is still printed, so it has not been garbage-collected.

I drew a diagram of this. At first, both object and strongRef point to the same new Object() in the heap.

When object = null is executed, only the reference held by the static field object is cut. The local variable strongRef on main's stack still points to new Object(), so the object is still strongly reachable and System.gc() does not reclaim it.

Now the code for a weak reference:

import java.lang.ref.WeakReference;

public class ReferenceExample {

    static Object object = new Object();

    public static void main(String[] args) {
        WeakReference<Object> weakRef = new WeakReference<>(object);
        object = null;
        System.gc();
        System.out.println(weakRef.get());
    }
}

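This time it prints null: the object has been collected. (Strictly speaking, System.gc() is only a request to the JVM, but on HotSpot with default settings it normally triggers a full collection, which clears the weak reference.)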

A weak reference does not keep its referent alive. In the diagram it is drawn as a dotted line, which the garbage collector ignores: after object = null, nothing strongly references new Object(), so it is reclaimed and weakRef.get() returns null.

This is why the ThreadLocalMap source checks for key == null. The key is a weak reference to the ThreadLocal. When business code drops its last strong reference to a ThreadLocal (for example, by setting the variable holding it to null), the ThreadLocal can be collected just like in the example above, and the Entry's key becomes null after garbage collection. Such an Entry should not occupy a slot in the array forever, so the map cleans up entries whose key is null.

If you are not familiar with garbage collection, see the article "Understanding GC garbage collection in an article".

Memory leak / remove() method

First of all: when you are done with a ThreadLocal, you must call the remove() method! Be sure to call the remove() method! Be sure to call the remove() method! Otherwise, it can cause a memory leak.

A memory leak is when an object is no longer needed but the memory it occupies cannot be reclaimed.

Key leakage

As mentioned above, the key is a weak reference. Suppose it were a strong reference instead: after the business code sets its ThreadLocal variable to null, the Entry's key would still strongly reference the ThreadLocal, so the ThreadLocal could never be collected. The Entry would stay in the array forever and never be cleaned up, and such entries would keep piling up in the heap.

With a weak reference, the key becomes null once the ThreadLocal has been collected, and the JDK takes advantage of this. When get, set or remove is called on a ThreadLocal (and when the map rehashes), ThreadLocalMap scans some or all of its entries for a null key. An Entry with a null key can no longer be reached through any ThreadLocal, so its value is useless: the map sets the value to null and frees the slot, and the value object can then be garbage-collected normally, preventing a memory leak.
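The sketch below imitates that cleanup on a tiny scale. The Entry class has the same shape as ThreadLocalMap.Entry (it extends WeakReference, so the key is weak, while the value is a plain strong field), but expungeStale is a simplified, hypothetical pass: the real expungeStaleEntry and cleanSomeSlots also rehash the entries that follow a freed slot. To keep the result deterministic, WeakReference.clear() simulates what the garbage collector would do to the key, instead of relying on System.gc().

```java
import java.lang.ref.WeakReference;

// Simplified illustration, not the JDK code
class StaleEntryDemo {

    // Same shape as ThreadLocalMap.Entry: key held weakly, value held strongly
    static class Entry extends WeakReference<Object> {
        Object value;

        Entry(Object key, Object value) {
            super(key);
            this.value = value;
        }
    }

    // Frees every slot whose key is gone (get() == null) and drops the strong
    // reference to its value; returns the number of slots cleared
    static int expungeStale(Entry[] table) {
        int cleared = 0;
        for (int i = 0; i < table.length; i++) {
            Entry e = table[i];
            if (e != null && e.get() == null) {
                e.value = null;  // the value can now be garbage-collected
                table[i] = null; // the slot can be reused
                cleared++;
            }
        }
        return cleared;
    }

    public static void main(String[] args) {
        Object liveKey = new Object();
        Entry[] table = { new Entry(liveKey, "kept"), new Entry(new Object(), "stale"), null };
        table[1].clear(); // simulate the GC clearing the second key
        System.out.println(expungeStale(table));             // 1
        System.out.println(table[0].value + " " + table[1]); // kept null
        System.out.println(table[0].get() == liveKey);       // true, liveKey is still reachable
    }
}
```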

Value leakage

The key leak is solved, but the value is still held by a strong reference. Consider the following reference chain:

Thread Ref → Current Thread → ThreadLocalMap → Entry → value → possible leaked value instances.

This chain lives as long as the thread does. If the thread keeps running long tasks and never calls get, set or remove on a ThreadLocal again, the value stays strongly reachable even after its key has been cleared, and its memory is never freed. The remedy is the remove method. Here is its source:

public void remove() {
    ThreadLocalMap m = getMap(Thread.currentThread());
    if (m != null)
        m.remove(this);
}

Thread pools make this worse. A pool thread does not terminate when a task finishes; it is returned to the pool and reused, so values left in its ThreadLocalMap stay reachable indefinitely and cannot be reclaimed (the next task on that thread can even read them). So make it a habit: when you no longer need a ThreadLocal value, call remove() promptly, ideally in a finally block, to release the memory. Finally, thank you for reading~
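The pool problem is easy to reproduce. In this sketch (hypothetical names), a single-thread pool runs two tasks one after the other on the same reused thread. If the first task does not call remove(), the second task reads the first task's value; with remove() in a finally block, it reads null.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical names; shows a ThreadLocal value surviving across pooled tasks
class PoolLeakDemo {
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Runs two tasks on a one-thread pool (both run on the same reused thread)
    // and returns what the second task reads from CONTEXT
    static String secondTaskSees(boolean removeInFinally) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(() -> {
                CONTEXT.set("user-from-task-1");
                try {
                    // ... business logic that uses CONTEXT ...
                } finally {
                    if (removeInFinally) {
                        CONTEXT.remove(); // the habit recommended above
                    }
                }
            }).get(); // wait for task 1 to finish
            Future<String> second = pool.submit(() -> CONTEXT.get());
            return second.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(secondTaskSees(false)); // user-from-task-1 (leaked into task 2)
        System.out.println(secondTaskSees(true));  // null
    }
}
```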


Programmer Think