Multithreading - Some ideas for improving lock performance

Posted by nikkio3000 on Sat, 27 Jul 2019 19:39:38 +0200

Locks are one of the most common synchronization mechanisms. In highly concurrent environments, intense lock contention can degrade program performance, so it is worth discussing lock-related performance issues and considerations such as avoiding deadlocks. The following suggestions can reduce lock contention and the performance loss it causes.

1. Reduce lock holding time

For applications that use locks for concurrency control, the length of time a single thread holds a lock directly affects system performance under contention. The longer a thread holds a lock, the more intense the contention for it. Consider the following code snippet:

public synchronized void syncMethod(){
     othercode1();
     mutexMethod();
     othercode2();
}

In syncMethod(), assume that only mutexMethod() needs synchronization, while othercode1() and othercode2() do not. If those two methods are heavyweight and consume a lot of CPU time, synchronizing the entire method under high concurrency greatly increases the number of waiting threads, because a thread acquires the intrinsic lock when it enters the method and releases it only after all the work is done. The optimization is to synchronize only where necessary, which can significantly shorten the time the thread holds the lock and improve system throughput.

public void syncMethod2(){
     othercode1();
     synchronized(this){
          mutexMethod();
     }
     othercode2();
}

2. Reduce lock granularity

Reducing lock granularity means narrowing the scope of the locked object, which lowers the probability of lock conflicts and so improves the concurrency of the system. A typical example in the JDK is ConcurrentHashMap, which (in JDK 7 and earlier) is subdivided internally into several small hash tables called segments (Segment), 16 by default.

To add a new entry to a ConcurrentHashMap, you do not lock the entire map. Instead, you first find the segment in which the entry belongs from its hash code, lock that segment, and complete the put() operation there. In a multithreaded environment, when several threads call put() at the same time, they can run truly in parallel as long as the entries do not land in the same segment. With the default of 16 segments, in the best case 16 threads can insert at the same time, which greatly improves throughput. The following code is its put() method. Lines 5-6 compute the index j of the segment from the key's hash; the method then obtains segment s and inserts the data into it.

public V put(K key, V value) {
     Segment<K,V> s;
     if (value == null)
         throw new NullPointerException();
     int hash = hash(key);
     int j = (hash >>> segmentShift) & segmentMask;
     if ((s = (Segment<K,V>)UNSAFE.getObject          // nonvolatile; recheck
          (segments, (j << SSHIFT) + SBASE)) == null) //  in ensureSegment
         s = ensureSegment(j);
     return s.put(key, hash, value, false);
}

However, reducing granularity introduces a new problem: when the system needs a global lock, acquiring it consumes a lot of resources. For example, obtaining global information about a ConcurrentHashMap may require acquiring the locks of all segments at the same time. The size() method returns the total number of valid entries in the map; to get this number reliably, it may need the locks on all segments, so its code is as follows:

public int size() {
    // Try a few times to get an accurate count. On failure due to
    // continuous concurrent modification of the table, fall back to locking.
    final Segment<K,V>[] segments = this.segments;
    int size;
    boolean overflow; // true if size overflows 32 bits
    long sum;         // sum of modCounts
    long last = 0L;   // previous sum
    int retries = -1; // first iteration isn't a retry
    try {
        for (;;) {
            if (retries++ == RETRIES_BEFORE_LOCK) {
                for (int j = 0; j < segments.length; ++j)
                    ensureSegment(j).lock(); // Lock all segments
            }
            sum = 0L;
            size = 0;
            overflow = false;
            for (int j = 0; j < segments.length; ++j) {
                Segment<K,V> seg = segmentAt(segments, j);
                if (seg != null) {
                    sum += seg.modCount; // Accumulate modification counts
                    int c = seg.count;
                    if (c < 0 || (size += c) < 0)
                        overflow = true;
                }
            }
            if (sum == last)
                break;
            last = sum;
        }
    } finally {
        if (retries > RETRIES_BEFORE_LOCK) {
            for (int j = 0; j < segments.length; ++j)
                segmentAt(segments, j).unlock();   // Release all locks
        }
    }
    return overflow ? Integer.MAX_VALUE : size;
}

As the code above shows, size() first tries to sum without locking and falls back to locking all segments only if that fails. Reducing lock granularity therefore truly improves throughput only when methods that need global information, such as size(), are called infrequently.
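
The same trade-off can be seen in a much smaller setting. The following is a minimal sketch of lock striping, not the JDK's implementation: the class name StripedMap, the fixed count of 16 stripes, and the helper methods are all hypothetical names chosen for illustration. put() and get() lock a single stripe, while size() has to take every stripe lock in turn.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of lock striping; this is NOT ConcurrentHashMap.
class StripedMap<K, V> {
    private static final int STRIPES = 16;          // assumed count, echoing the 16 default segments
    private final Object[] locks = new Object[STRIPES];
    private final Map<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    StripedMap() {
        buckets = (Map<K, V>[]) new Map[STRIPES];
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new Object();
            buckets[i] = new HashMap<>();
        }
    }

    // Map a key to a stripe index in the range [0, STRIPES).
    private int stripeFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % STRIPES;
    }

    V put(K key, V value) {
        int i = stripeFor(key);
        synchronized (locks[i]) {                   // only one stripe is locked
            return buckets[i].put(key, value);
        }
    }

    V get(Object key) {
        int i = stripeFor(key);
        synchronized (locks[i]) {
            return buckets[i].get(key);
        }
    }

    // Global operation: holds all stripe locks, so it is the expensive path.
    int size() {
        return sizeFrom(0);
    }

    // Acquire the stripe locks in index order, then sum while holding all of them.
    private int sizeFrom(int i) {
        if (i == STRIPES) {
            int total = 0;
            for (Map<K, V> b : buckets) total += b.size();
            return total;
        }
        synchronized (locks[i]) {
            return sizeFrom(i + 1);
        }
    }
}
```

Threads calling put() on keys that fall into different stripes never block each other, but every call to size() briefly blocks all writers, which is why frequent global operations cancel out the benefit.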

3. Replace exclusive locks with read-write locks

A read-write lock (ReadWriteLock) can improve system performance. It is in effect a special case of reducing granularity: instead of partitioning the data structure, a read-write lock partitions the system by kind of operation. Because read operations do not affect the integrity or consistency of the data, multiple threads can in principle read at the same time, and a read-write lock allows exactly that. Using read-write locks in read-heavy, write-light scenarios can therefore effectively improve concurrency.
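
As a minimal sketch of this idea, the following class guards a plain HashMap with a ReentrantReadWriteLock from java.util.concurrent.locks. The class name ReadMostlyCache and its two methods are hypothetical; only the lock API comes from the JDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical read-mostly cache; the names are illustrative only.
class ReadMostlyCache {
    private final Map<String, String> data = new HashMap<>();
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

    // Many threads may hold the read lock at the same time.
    String get(String key) {
        rw.readLock().lock();
        try {
            return data.get(key);
        } finally {
            rw.readLock().unlock();
        }
    }

    // The write lock is exclusive: it waits for readers and other writers to finish.
    void put(String key, String value) {
        rw.writeLock().lock();
        try {
            data.put(key, value);
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```

With a single exclusive lock, concurrent get() calls would be serialized. Here they only wait while a put() is in progress, which pays off when reads greatly outnumber writes.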

4. Lock separation

Lock separation extends the idea of read-write locks further. A read-write lock separates locks according to whether an operation reads or writes. Applying the same idea to the functional characteristics of an application, an exclusive lock can be split as well. In the implementation of LinkedBlockingQueue, take() and put() remove data from and add data to the queue respectively. Although both methods modify the queue, LinkedBlockingQueue is based on a linked list, so one operates on the head of the list and the other on the tail, and in theory they do not conflict. With a single exclusive lock, the two methods could not run truly concurrently and would wait for each other to release the lock. In the JDK, take() and put() are therefore separated by two locks.

/** Lock held by take, poll, etc */
private final ReentrantLock takeLock = new ReentrantLock();

/** Wait queue for waiting takes */
private final Condition notEmpty = takeLock.newCondition();

/** Lock held by put, offer, etc */
private final ReentrantLock putLock = new ReentrantLock();

/** Wait queue for waiting puts */
private final Condition notFull = putLock.newCondition();

The code snippet above defines takeLock and putLock, used by take() and put() respectively, so the two methods are independent of each other and do not compete for a lock. Only take() against take() and put() against put() compete, for takeLock and putLock respectively, which weakens lock contention. The take() method is implemented as follows:

public E take() throws InterruptedException {
    E x;
    int c = -1;
    final AtomicInteger count = this.count;
    final ReentrantLock takeLock = this.takeLock;
    takeLock.lockInterruptibly();   // Two threads cannot take data at the same time
    try {
        while (count.get() == 0) {  // If no data is available, wait
            notEmpty.await();       // Wait for a notification from put()
        }
        x = dequeue();              // Get the first element
        c = count.getAndDecrement();// Atomic decrement, because put() may access count concurrently; c is the value before the decrement
        if (c > 1)
            notEmpty.signal();      // Notify other take() calls
    } finally {
        takeLock.unlock();          // Release the lock
    }
    if (c == capacity)
        signalNotFull();            // Notify put() that there is free space
    return x;
}

The put() method is implemented as follows:

public void put(E e) throws InterruptedException {
    if (e == null) throw new NullPointerException();
    int c = -1;
    Node<E> node = new Node<E>(e);
    final ReentrantLock putLock = this.putLock;
    final AtomicInteger count = this.count;
    putLock.lockInterruptibly();                 // Two threads cannot put() at the same time
    try {
        while (count.get() == capacity) {        // The queue is full
            notFull.await();                     // Wait for free space
        }
        enqueue(node);                           // Insert the data
        c = count.getAndIncrement();             // Update the count; c is the value before the increment
        if (c + 1 < capacity)
            notFull.signal();                    // Still room: notify other put() calls
    } finally {
        putLock.unlock();                        // Release the lock
    }
    if (c == 0)
        signalNotEmpty();                        // After a successful insert, notify take() that data is available
}
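
From the caller's side, the two locks are invisible. The following minimal sketch shows one producer and one consumer sharing a bounded LinkedBlockingQueue. The class QueueDemo, the method transfer(), and the capacity of 8 are assumptions made for illustration.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class; only LinkedBlockingQueue comes from the JDK.
class QueueDemo {
    // Moves the integers 1..n through a bounded queue and returns the sum seen by the consumer.
    static long transfer(int n) {
        LinkedBlockingQueue<Integer> queue = new LinkedBlockingQueue<>(8); // assumed capacity
        long[] sum = new long[1];

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) queue.put(i);   // blocks while the queue is full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) sum[0] += queue.take(); // blocks while the queue is empty
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();   // join makes the consumer's writes to sum visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(transfer(1000));
    }
}
```

Because put() uses putLock and take() uses takeLock, the producer and the consumer contend with each other only through the atomic count, not through a shared lock.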

5. Lock coarsening

In general, to ensure effective concurrency among threads, each thread should hold a lock for as short a time as possible and release it immediately after using the resource, so that other threads waiting on the lock can get the resource and do their work as soon as possible. However, if the same lock is requested, synchronized on, and released over and over in quick succession, the locking itself consumes system resources, which hurts performance. For this reason, when a virtual machine encounters a series of consecutive acquisitions and releases of the same lock, it merges them into a single acquisition, reducing the number of synchronization requests. This is called lock coarsening.

During development, you can also coarsen locks by hand where it is reasonable, especially when a lock is requested inside a loop: every iteration acquires and releases the lock, which is often unnecessary, and acquiring the lock once outside the loop is enough.
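
The loop case can be sketched as follows. The class CoarseningDemo and its method names are hypothetical. Both methods compute the same result and differ only in how often the lock is acquired.

```java
// Hypothetical class contrasting per-iteration locking with a hand-coarsened lock.
class CoarseningDemo {
    private final Object lock = new Object();
    private long total;

    // Fine-grained: the lock is acquired and released on every iteration.
    void addAllFine(int[] values) {
        for (int v : values) {
            synchronized (lock) {
                total += v;
            }
        }
    }

    // Coarsened by hand: a single acquisition covers the whole loop.
    void addAllCoarse(int[] values) {
        synchronized (lock) {
            for (int v : values) {
                total += v;
            }
        }
    }

    long total() {
        synchronized (lock) {
            return total;
        }
    }

    public static void main(String[] args) {
        CoarseningDemo demo = new CoarseningDemo();
        int[] values = {1, 2, 3, 4};
        demo.addAllFine(values);
        demo.addAllCoarse(values);
        System.out.println(demo.total());
    }
}
```

Coarsening pulls in the opposite direction from reducing lock holding time: if the loop body is long-running, holding the lock across the whole loop makes other threads wait longer, so it suits short loop bodies best.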
