OS: Three Easy Pieces
Introduction
A lock, just like a lock in the real world, controls access to a resource. In concurrent programming, because resources are shared, a write in one thread can affect a read in another thread.
To eliminate this kind of side effect, some strict programmers choose functional programming to guarantee complete thread safety. In ordinary structured programming, programmers instead tend to use locks to prevent unexpected side effects.
A lock ensures that a resource is accessed by only one thread at a time, which effectively avoids the inconsistent results that concurrent reads and writes would otherwise produce.
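For a concrete picture, here is a minimal sketch using POSIX threads (the counter, iteration count, and thread count are arbitrary choices of this example):

#include <pthread.h>
#include <stdio.h>

static int counter = 0; // shared resource
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&m);   // at most one thread inside at a time
        counter++;                // the read-modify-write is now safe
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%d\n", counter); // always 200000; without the lock, often less
    return 0;
}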
Spin Lock
typedef struct __lock_t { int flag; } lock_t;

void init(lock_t *mutex) {
    // 0 -> lock is available, 1 -> held
    mutex->flag = 0;
}

void lock(lock_t *mutex) {
    while (mutex->flag == 1) // TEST the flag (line a)
        ;                    // spin-wait (do nothing)
    mutex->flag = 1;         // now SET it! (line b)
}

void unlock(lock_t *mutex) {
    mutex->flag = 0;
}
This is the most basic version: flag set to 1 means the lock is held, and flag set to 0 means it is free. While the lock is held, a waiting thread repeatedly tests the flag without doing anything else, hence the name "spin".
Atomicity
This version has a fatal bug: the scheduler does not guarantee that code from other threads will not run between line a and line b. If another thread acquires the lock after line a, line b will still execute. Because the test and the set of flag are separate steps, two threads can end up holding the lock at the same time.
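For example, with flag starting at 0, the following interleaving lets both threads "acquire" the lock:

Thread 1                                Thread 2
while (flag == 1) ; // sees 0 (line a)
  (interrupt: switch to Thread 2)
                                        while (flag == 1) ; // also sees 0
                                        flag = 1;           // acquires lock
  (interrupt: switch to Thread 1)
flag = 1; // line b: Thread 1 also "acquires" the lock

Both threads now run in the critical section at the same time.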
We generally use an atomic exchange to make lock acquisition an atomic operation: the test and the set of flag either complete together or not at all. In C-like pseudocode, it looks like this (TestAndSet):
int TestAndSet(int *old_ptr, int new) {
    int old = *old_ptr; // fetch old value at old_ptr
    *old_ptr = new;     // store 'new' into old_ptr
    return old;
}

typedef struct __lock_t { int flag; } lock_t;

void init(lock_t *lock) {
    // 0: lock is available, 1: lock is held
    lock->flag = 0;
}

void lock(lock_t *lock) {
    while (TestAndSet(&lock->flag, 1) == 1)
        ; // spin-wait (do nothing)
}

void unlock(lock_t *lock) {
    lock->flag = 0;
}
Another option is CompareAndSwap, which serves a similar purpose but is more broadly applicable than TestAndSet:
int CompareAndSwap(int *ptr, int expected, int new) {
    int original = *ptr;
    if (original == expected)
        *ptr = new;
    return original;
}

void lock(lock_t *lock) {
    while (CompareAndSwap(&lock->flag, 0, 1) == 1)
        ; // spin
}
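Note that TestAndSet and CompareAndSwap above are pseudocode: written as ordinary C they would not actually be atomic; real hardware provides them as atomic instructions. As a minimal sketch, C11's <stdatomic.h> exposes standard equivalents (the mapping shown here is my own illustration, not from the book):

#include <stdatomic.h>

typedef struct __lock_t { atomic_int flag; } lock_t;

void init(lock_t *lock) {
    atomic_init(&lock->flag, 0);
}

void lock(lock_t *lock) {
    // atomic_exchange is a hardware-backed TestAndSet: it stores 1 and
    // returns the previous value as one indivisible step.
    while (atomic_exchange(&lock->flag, 1) == 1)
        ; // spin-wait
}

void lock_cas(lock_t *lock) {
    // atomic_compare_exchange_strong is CompareAndSwap: it stores 1 only
    // if flag still holds `expected`, and reports whether it succeeded
    // (on failure it writes the current value back into `expected`).
    int expected = 0;
    while (!atomic_compare_exchange_strong(&lock->flag, &expected, 1))
        expected = 0;
}

void unlock(lock_t *lock) {
    atomic_store(&lock->flag, 0);
}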
Starvation
Since the scheduler does not guarantee that the thread that asks for the lock first gets it first, a thread may fail to acquire the lock for a long time, i.e. starve. A simple idea is to use a queue to ensure FIFO order.
We first define FetchAndAdd, the atomic post-increment (++) operation:
int FetchAndAdd(int *ptr) {
    int old = *ptr;
    *ptr = old + 1;
    return old;
}

typedef struct __lock_t {
    int ticket;
    int turn;
} lock_t;

void lock_init(lock_t *lock) {
    lock->ticket = 0;
    lock->turn = 0;
}

void lock(lock_t *lock) {
    int myturn = FetchAndAdd(&lock->ticket);
    while (lock->turn != myturn)
        ; // spin
}

void unlock(lock_t *lock) {
    lock->turn = lock->turn + 1;
}
Just like taking a number at a restaurant: turn is the number currently being called, and ticket is the number in your hand. When each customer (thread) finishes, it calls the next number.
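Like TestAndSet, FetchAndAdd is pseudocode. A runnable sketch of the ticket lock can use C11's atomic_fetch_add, the standard atomic post-increment (again my own mapping, not the book's code):

#include <stdatomic.h>

typedef struct __lock_t {
    atomic_int ticket; // next number to hand out
    atomic_int turn;   // number currently being served
} lock_t;

void lock_init(lock_t *lock) {
    atomic_init(&lock->ticket, 0);
    atomic_init(&lock->turn, 0);
}

void lock(lock_t *lock) {
    // Take a number: atomic_fetch_add returns the pre-increment value.
    int myturn = atomic_fetch_add(&lock->ticket, 1);
    while (atomic_load(&lock->turn) != myturn)
        ; // spin until our number is called
}

void unlock(lock_t *lock) {
    atomic_fetch_add(&lock->turn, 1); // call the next number
}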
Sleeping lock
Because a spin lock keeps every waiting thread running its while loop, the idle spinning wastes a lot of CPU time. One improvement is to put a thread to sleep until it can obtain the lock, and wake the next thread when the lock is released. As before we use a queue, but now as an explicit linked list.
typedef struct __lock_t {
    int flag;
    int guard;
    queue_t *q;
} lock_t;

void lock_init(lock_t *m) {
    m->flag = 0;
    m->guard = 0;
    queue_init(m->q);
}
lock_t:
Here flag indicates whether the lock is held or wanted by threads. The lock can be waited on by multiple threads at once; flag is set back to 0 only when no thread is waiting.
guard acts basically as a spin lock around the flag and queue manipulations the lock performs.
void lock(lock_t *m) {
    while (TestAndSet(&m->guard, 1) == 1)
        ; // acquire guard lock by spinning
    if (m->flag == 0) {
        m->flag = 1; // lock is acquired
        m->guard = 0;
    } else {
        queue_add(m->q, gettid());
        m->guard = 0;
        park();
    }
}
lock:
When the lock is free (flag == 0): set flag to 1, i.e. acquire the lock.
When the lock is held: add this thread to the queue and call park() to sleep until it is woken.
void unlock(lock_t *m) {
    while (TestAndSet(&m->guard, 1) == 1)
        ; // acquire guard lock by spinning
    if (queue_empty(m->q))
        m->flag = 0; // let go of lock; no one wants it
    else
        unpark(queue_remove(m->q)); // hold lock (for next thread!)
    m->guard = 0;
}
unlock:
When the lock's queue is empty: set flag to 0, i.e. the lock becomes free.
When the lock's queue is not empty: dequeue the next thread and wake it with unpark(). Note that flag stays 1: the lock is handed directly to the woken thread.
Buggy
queue_add(m->q, gettid());
m->guard = 0;   // line a
park();         // line b
If another thread happens to unlock between line a and line b, the locking thread is woken (unparked) before it has gone to sleep; line b then runs and puts it to sleep anyway. Since it has already been removed from the queue, the sleeping thread can never be woken again.
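Concretely, with thread A locking and thread B unlocking:

Thread A (lock)                       Thread B (unlock)
queue_add(m->q, gettid());
m->guard = 0;             // line a
                                      unpark(queue_remove(m->q));
                                      // tries to wake A, which is not
                                      // asleep yet: the wakeup is lost
park();                   // line b: A sleeps; it is no longer in
                          // the queue, so nothing will ever wake it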
To solve this, it would be ideal to make lines a and b atomic, but that is hard to do in practice.
Instead we can handle this specific problem. For example, one implementation provides a setpark() call that puts the thread into a "preparing to park" state: if the thread is unparked before it actually calls park(), the park() returns immediately instead of sleeping.
queue_add(m->q, gettid());
setpark(); // announce we are about to park
m->guard = 0;
park();
Two-Phase Lock
In real operating systems, the mutex implementation combines the two kinds of locks above. The following is the mutex mechanism Linux uses.
The code below is from lowlevellock.h in glibc's NPTL, as presented in OSTEP; the // comments are my annotations:
void mutex_lock (int *mutex) {
  int v;
  /* Bit 31 was clear, we got the mutex (the fastpath) */
  // Phase one: a single spin-lock-style attempt.
  if (atomic_bit_test_set (mutex, 31) == 0)
    return;
  // Maintain the waiter count in the low bits.
  atomic_increment (mutex);
  // One wrinkle here: after an unlock, a newly arriving thread can grab
  // the lock via the fastpath before the woken waiter does, in which case
  // the woken waiter loops and waits again -- effectively going back to
  // the end of the line. A scheduling policy that runs the woken thread
  // first would avoid this.
  while (1) {
    // Woken by unlock: try to acquire the lock; on success we are no
    // longer a waiter, so decrement the count.
    if (atomic_bit_test_set (mutex, 31) == 0) {
      atomic_decrement (mutex);
      return;
    }
    /* We have to wait. First make sure the futex value we are
       monitoring is truly negative (i.e. locked). */
    // Similar to setpark: guards against an unlock slipping in before
    // the futex_wait below.
    v = *mutex;
    if (v >= 0)
      continue;
    futex_wait (mutex, v);
  }
}

void mutex_unlock (int *mutex) {
  /* Adding 0x80000000 to the counter results in 0 if and only if
     there are no other interested threads */
  // If the waiter count is 0, there is no need to wake anyone. This check
  // is done here rather than inside futex_wake to avoid the syscall cost.
  if (atomic_add_zero (mutex, 0x80000000))
    return;
  /* There are other threads waiting for this mutex, wake one of them up. */
  // Wake the first waiting thread.
  futex_wake (mutex);
}
The insight behind two-phase locking is that for locks that will be released very soon, spinning is the better strategy, while sleeping and waking require system calls, which add overhead.
futex_wait and futex_wake maintain, in kernel mode, a wait queue associated with the mutex's address; futex_wait(mutex, v) puts the caller to sleep only if *mutex still equals v.
In the first phase, the thread spins some number of times, trying to acquire the lock.
If the first phase fails to acquire the lock, the second phase begins: the thread sleeps until the lock is released.
The Linux implementation above spins only once, but one can also spin in a loop with a fixed spin count.
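As a rough sketch of that idea (assuming Linux's futex(2) invoked via syscall(); the SPIN_LIMIT value, the 0/1/2 state encoding, and the wrapper names are all choices of this example, in the style of Drepper's "Futexes Are Tricky", not glibc's actual code):

#define _GNU_SOURCE
#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

#define SPIN_LIMIT 100 // arbitrary phase-one spin budget

// Lock states: 0 = free, 1 = held, 2 = held with possible waiters.
static atomic_int m = 0;

static void futex_wait_if(atomic_int *addr, int val) {
    // Sleeps only if *addr still equals val (checked atomically in-kernel).
    syscall(SYS_futex, addr, FUTEX_WAIT, val, NULL, NULL, 0);
}

static void futex_wake_one(atomic_int *addr) {
    syscall(SYS_futex, addr, FUTEX_WAKE, 1, NULL, NULL, 0);
}

void two_phase_lock(void) {
    // Phase one: spin a bounded number of times, hoping the holder
    // releases the lock quickly.
    for (int i = 0; i < SPIN_LIMIT; i++) {
        int expected = 0;
        if (atomic_compare_exchange_weak(&m, &expected, 1))
            return;
    }
    // Phase two: mark the lock as contended and sleep until woken.
    while (atomic_exchange(&m, 2) != 0)
        futex_wait_if(&m, 2);
}

void two_phase_unlock(void) {
    // Issue the wake syscall only if someone may be sleeping.
    if (atomic_exchange(&m, 0) == 2)
        futex_wake_one(&m);
}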
Note:
The reason for the setpark-like check here is different from the earlier one, because there is no "enqueue first, then sleep" step: the kernel queue is managed inside futex_wait itself.
Special case: Queue Empty
Suppose the v >= 0 check were missing.
If thread C's unlock slips into the middle of thread B's lock, between B's failed test-and-set and its futex_wait, the lock bit is cleared to 0, and the wake finds an empty kernel queue (B is not asleep yet), so no thread is woken. B then calls futex_wait and can never be awakened.
With the check, B instead sees v >= 0, continues the loop, and takes the now-free lock directly without waiting.
Analogy
I personally compare this to polling vs. interrupts.
Taking I/O as an example: polling has the CPU query the device continuously, while with interrupts the CPU gets involved only when the device's state changes.
Similarly, with a spin lock the CPU keeps testing the lock, while a sleeping lock wakes the next thread only when the lock is released.
Despite the different levels of abstraction, the ideas do have something in common.
The observer pattern is similar as well. It is hard for an inquirer to keep checking whether other objects have changed, because they may change at any moment without its knowledge; but it is easy for the changed party to notify the inquirer, by simply sending a message at the moment of change.
To hazard a generalization: perhaps this phenomenon comes from information asymmetry. Precisely because the two parties hold unequal information, the cost of transmitting that information differs greatly depending on which side initiates.