Interprocess communication
Interprocess communication addresses three issues:
- Passing messages from one process to another
- Ensuring that two or more processes do not get in each other's way in critical activities, for example when two processes compete for the same resource
- Ensuring the correct ordering of dependent operations
Race conditions
When two or more processes read and write shared data and the final result depends on the order in which the processes run, the situation is called a race condition.
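As an illustration, here is a minimal sketch of a race condition using two threads instead of two processes (the counter and thread function are only for illustration): both threads increment a shared counter without any protection, and lost updates make the result vary from run to run.

```c
#include <pthread.h>
#include <stdio.h>

static int counter = 0;            /* shared data, no protection */

static void *worker(void *arg)
{
    for (int i = 0; i < 1000000; ++i)
        counter++;                 /* read-modify-write is not atomic */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Expected 2000000, but interleaving of the two threads usually
       loses updates, so the printed value varies from run to run. */
    printf("counter = %d\n", counter);
    return 0;
}
```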
Critical regions
The part of a program that accesses shared memory is called a critical region. Ensuring that at most one process at a time operates on the shared memory is called mutual exclusion.
To handle concurrency well, a solution should meet the following four conditions:
- No two processes may be inside their critical regions at the same time
- No assumptions may be made about the speed or the number of CPUs
- No process running outside its critical region may block other processes
- No process should have to wait forever to enter its critical region
Mutual exclusion with busy waiting
Mutual exclusion achieved by letting the CPU spin in a loop until entry is allowed.
Disabling interrupts
A process disables all interrupts immediately after entering its critical region. With interrupts disabled, the CPU cannot switch to another process, because scheduling only happens on a clock interrupt or some other interrupt, and disabling interrupts also disables the clock interrupt.
Characteristics:
- Disabling interrupts only takes effect on the CPU that executed the instruction. On a multicore processor, disabling interrupts on one CPU has no effect on the others.
- It is convenient for the kernel itself to disable interrupts for the few instructions during which it updates variables or lists.
Lock variable
Use a single variable as the lock, say with the value 1 meaning locked. A process tests the lock; if it is 0, the process sets it to 1 and enters the critical region, and on leaving it sets the lock back to 0.
This approach does not guarantee real mutual exclusion: between the moment a process tests the lock and sees 0 and the moment it sets the lock to 1, another process can perform the same test, so several processes may end up in the critical region at once.
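A minimal sketch of this broken scheme (the names are illustrative); the comments mark the window between the test and the set in which a second process can slip through:

```c
static int lock = 0;               /* 0 = free, 1 = taken */

void enter_region(void)
{
    while (lock != 0)              /* (1) test: the lock looks free...          */
        ;
    /* ...but another process can also pass the test here, before the set */
    lock = 1;                      /* (2) set: too late, both may now be inside */
}

void leave_region(void)
{
    lock = 0;                      /* give up the lock */
}
```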
Strict Alternation
Continuously testing a variable until some value appears is called busy waiting, and a lock that uses busy waiting is called a spin lock. Busy waiting wastes CPU time and should only be used when the wait is expected to be very short.
The pseudo code is as follows:
```c
/* Process 0 */
while (TRUE) {
    while (turn != 0)
        ;                 /* busy wait until it is our turn */
    /* critical region */
    turn = 1;
}

/* Process 1 */
while (TRUE) {
    while (turn != 1)
        ;
    /* critical region */
    turn = 0;
}
```
The two processes enter the critical region strictly in turn. When one process is much slower than the other, this is not a good method.
Peterson solution
Peterson's algorithm:
```c
#define FALSE 0
#define TRUE  1
#define N     2                     /* number of processes */

int turn;                           /* whose turn is it? */
int interested[N];                  /* all values initially FALSE */

void enter_region(int process)      /* process is 0 or 1 */
{
    int other = 1 - process;        /* number of the other process */
    interested[process] = TRUE;     /* show that we are interested */
    turn = process;                 /* turn ensures that only one process can enter */
    while (turn == process && interested[other] == TRUE)
        ;                           /* busy wait */
}

void leave_region(int process)
{
    interested[process] = FALSE;    /* we are out of the critical region */
}
```
TSL instruction
Hardware support is required, and the processor provides an instruction:
TSL RX,LOCK
This instruction reads the memory word LOCK into register RX and then stores a non-zero value at LOCK; the read and the write together are one atomic operation. The instruction is called Test and Set Lock (TSL). TSL performs no real test itself; it only guarantees that the read and the write happen atomically. The CPU executing a TSL instruction locks the memory bus so that no other CPU can access memory until the instruction has finished.
Disabling interrupts is not the same as locking the memory bus: processor 1 disabling its interrupts has no effect on processor 2. Therefore, to keep other CPUs away from the memory word, the memory bus must be locked.
```
enter_region:
    TSL REGISTER,LOCK      # copy LOCK to REGISTER and set LOCK to non-zero (one atomic operation)
    CMP REGISTER,#0        # a non-zero value means some process is already in the critical region
    JNE enter_region       # if REGISTER is not 0, the lock was set, so loop
    RET                    # the lock was 0: enter the critical region

leave_region:              # leave the critical region
    MOV LOCK,#0            # set LOCK to 0 so another process can enter
    RET
```
Each process has its own registers, but they all share the memory word LOCK. After one process sets LOCK to 1, the other processes see that value. The key to synchronizing with TSL is that the read and the write are atomic. The reason the plain lock variable above could not synchronize is that its read and write were separate operations, so different processes could observe an inconsistent state of the lock variable; an atomic read-modify-write solves this problem.
The Intel x86 CPU uses the XCHG instruction for low-level synchronization; XCHG atomically exchanges the contents of two locations.
```
enter_region:
    MOVE REGISTER,#1       # put 1 in the register
    XCHG REGISTER,LOCK     # atomically swap REGISTER and LOCK (similar to TSL)
    CMP REGISTER,#0        # a non-zero old value means some process is in the critical region
    JNE enter_region       # if it was non-zero, loop
    RET

leave_region:
    MOVE LOCK,#0           # set LOCK to 0
    RET
```
Both instructions read the old memory value into a register and set the memory word to a non-zero value as a single atomic operation.
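Roughly the same effect is available in portable C through the C11 atomic_flag type, whose test-and-set is the software counterpart of TSL/XCHG; this is a minimal sketch, not how any particular system implements it:

```c
#include <stdatomic.h>

/* One shared flag plays the role of LOCK. ATOMIC_FLAG_INIT means "cleared". */
static atomic_flag lock = ATOMIC_FLAG_INIT;

void enter_region(void)
{
    /* atomic_flag_test_and_set() reads the old value and sets the flag in one
       indivisible step, just like TSL/XCHG; spin while the old value was "set". */
    while (atomic_flag_test_and_set(&lock))
        ;
}

void leave_region(void)
{
    atomic_flag_clear(&lock);      /* like MOVE LOCK,#0 */
}
```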
Problems with busy waiting
The basic idea of busy waiting: before entering the critical region, a process checks whether entry is allowed; if not, it waits in a loop until it can enter.
This approach not only wastes CPU time, it can also lead to priority inversion.
Suppose there are two processes, L and H. L has the lower priority, H has the higher priority, and the scheduling rule says that H runs whenever it is ready. Consider the following scenario: L is inside the critical region when H becomes ready and wants to enter it as well. The critical region is occupied by L, but because H has the higher priority, L is never scheduled and never gets to leave the critical region, so H busy-waits forever.
Semaphore
A semaphore is a special integer type that can take the value 0 or a positive value. It supports two operations: down and up.
The down operation decrements the semaphore. If the value is greater than 0, it is simply decremented by 1; if the value is 0, the process is put to sleep and the down operation is not yet completed. The up operation increments the semaphore by 1; if one or more processes are sleeping on the semaphore, one of them is chosen (for example at random) and woken up to complete its down operation, after which the semaphore value is again 0. Checking the value, changing it, and possibly going to sleep are together one atomic operation.
A semaphore with an initial value of 1 is called a binary semaphore; it allows at most one process into the critical region.
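In POSIX, the down and up operations correspond to sem_wait and sem_post; a minimal sketch of a binary semaphore used this way (the variable name is illustrative):

```c
#include <semaphore.h>

int main(void)
{
    sem_t s;
    sem_init(&s, 0, 1);   /* initial value 1: a binary semaphore       */
    sem_wait(&s);         /* down: decrements, or sleeps if value is 0 */
    /* ... critical region ... */
    sem_post(&s);         /* up: increments, waking a sleeper if any   */
    sem_destroy(&s);
    return 0;
}
```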
Producer and Consumer
```c
#define N 100                     /* number of slots in the buffer */
typedef int semaphore;            /* semaphores are a special kind of int */

semaphore mutex = 1;              /* controls access to the critical region */
semaphore empty = N;              /* counts empty buffer slots */
semaphore full  = 0;              /* counts slots holding data */
int items = 0;

void producer(void)
{
    while (TRUE) {
        down(&empty);             /* decrement the count of empty slots */
        down(&mutex);             /* enter the critical region */
        ++items;                  /* put an item in the buffer */
        up(&mutex);               /* leave the critical region */
        up(&full);                /* increment the count of full slots */
    }
}

void consumer(void)
{
    while (TRUE) {
        down(&full);              /* decrement the count of full slots */
        down(&mutex);             /* enter the critical region */
        --items;                  /* take an item from the buffer */
        up(&mutex);               /* leave the critical region */
        up(&empty);               /* increment the count of empty slots */
    }
}
```
Semaphores have two functions:
- Synchronization. Used to coordinate the sequence of different operations.
- Mutual exclusion. Used to guarantee that at any moment only one process reads or writes the buffer and the associated variables.
In the example above, full and empty are used to make certain events happen or not happen, while mutex guarantees that only one process is in the critical region at a time.
mutex
A mutex is a simplified version of a semaphore: it drops the counting ability and has only two states, unlocked and locked. It is used only to manage a shared resource or a small piece of code.
If the mutex is unlocked, a process that locks it acquires the lock and can enter the critical region. Otherwise the process blocks. Unlocking releases the lock and, if any processes are blocked on it, one of them is chosen (for example at random) and allowed to acquire the lock.
Thread packages can implement mutexes in user space using TSL or XCHG. A mutex implemented with TSL looks like this:
```
mutex_lock:
    TSL REGISTER,LOCK      # copy LOCK to REGISTER and set LOCK to non-zero
    CMP REGISTER,#0        # was LOCK zero?
    JZ ok                  # it was zero: the lock has been obtained
    CALL thread_yield      # the lock is busy: give up the CPU and run another thread
    JMP mutex_lock         # try again later

ok: RET                    # return to caller; critical region entered

mutex_unlock:
    MOVE LOCK,#0           # set LOCK to 0
    RET
```
In mutex_lock, when the thread fails to enter the critical region it gives up the CPU so that another thread can run. In the busy-waiting enter_region, by contrast, a failed attempt just keeps looping until a clock interrupt takes the CPU away.
Note that for threads implemented in user space there is no clock interrupt that stops a thread which has run too long. As a result, a thread that tried to acquire the lock by busy waiting would loop forever, which is why mutex_lock calls thread_yield instead.
futex
If waits are expected to be short, a spin lock is appropriate; in that case a blocking mutex would make the kernel overhead a large fraction of the cost. If contention is heavy and waits are long, however, a spin lock is not appropriate because it wastes a lot of CPU time, and blocking is the better choice.
A futex (fast user-space mutex) implements basic locking while avoiding a trap into the kernel whenever possible. A futex consists of two parts: a user-space library and a kernel service.
The kernel service provides a wait queue in which blocked processes are stored. Putting a process on the queue, or unblocking one, requires a system call.
The user library provides two operations, roughly "decrement and check" and "increment and check":
- Decrement and check is used to acquire the lock. If acquiring the lock fails, the process is put on the wait queue through a system call.
- Increment and check releases the lock. After the lock is released, if any process is blocked on the wait queue, the kernel is asked to unblock one or more of them.
Because a futex checks in user space whether the lock is held, it incurs less kernel overhead than a lock implemented entirely in the kernel.
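A deliberately simplified sketch of the idea on Linux, using C11 atomics for the user-space fast path and the raw futex system call for the slow path; a production lock such as the one in glibc also tracks whether any waiters exist so that the wake-up call can be skipped, which this sketch omits:

```c
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

static atomic_int futex_word = 0;     /* 0 = unlocked, 1 = locked */

static void futex_lock(void)
{
    int expected = 0;
    /* Fast path: try to grab the lock entirely in user space. */
    while (!atomic_compare_exchange_strong(&futex_word, &expected, 1)) {
        /* Slow path: ask the kernel to put us on the wait queue,
           but only if the word still holds 1. */
        syscall(SYS_futex, &futex_word, FUTEX_WAIT, 1, NULL, NULL, 0);
        expected = 0;                 /* retry the fast path after waking up */
    }
}

static void futex_unlock(void)
{
    atomic_store(&futex_word, 0);
    /* Wake at most one waiter (issued unconditionally in this sketch). */
    syscall(SYS_futex, &futex_word, FUTEX_WAKE, 1, NULL, NULL, 0);
}
```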
pthread
pthreads use mutexes to protect critical regions and also provide condition variables for synchronization.
Mutex operations:
function call | description |
---|---|
pthread_mutex_init | initialize a mutex |
pthread_mutex_destroy | destroy a mutex |
pthread_mutex_lock | acquire the lock or block |
pthread_mutex_trylock | acquire the lock or fail |
pthread_mutex_unlock | release the lock |
Condition variable operations:
function call | description |
---|---|
pthread_cond_init | initialize a condition variable |
pthread_cond_destroy | destroy a condition variable |
pthread_cond_wait | block waiting on a condition variable |
pthread_cond_signal | wake one thread blocked on the condition variable |
pthread_cond_broadcast | wake all threads blocked on the condition variable |
The following code is an example of using a condition variable together with a mutex:
```c
#include <pthread.h>

#define MAX 100000000

pthread_mutex_t mutex;
pthread_cond_t cond;
int buffer = 0;                                 /* 0 means empty */

void *producer(void *args)
{
    for (int i = 1; i <= MAX; ++i) {
        pthread_mutex_lock(&mutex);
        while (buffer != 0)                     /* the condition: buffer must be empty */
            pthread_cond_wait(&cond, &mutex);   /* releases the mutex while waiting */
        buffer = i;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&mutex);
    }
    pthread_exit(0);
}

void *consumer(void *args)
{
    for (int i = 1; i <= MAX; ++i) {
        pthread_mutex_lock(&mutex);
        while (buffer == 0)                     /* wait until the buffer is full */
            pthread_cond_wait(&cond, &mutex);
        buffer = 0;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&mutex);
    }
    pthread_exit(0);
}

int main(int argc, char **argv)
{
    pthread_t pro, con;
    pthread_mutex_init(&mutex, NULL);
    pthread_cond_init(&cond, NULL);
    pthread_create(&pro, NULL, producer, NULL);
    pthread_create(&con, NULL, consumer, NULL);
    pthread_join(pro, NULL);
    pthread_join(con, NULL);
    pthread_mutex_destroy(&mutex);
    pthread_cond_destroy(&cond);
    return 0;
}
```
A condition variable is used only to signal that the condition may have changed, nothing more. Without condition variables, since the condition lives in a shared variable, a thread would have to lock and unlock repeatedly every time it rechecks the condition, which adds kernel overhead. With condition variables, a thread can block on the condition variable until the condition becomes true instead of locking and unlocking in a loop. The condition itself must be examined under a lock, so pthread_cond_wait must be told which mutex protects the condition: it releases that mutex while waiting, so that other threads can change the condition, and reacquires it before returning.
Monitors
message passing
Barriers
Barriers are used to synchronize the phases of a group of processes. A process that reaches the barrier is blocked there until every participating process has arrived; as long as even one process has not yet reached the barrier, all the others wait at it.
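A minimal sketch using the POSIX barrier API (the thread count and names are illustrative): every thread finishes phase 1, waits at the barrier, and only then does any thread start phase 2.

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static pthread_barrier_t barrier;

static void *worker(void *arg)
{
    long id = (long)arg;
    printf("thread %ld: phase 1 done\n", id);
    /* Every thread blocks here until all NTHREADS have arrived. */
    pthread_barrier_wait(&barrier);
    printf("thread %ld: phase 2 starts\n", id);
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; ++i)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; ++i)
        pthread_join(tid[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```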
Lock avoidance: read-copy-update (RCU)
RCU is a way of synchronizing access to data, aimed in particular at linked lists, that improves read efficiency. Protecting every access to the list by taking and releasing a lock is inefficient; RCU lets multiple threads read the list without any lock, while a single thread, holding a lock, modifies it.
The main problems lock avoidance must handle:
- Several threads are reading while one thread deletes a node. If the node were freed immediately, a reader might still be using the freed memory and crash the system. RCU therefore waits for the pre-existing readers to finish before freeing the node; this waiting period is called the grace period.
- If a new node is inserted while threads are reading, a reader that reaches the new node must see it fully initialized.
- The traversal of the list as a whole must stay intact: after a node is added or deleted, readers can still follow the remaining links from the nodes they already hold. However, RCU does not guarantee whether a concurrent reader will or will not see the deleted or inserted node itself.
Grace period
```c
void foo_read(void)
{
    rcu_read_lock();                  /* start of the read-side critical section */
    foo *fp = gbl_foo;
    if (fp != NULL)
        dosomething(fp->a, fp->b, fp->c);
    rcu_read_unlock();                /* end of the read-side critical section */
}

void foo_update(foo *new_fp)
{
    spin_lock(&foo_mutex);            /* writers still exclude each other */
    foo *old_fp = gbl_foo;
    gbl_foo = new_fp;                 /* publish the new version */
    spin_unlock(&foo_mutex);
    synchronize_rcu();                /* wait for the grace period to end */
    kfree(old_fp);                    /* no reader can still hold old_fp now */
}
```
rcu_read_lock() and rcu_read_unlock() mark the beginning and end of a read-side critical section; they are what allows the kernel to decide whether the grace period has ended. synchronize_rcu() starts a grace period and does not return until that grace period is over.
Publish-subscribe mechanism
The compiler (and the CPU) may reorder the instructions generated from the source code; such reordering can make the data one thread sees inconsistent with what another thread wrote.
Memory barriers appear to be what solves this problem here.
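The publish/subscribe idea can be sketched in portable C11: the writer fully initializes the node and then publishes the pointer with a release store, and readers subscribe with an acquire load. In the kernel this role is played by rcu_assign_pointer() and rcu_dereference(); the acquire ordering below is a conservative stand-in for what rcu_dereference() actually needs.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct foo { int a, b, c; };

/* Shared pointer to the current version of the data. */
static _Atomic(struct foo *) gbl_foo;

/* "Publish": build the node completely, then make the pointer visible with
   release ordering so readers never see a half-initialized node. */
void publish(int a, int b, int c)
{
    struct foo *p = malloc(sizeof *p);
    p->a = a; p->b = b; p->c = c;
    atomic_store_explicit(&gbl_foo, p, memory_order_release);
}

/* "Subscribe": read the pointer with acquire ordering, which pairs with the
   release store above; the fields are then guaranteed to be initialized. */
struct foo *subscribe(void)
{
    return atomic_load_explicit(&gbl_foo, memory_order_acquire);
}
```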
Integrity of data reading
Add node:
(Figure: adding a node — ./picture/add node.png)
To add node X, first make X's pointer point to node B, the node after the insertion position, and only then change the pointer of node A, the node before the insertion position, so that it points to X.
Delete node:
(Figure: deleting a node — ./picture/delete node.png)
When deleting a node, first change its predecessor's pointer so that it skips the node, and then free the node only after the grace period has ended.
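A Linux-kernel-style sketch of both cases using the kernel's RCU list helpers (the struct and function names are illustrative, and the code is not runnable outside the kernel): insertion publishes a fully built node, and deletion unlinks the node and frees it only after the grace period.

```c
#include <linux/list.h>
#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct item {
    int value;
    struct list_head list;
};

static LIST_HEAD(item_list);
static DEFINE_SPINLOCK(item_lock);

/* Insert: fill in the node first, then link it in; list_add_rcu()
   contains the barrier that publishes the fully built node to readers. */
void item_add(int value)
{
    struct item *it = kmalloc(sizeof(*it), GFP_KERNEL);
    if (!it)
        return;
    it->value = value;
    spin_lock(&item_lock);          /* writers still exclude each other */
    list_add_rcu(&it->list, &item_list);
    spin_unlock(&item_lock);
}

/* Delete: unlink first, then wait for the grace period before freeing,
   so readers that already hold a pointer to the node remain safe. */
void item_del(struct item *it)
{
    spin_lock(&item_lock);
    list_del_rcu(&it->list);
    spin_unlock(&item_lock);
    synchronize_rcu();
    kfree(it);
}
```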