Interprocess communication
Interprocess communication addresses three issues:
- Passing messages from one process to another
- Ensuring that two or more processes do not get in each other's way in critical activities, for example when two processes compete for the same resource
- Ensuring the correct ordering of dependent operations
Race conditions
When two or more processes read and write shared data and the final result depends on the order in which the processes run, the situation is called a race condition.
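As an illustration, here is a minimal sketch of a race condition using two threads instead of two processes (the counter and thread function are only for illustration): both threads increment a shared counter without any protection, and lost updates make the result vary from run to run.

```c
#include <pthread.h>
#include <stdio.h>

static int counter = 0;            /* shared data, no protection */

static void *worker(void *arg)
{
    for (int i = 0; i < 1000000; ++i)
        counter++;                 /* read-modify-write is not atomic */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Expected 2000000, but interleaving of the two threads usually
       loses updates, so the printed value varies from run to run. */
    printf("counter = %d\n", counter);
    return 0;
}
```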
Critical regions
The part of a program that accesses shared memory is called a critical region. Ensuring that at most one process at a time operates on the shared memory is called mutual exclusion.
To handle concurrency well, a solution should meet the following four conditions:
- No two processes may be inside their critical regions at the same time
- No assumptions may be made about the speed or the number of CPUs
- No process running outside its critical region may block other processes
- No process should have to wait forever to enter its critical region
Mutual exclusion with busy waiting
Mutual exclusion achieved by letting the CPU spin in a loop until entry is allowed.
Disabling interrupts
A process disables all interrupts immediately after entering its critical region. With interrupts disabled, the CPU cannot switch to another process, because scheduling only happens on a clock interrupt or some other interrupt, and disabling interrupts also disables the clock interrupt.
Characteristics:
- Disabling interrupts only takes effect on the CPU that executed the instruction. On a multicore processor, disabling interrupts on one CPU has no effect on the others.
- It is convenient for the kernel itself to disable interrupts for the few instructions during which it updates variables or lists.
Lock variable
Use a single variable as the lock, say with the value 1 meaning locked. A process tests the lock; if it is 0, the process sets it to 1 and enters the critical region, and on leaving it sets the lock back to 0.
This approach does not guarantee real mutual exclusion: between the moment a process tests the lock and sees 0 and the moment it sets the lock to 1, another process can perform the same test, so several processes may end up in the critical region at once.
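A minimal sketch of this broken scheme (the names are illustrative); the comments mark the window between the test and the set in which a second process can slip through:

```c
static int lock = 0;               /* 0 = free, 1 = taken */

void enter_region(void)
{
    while (lock != 0)              /* (1) test: the lock looks free...          */
        ;
    /* ...but another process can also pass the test here, before the set */
    lock = 1;                      /* (2) set: too late, both may now be inside */
}

void leave_region(void)
{
    lock = 0;                      /* give up the lock */
}
```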
Strict Alternation
Continuously testing a variable until some value appears is called busy waiting, and a lock that uses busy waiting is called a spin lock. Busy waiting wastes CPU time and should only be used when the wait is expected to be very short.
The pseudo code is as follows:
```c
/* Process 0 */
while (TRUE) {
    while (turn != 0)
        ;                 /* busy wait until it is our turn */
    /* critical region */
    turn = 1;
}

/* Process 1 */
while (TRUE) {
    while (turn != 1)
        ;
    /* critical region */
    turn = 0;
}
```
The two processes enter the critical region strictly in turn. When one process is much slower than the other, this is not a good method.
Peterson solution
Peterson's algorithm:
```c
#define FALSE 0
#define TRUE  1
#define N     2                     /* number of processes */

int turn;                           /* whose turn is it? */
int interested[N];                  /* all values initially FALSE */

void enter_region(int process)      /* process is 0 or 1 */
{
    int other = 1 - process;        /* number of the other process */
    interested[process] = TRUE;     /* show that we are interested */
    turn = process;                 /* turn ensures that only one process can enter */
    while (turn == process && interested[other] == TRUE)
        ;                           /* busy wait */
}

void leave_region(int process)
{
    interested[process] = FALSE;    /* we are out of the critical region */
}
```
TSL instruction
Hardware support is required, and the processor provides an instruction:
TSL RX,LOCK
This instruction reads the memory word LOCK into register RX and then stores a non-zero value at LOCK; the read and the write together are one atomic operation. The instruction is called Test and Set Lock (TSL). TSL performs no real test itself; it only guarantees that the read and the write happen atomically. The CPU executing a TSL instruction locks the memory bus so that no other CPU can access memory until the instruction has finished.
Disabling interrupts is not the same as locking the memory bus: processor 1 disabling its interrupts has no effect on processor 2. Therefore, to keep other CPUs away from the memory word, the memory bus must be locked.
```
enter_region:
    TSL REGISTER,LOCK      # copy LOCK to REGISTER and set LOCK to non-zero (one atomic operation)
    CMP REGISTER,#0        # a non-zero value means some process is already in the critical region
    JNE enter_region       # if REGISTER is not 0, the lock was set, so loop
    RET                    # the lock was 0: enter the critical region

leave_region:              # leave the critical region
    MOV LOCK,#0            # set LOCK to 0 so another process can enter
    RET
```
Each process has its own registers, but they all share the memory word LOCK. After one process sets LOCK to 1, the other processes see that value. The key to synchronizing with TSL is that the read and the write are atomic. The reason the plain lock variable above could not synchronize is that its read and write were separate operations, so different processes could observe an inconsistent state of the lock variable; an atomic read-modify-write solves this problem.
The Intel x86 CPU uses the XCHG instruction for low-level synchronization; XCHG atomically exchanges the contents of two locations.
```
enter_region:
    MOVE REGISTER,#1       # put 1 in the register
    XCHG REGISTER,LOCK     # atomically swap REGISTER and LOCK (similar to TSL)
    CMP REGISTER,#0        # a non-zero old value means some process is in the critical region
    JNE enter_region       # if it was non-zero, loop
    RET

leave_region:
    MOVE LOCK,#0           # set LOCK to 0
    RET
```
Both instructions read the old memory value into a register and set the memory word to a non-zero value as a single atomic operation.
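Roughly the same effect is available in portable C through the C11 atomic_flag type, whose test-and-set is the software counterpart of TSL/XCHG; this is a minimal sketch, not how any particular system implements it:

```c
#include <stdatomic.h>

/* One shared flag plays the role of LOCK. ATOMIC_FLAG_INIT means "cleared". */
static atomic_flag lock = ATOMIC_FLAG_INIT;

void enter_region(void)
{
    /* atomic_flag_test_and_set() reads the old value and sets the flag in one
       indivisible step, just like TSL/XCHG; spin while the old value was "set". */
    while (atomic_flag_test_and_set(&lock))
        ;
}

void leave_region(void)
{
    atomic_flag_clear(&lock);      /* like MOVE LOCK,#0 */
}
```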
Problems with busy waiting
The basic idea of busy waiting: before entering the critical region, a process checks whether entry is allowed; if not, it waits in a loop until it can enter.
This approach not only wastes CPU time, it can also lead to priority inversion.
Suppose there are two processes, L and H. L has the lower priority, H has the higher priority, and the scheduling rule says that H runs whenever it is ready. Consider the following scenario: L is inside the critical region when H becomes ready and wants to enter it as well. The critical region is occupied by L, but because H has the higher priority, L is never scheduled and never gets to leave the critical region, so H busy-waits forever.
Semaphore
A semaphore is a special integer type that can take the value 0 or a positive value. It supports two operations: down and up.
The down operation decrements the semaphore. If the value is greater than 0, it is simply decremented by 1; if the value is 0, the process is put to sleep and the down operation is not yet completed. The up operation increments the semaphore by 1; if one or more processes are sleeping on the semaphore, one of them is chosen (for example at random) and woken up to complete its down operation, after which the semaphore value is again 0. Checking the value, changing it, and possibly going to sleep are together one atomic operation.
A semaphore with an initial value of 1 is called a binary semaphore; it allows at most one process into the critical region.
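In POSIX, the down and up operations correspond to sem_wait and sem_post; a minimal sketch of a binary semaphore used this way (the variable name is illustrative):

```c
#include <semaphore.h>

int main(void)
{
    sem_t s;
    sem_init(&s, 0, 1);   /* initial value 1: a binary semaphore       */
    sem_wait(&s);         /* down: decrements, or sleeps if value is 0 */
    /* ... critical region ... */
    sem_post(&s);         /* up: increments, waking a sleeper if any   */
    sem_destroy(&s);
    return 0;
}
```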
Producer and Consumer
```c
#define N 100                     /* number of slots in the buffer */
typedef int semaphore;            /* semaphores are a special kind of int */

semaphore mutex = 1;              /* controls access to the critical region */
semaphore empty = N;              /* counts empty buffer slots */
semaphore full  = 0;              /* counts slots holding data */
int items = 0;

void producer(void)
{
    while (TRUE) {
        down(&empty);             /* decrement the count of empty slots */
        down(&mutex);             /* enter the critical region */
        ++items;                  /* put an item in the buffer */
        up(&mutex);               /* leave the critical region */
        up(&full);                /* increment the count of full slots */
    }
}

void consumer(void)
{
    while (TRUE) {
        down(&full);              /* decrement the count of full slots */
        down(&mutex);             /* enter the critical region */
        --items;                  /* take an item from the buffer */
        up(&mutex);               /* leave the critical region */
        up(&empty);               /* increment the count of empty slots */
    }
}
```
Semaphores have two functions:
- Synchronization. Used to coordinate the sequence of different operations.
- Mutual exclusion. Used to guarantee that at any moment only one process reads or writes the buffer and the associated variables.
In the example above, full and empty are used to make certain events happen or not happen, while mutex guarantees that only one process is in the critical region at a time.
mutex
A mutex is a simplified version of a semaphore: it drops the counting ability and has only two states, unlocked and locked. It is used only to manage a shared resource or a small piece of code.
If the mutex is unlocked, a process that locks it acquires the lock and can enter the critical region. Otherwise the process blocks. Unlocking releases the lock and, if any processes are blocked on it, one of them is chosen (for example at random) and allowed to acquire the lock.
Thread packages can implement mutexes in user space using TSL or XCHG. A mutex implemented with TSL looks like this:
```
mutex_lock:
    TSL REGISTER,LOCK      # copy LOCK to REGISTER and set LOCK to non-zero
    CMP REGISTER,#0        # was LOCK zero?
    JZ ok                  # it was zero: the lock has been obtained
    CALL thread_yield      # the lock is busy: give up the CPU and run another thread
    JMP mutex_lock         # try again later

ok: RET                    # return to caller; critical region entered

mutex_unlock:
    MOVE LOCK,#0           # set LOCK to 0
    RET
```
In mutex_lock, when the thread fails to enter the critical region it gives up the CPU so that another thread can run. In the busy-waiting enter_region, by contrast, a failed attempt just keeps looping until a clock interrupt takes the CPU away.
Note that for threads implemented in user space there is no clock interrupt that stops a thread which has run too long. As a result, a thread that tried to acquire the lock by busy waiting would loop forever, which is why mutex_lock calls thread_yield instead.
futex
If waits are expected to be short, a spin lock is appropriate; in that case a blocking mutex would make the kernel overhead a large fraction of the cost. If contention is heavy and waits are long, however, a spin lock is not appropriate because it wastes a lot of CPU time, and blocking is the better choice.
A futex (fast user-space mutex) implements basic locking while avoiding a trap into the kernel whenever possible. A futex consists of two parts: a user-space library and a kernel service.
The kernel service provides a wait queue in which blocked processes are stored. Putting a process on the queue, or unblocking one, requires a system call.
The user library provides two operations, roughly "decrement and check" and "increment and check":
- Decrement and check is used to acquire the lock. If acquiring the lock fails, the process is put on the wait queue through a system call.
- Increment and check releases the lock. After the lock is released, if any process is blocked on the wait queue, the kernel is asked to unblock one or more of them.
Because a futex checks in user space whether the lock is held, it incurs less kernel overhead than a lock implemented entirely in the kernel.
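A deliberately simplified sketch of the idea on Linux, using C11 atomics for the user-space fast path and the raw futex system call for the slow path; a production lock such as the one in glibc also tracks whether any waiters exist so that the wake-up call can be skipped, which this sketch omits:

```c
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

static atomic_int futex_word = 0;     /* 0 = unlocked, 1 = locked */

static void futex_lock(void)
{
    int expected = 0;
    /* Fast path: try to grab the lock entirely in user space. */
    while (!atomic_compare_exchange_strong(&futex_word, &expected, 1)) {
        /* Slow path: ask the kernel to put us on the wait queue,
           but only if the word still holds 1. */
        syscall(SYS_futex, &futex_word, FUTEX_WAIT, 1, NULL, NULL, 0);
        expected = 0;                 /* retry the fast path after waking up */
    }
}

static void futex_unlock(void)
{
    atomic_store(&futex_word, 0);
    /* Wake at most one waiter (issued unconditionally in this sketch). */
    syscall(SYS_futex, &futex_word, FUTEX_WAKE, 1, NULL, NULL, 0);
}
```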
pthread
pthreads use mutexes to protect critical regions and also provide condition variables for synchronization.
Mutex operations:
function call | description |
---|---|
pthread_mutex_init | initialize a mutex |
pthread_mutex_destroy | destroy a mutex |
pthread_mutex_lock | acquire the lock or block |
pthread_mutex_trylock | acquire the lock or fail |
pthread_mutex_unlock | release the lock |
Condition variable operations:
function call | description |
---|---|
pthread_cond_init | initialize a condition variable |
pthread_cond_destroy | destroy a condition variable |
pthread_cond_wait | block waiting on a condition variable |
pthread_cond_signal | wake one thread blocked on the condition variable |
pthread_cond_broadcast | wake all threads blocked on the condition variable |
The following code is an example of using a condition variable together with a mutex:
```c
#include <pthread.h>

#define MAX 100000000

pthread_mutex_t mutex;
pthread_cond_t cond;
int buffer = 0;                                 /* 0 means empty */

void *producer(void *args)
{
    for (int i = 1; i <= MAX; ++i) {
        pthread_mutex_lock(&mutex);
        while (buffer != 0)                     /* the condition: buffer must be empty */
            pthread_cond_wait(&cond, &mutex);   /* releases the mutex while waiting */
        buffer = i;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&mutex);
    }
    pthread_exit(0);
}

void *consumer(void *args)
{
    for (int i = 1; i <= MAX; ++i) {
        pthread_mutex_lock(&mutex);
        while (buffer == 0)                     /* wait until the buffer is full */
            pthread_cond_wait(&cond, &mutex);
        buffer = 0;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&mutex);
    }
    pthread_exit(0);
}

int main(int argc, char **argv)
{
    pthread_t pro, con;
    pthread_mutex_init(&mutex, NULL);
    pthread_cond_init(&cond, NULL);
    pthread_create(&pro, NULL, producer, NULL);
    pthread_create(&con, NULL, consumer, NULL);
    pthread_join(pro, NULL);
    pthread_join(con, NULL);
    pthread_mutex_destroy(&mutex);
    pthread_cond_destroy(&cond);
    return 0;
}
```
A condition variable is used only to signal that the condition may have changed, nothing more. Without condition variables, since the condition lives in a shared variable, a thread would have to lock and unlock repeatedly every time it rechecks the condition, which adds kernel overhead. With condition variables, a thread can block on the condition variable until the condition becomes true instead of locking and unlocking in a loop. The condition itself must be examined under a lock, so pthread_cond_wait must be told which mutex protects the condition: it releases that mutex while waiting, so that other threads can change the condition, and reacquires it before returning.
Monitors
message passing
Barriers
Barriers are used to synchronize the phases of a group of processes. A process that reaches the barrier is blocked there until every participating process has arrived; as long as even one process has not yet reached the barrier, all the others wait at it.
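A minimal sketch using the POSIX barrier API (the thread count and names are illustrative): every thread finishes phase 1, waits at the barrier, and only then does any thread start phase 2.

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static pthread_barrier_t barrier;

static void *worker(void *arg)
{
    long id = (long)arg;
    printf("thread %ld: phase 1 done\n", id);
    /* Every thread blocks here until all NTHREADS have arrived. */
    pthread_barrier_wait(&barrier);
    printf("thread %ld: phase 2 starts\n", id);
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; ++i)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; ++i)
        pthread_join(tid[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```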
Lock avoidance: read-copy-update (RCU)
RCU is a way of synchronizing access to data, aimed in particular at linked lists, that improves read efficiency. Protecting every access to the list by taking and releasing a lock is inefficient; RCU lets multiple threads read the list without any lock, while a single thread, holding a lock, modifies it.
The main problems lock avoidance must handle:
- Several threads are reading while one thread deletes a node. If the node were freed immediately, a reader might still be using the freed memory and crash the system. RCU therefore waits for the pre-existing readers to finish before freeing the node; this waiting period is called the grace period.
- If a new node is inserted while threads are reading, a reader that reaches the new node must see it fully initialized.
- The traversal of the list as a whole must stay intact: after a node is added or deleted, readers can still follow the remaining links from the nodes they already hold. However, RCU does not guarantee whether a concurrent reader will or will not see the deleted or inserted node itself.
Grace period
```c
void foo_read(void)
{
    rcu_read_lock();                  /* start of the read-side critical section */
    foo *fp = gbl_foo;
    if (fp != NULL)
        dosomething(fp->a, fp->b, fp->c);
    rcu_read_unlock();                /* end of the read-side critical section */
}

void foo_update(foo *new_fp)
{
    spin_lock(&foo_mutex);            /* writers still exclude each other */
    foo *old_fp = gbl_foo;
    gbl_foo = new_fp;                 /* publish the new version */
    spin_unlock(&foo_mutex);
    synchronize_rcu();                /* wait for the grace period to end */
    kfree(old_fp);                    /* no reader can still hold old_fp now */
}
```
rcu_read_lock() and rcu_read_unlock() mark the beginning and end of a read-side critical section; they are what allows the kernel to decide whether the grace period has ended. synchronize_rcu() starts a grace period and does not return until that grace period is over.
Publish-subscribe mechanism
The compiler (and the CPU) may reorder the instructions generated from the source code; such reordering can make the data one thread sees inconsistent with what another thread wrote.
Memory barriers appear to be what solves this problem here.
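The publish/subscribe idea can be sketched in portable C11: the writer fully initializes the node and then publishes the pointer with a release store, and readers subscribe with an acquire load. In the kernel this role is played by rcu_assign_pointer() and rcu_dereference(); the acquire ordering below is a conservative stand-in for what rcu_dereference() actually needs.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct foo { int a, b, c; };

/* Shared pointer to the current version of the data. */
static _Atomic(struct foo *) gbl_foo;

/* "Publish": build the node completely, then make the pointer visible with
   release ordering so readers never see a half-initialized node. */
void publish(int a, int b, int c)
{
    struct foo *p = malloc(sizeof *p);
    p->a = a; p->b = b; p->c = c;
    atomic_store_explicit(&gbl_foo, p, memory_order_release);
}

/* "Subscribe": read the pointer with acquire ordering, which pairs with the
   release store above; the fields are then guaranteed to be initialized. */
struct foo *subscribe(void)
{
    return atomic_load_explicit(&gbl_foo, memory_order_acquire);
}
```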
Integrity of data reading
Add node:
(Figure: adding a node — ./picture/add node.png)
To add node X, first make X's pointer point to node B, the node after the insertion position, and only then change the pointer of node A, the node before the insertion position, so that it points to X.
Delete node:
(Figure: deleting a node — ./picture/delete node.png)
When deleting a node, first change its predecessor's pointer so that it skips the node, and then free the node only after the grace period has ended.
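A Linux-kernel-style sketch of both cases using the kernel's RCU list helpers (the struct and function names are illustrative, and the code is not runnable outside the kernel): insertion publishes a fully built node, and deletion unlinks the node and frees it only after the grace period.

```c
#include <linux/list.h>
#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct item {
    int value;
    struct list_head list;
};

static LIST_HEAD(item_list);
static DEFINE_SPINLOCK(item_lock);

/* Insert: fill in the node first, then link it in; list_add_rcu()
   contains the barrier that publishes the fully built node to readers. */
void item_add(int value)
{
    struct item *it = kmalloc(sizeof(*it), GFP_KERNEL);
    if (!it)
        return;
    it->value = value;
    spin_lock(&item_lock);          /* writers still exclude each other */
    list_add_rcu(&it->list, &item_list);
    spin_unlock(&item_lock);
}

/* Delete: unlink first, then wait for the grace period before freeing,
   so readers that already hold a pointer to the node remain safe. */
void item_del(struct item *it)
{
    spin_lock(&item_lock);
    list_del_rcu(&it->list);
    spin_unlock(&item_lock);
    synchronize_rcu();
    kfree(it);
}
```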