GIL (global interpreter lock), process pools and thread pools

Posted by Illusionist on Wed, 19 Jan 2022 22:30:26 +0100

Today's content

GIL global interpreter lock (important theory)
- Verifying the existence and effect of the GIL
- Verifying whether Python multithreading is useful
Deadlock
Process pool and thread pool (frequently used)
IO model

Detailed reference:
https://www.bilibili.com/video/BV1QE41147hU?p=500

Detailed content

1, GIL global interpreter lock

1. Introduction

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple 
native threads from executing Python bytecodes at once. This lock is necessary mainly 
because CPython's memory management is not thread-safe. (However, since the GIL 
exists, other features have grown to depend on the guarantees that it enforces.)

'''
1. Python has several interpreter implementations; the default one is CPython.
	CPython, Jython, PyPy

In CPython, the GIL is also a mutex. It mainly prevents multiple threads in the same process from executing at the same time (so Python multithreading cannot take advantage of multiple cores, and threads cannot run in parallel).
The GIL is necessary in CPython because the CPython interpreter's memory management is not thread-safe.


2. Memory management here means the garbage collection mechanism:
	Reference counting
	Mark and sweep
	Generational collection
'''
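
The three garbage collection mechanisms listed above can be observed from Python itself. The following is only a minimal sketch, assuming CPython; the `Node` class and the variable names are hypothetical and exist only for this illustration.

```python
import gc
import sys
import weakref


class Node:
    """Hypothetical helper class used only for this illustration."""


# Reference counting: every extra reference raises the count by one.
data = []
before = sys.getrefcount(data)
holder = [data]                 # the list now holds one more reference
after = sys.getrefcount(data)
print(after - before)           # difference caused by the extra reference

# Mark and sweep / generational collection: a reference cycle cannot be
# freed by reference counting alone, so the cyclic collector handles it.
a, b = Node(), Node()
a.other, b.other = b, a         # a and b now reference each other
probe = weakref.ref(a)          # weak reference: does not keep `a` alive
del a, b                        # the cycle is now unreachable
gc.collect()                    # force a full collection
print(probe() is None)          # True once the cycle has been reclaimed

# The thresholds used by the generational collector.
print(gc.get_threshold())
```

These reference counts are updated constantly while code runs, which is the kind of shared bookkeeping that the GIL keeps consistent between threads.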

1. The GIL is a feature of the CPython interpreter, not of the Python language

2. Multiple threads in the same Python process cannot take advantage of multiple cores (they cannot run in parallel, but they can run concurrently)

3. For a thread in the process to run Python code, it must first acquire the GIL

4. The GIL is not unique to Python: some other interpreted-language implementations (CRuby, for example) also use a global lock, so their threads cannot run in parallel within one process either

2. Verify the existence of GIL

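If the threads in a process perform no IO, they never run in parallel: the GIL lets only one of them execute Python code at a time, and in practice each short function call in the first example finishes before the next thread gets a turn.

However, the GIL only protects the interpreter, not your data. When a thread blocks on IO it releases the GIL, other threads run in the meantime, and shared data can still end up wrong. In that case we need to add our own mutex.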

# No IO
from threading import Thread
import time

m = 100


def test():
    global m
    tmp = m          # read the shared value
    tmp -= 1
    m = tmp          # write it back; no IO in between


for i in range(100):
    t = Thread(target=test)
    t.start()
time.sleep(3)        # crude wait for all threads to finish
print(m)
# Result: 0

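# With an IO operation: time.sleep(1)
from threading import Thread
import time

m = 100


def test():
    global m
    tmp = m          # every thread reads 100 here ...
    time.sleep(1)    # <-- IO: the thread blocks and releases the GIL
    tmp -= 1
    m = tmp          # ... and every thread writes back 99


for i in range(100):
    t = Thread(target=test)
    t.start()
time.sleep(3)        # crude wait for all threads to finish
print(m)
# Result: 99, printed after a few seconds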
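
The additional mutex mentioned above could look roughly like this. It is a sketch, not the original author's code: the name `mutex` is hypothetical, the sleep is shortened to 0.01 s so the run stays brief, and `join()` replaces the fixed wait.

```python
from threading import Thread, Lock
import time

m = 100
mutex = Lock()  # hypothetical name; guards every access to m


def test():
    global m
    with mutex:              # acquire on entry, release on exit
        tmp = m
        time.sleep(0.01)     # simulated IO; the GIL is released here
        tmp -= 1
        m = tmp


threads = [Thread(target=test) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()                 # wait for every thread instead of sleeping
print(m)
```

Because the whole read-modify-write sequence is inside the lock, the threads take turns and the counter should reach 0. The price is that the locked section runs serially.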

2, Deadlock phenomenon

Mutexes cannot be used carelessly, otherwise deadlock is likely:

After thread 1 finishes func1, it starts func2 and grabs lock B.

Meanwhile, thread 2 starts func1 and grabs lock A.

Now thread 1 is stuck waiting for lock A and thread 2 is stuck waiting for lock B, so neither can ever continue.

from threading import Thread, Lock
import time

A = Lock()
B = Lock()


class MyThread(Thread):
    def run(self):
        self.func1()
        self.func2()

    def func1(self):
        A.acquire()
        print('%s got lock A' % self.name)  # self.name is the thread's name (same as current_thread().name here)
        B.acquire()
        print('%s got lock B' % self.name)
        time.sleep(1)
        B.release()
        print('%s released lock B' % self.name)
        A.release()
        print('%s released lock A' % self.name)

    def func2(self):
        B.acquire()
        print('%s got lock B' % self.name)
        A.acquire()
        print('%s got lock A' % self.name)
        A.release()
        print('%s released lock A' % self.name)
        B.release()
        print('%s released lock B' % self.name)

for i in range(10):
    obj = MyThread()
    obj.start()

"""Even if you understand how locks work, avoid using them by hand where you can, because it is easy to cause a deadlock"""

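One common way to avoid this particular deadlock is to make every thread acquire the locks in the same order. The sketch below assumes that approach; it is not something the original example does, and the names `OrderedThread` and `finished` are hypothetical.

```python
from threading import Thread, Lock
import time

A = Lock()
B = Lock()
finished = []  # hypothetical list recording which threads completed


class OrderedThread(Thread):
    def run(self):
        self.func1()
        self.func2()
        finished.append(self.name)

    def func1(self):
        with A:            # always A first ...
            with B:        # ... then B
                time.sleep(0.01)

    def func2(self):
        with A:            # same order as func1 (the deadlocking version took B first)
            with B:
                pass


threads = [OrderedThread() for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)      # a timeout so a mistake shows up instead of hanging
print(len(finished))
```

With a single global order, no thread can hold B while waiting for A, so the circular wait described above cannot form. The `with` statement also releases each lock even if the body raises.
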
3, Is python multithreading useless?

(Python multiprocessing can take advantage of multiple cores; multiple threads in the same process cannot, because of the GIL.)

It depends on the situation, mainly on whether the code is IO intensive or compute intensive.

IO intensive:

The code contains many IO operations. Whenever a thread blocks on IO, the CPU switches to another thread (the multiprogramming technique), so the waiting times overlap.

Compute intensive:

The code has no IO operations and keeps the CPU busy. Because only one thread can hold the GIL at a time, the threads take turns on a single core, so the multi-core advantage cannot be used.

# Whether it is useful depends on the situation (the type of program)
# IO intensive
    e.g. four tasks, each taking 10 s
        Multiple processes: not much advantage, 10 s+
            Switching on IO still happens, but every process has to be created, which means allocating memory and copying code
        Multiple threads: an advantage, 10 s+
            No additional resources are consumed
# Compute intensive
    e.g. four tasks, each taking 10 s
        Multiple processes: can take advantage of multiple cores, 10 s+
        Multiple threads: cannot take advantage of multiple cores, 40 s+

"""
Multi process and multi thread
"""
"""IO Intensive"""
from multiprocessing import Process
from threading import Thread
import threading
import os,time
def work():
    time.sleep(2)


if __name__ == '__main__':
    l=[]
    print(os.cpu_count()) #This machine is 4-core
    start=time.time()
    for i in range(400):
        p=Process(target=work) # Took a little over 22.31 s; most of that is spent creating processes
        # p=Thread(target=work) # Took a little over 2.08 s
        l.append(p)
        p.start()
    for p in l:
        p.join()
    stop=time.time()
    print('run time is %s' %(stop-start))

"""Compute intensive"""
# With multiple threads in the same process, this takes longer than with multiple processes
from multiprocessing import Process
from threading import Thread
import os,time
def work():
    res=0
    for i in range(100000000):
        res*=i
if __name__ == '__main__':
    l=[]
    print(os.cpu_count())  # This machine is 6-core
    start=time.time()
    for i in range(6):
        # p=Process(target=work) # Took a little over 5.35 s
        p=Thread(target=work) # Took a little over 23.37 s
        l.append(p)
        p.start()
    for p in l:
        p.join()
    stop=time.time()
    print('run time is %s' %(stop-start))

4, Process pool and thread pool

A process pool or thread pool limits how many workers run at once. That can make the program slower than starting everything at once, but it protects the machine from being overloaded.

Think about it: can we start processes or threads without limit?
    No, we must not.
        Purely from the software side, unlimited spawning is possible and would be the most efficient.
        But the hardware cannot support it (hardware development never catches up with software development).
Pool
    A pool lets us use multiple processes or threads while making sure the hardware is not brought down.
        It lowers the program's running efficiency, but keeps the computer's hardware safe.
Process pool and thread pool
    Process pool: create a fixed number of processes in advance and reuse them to do all the work (no new processes are created afterwards).
    Thread pool: create a fixed number of threads in advance and reuse them to do all the work (no new threads are created afterwards).
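
The reuse described above can be made visible by recording which thread ran each task. This is a small sketch: `max_workers=3` is an arbitrary value chosen for the illustration, and `task` is a hypothetical function.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time


def task(n):
    time.sleep(0.05)                        # simulated IO
    return n, threading.current_thread().name


# max_workers=3 is an arbitrary value chosen for the illustration.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(task, n) for n in range(10)]
    results = [f.result() for f in futures]

worker_names = {name for _, name in results}
print(len(results), len(worker_names))      # 10 tasks, at most 3 distinct threads
```

Ten tasks are completed, but the set of thread names never grows beyond the pool size, because the same workers pick up one task after another.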

Create thread pool

Note: when starting multiple threads or processes, blocking (synchronous) calls such as join() and result() make the caller wait. Start all the processes (threads) first, and only then call these blocking operations one by one; otherwise the tasks run one after another.

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import time
import os


def run():
    print('Start operation')
    index = 10
    time.sleep(0.3)
    for i in range(10):
        index += i
    return 'The result is: %s' % index


if __name__ == '__main__':
    print('Local machine CPU quantity: %s' % os.cpu_count())
    # Submit 10 tasks to the thread pool first (asynchronous submission), keep the returned Future objects in a list, then call result() on each one (which blocks until that task is done)
    t_list = []
    # p = ProcessPoolExecutor() creates a process pool. The default number of processes is the machine's CPU count
    t_pool = ThreadPoolExecutor()  # Create a thread pool. The default number of threads was CPU count * 5 up to Python 3.7; since Python 3.8 it is min(32, CPU count + 4)
    for i in range(10):
        # With the pool created, submit the function to it
        t = t_pool.submit(run)  # submit() is asynchronous and returns a Future object, not a thread
        t_list.append(t)
    for t in t_list:
        ret = t.result()  # Blocks until the task finishes, then returns the function's return value
        print(ret)


# Operation results
 Local machine CPU quantity: 4
 Start operation
 Start operation
 Start operation
 Start operation
 Start operation
 Start operation
 Start operation
 Start operation
 Start operation
 Start operation
 The result is: 55
 The result is: 55
 The result is: 55
 The result is: 55
 The result is: 55
 The result is: 55
 The result is: 55
 The result is: 55
 The result is: 55
 The result is: 55
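
To see why all tasks should be submitted before any result() call, the two orders can be timed against each other. This is only a sketch under assumed values: `work`, `timed` and `batch_run` are hypothetical helpers, and the 0.1 s sleep stands in for real IO.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def work(n):
    time.sleep(0.1)          # simulated IO
    return n * n


def timed(fn):
    """Run fn() and return (its result, elapsed seconds)."""
    start = time.perf_counter()
    value = fn()
    return value, time.perf_counter() - start


with ThreadPoolExecutor(max_workers=5) as pool:
    # Blocking on result() right after each submit(): tasks run one by one.
    serial, serial_time = timed(
        lambda: [pool.submit(work, n).result() for n in range(5)]
    )

    # Submit everything first, then collect: the tasks overlap.
    def batch_run():
        futures = [pool.submit(work, n) for n in range(5)]
        return [f.result() for f in futures]

    batch, batch_time = timed(batch_run)

print(serial == batch, batch_time < serial_time)
```

Both orders produce the same results, but the first waits for each task before submitting the next, so its running time grows with the number of tasks, while the second lets the pool run them together.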
