Python data structures, the GIL, and multiprocessing

Posted by jsucupira on Mon, 26 Aug 2019 03:14:26 +0200

One: Data Structures and the GIL

1 Queue

The standard library's queue module provides FIFO queues, LIFO queues, and priority queues.
The Queue class is thread-safe and suitable for exchanging data safely between threads; internally it uses Lock and Condition.

The reported size of the container can be inaccurate, because an exact size cannot be obtained without locking the whole operation: right after you read the size, another thread may have modified the queue. Even though Queue's qsize() is computed under a lock, it still cannot guarantee that an immediate get() or put() will succeed, because reading the size and calling get()/put() are separate operations.
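To make the race concrete, here is a minimal sketch (the two-consumer setup is illustrative): qsize() can report a non-empty queue, yet get_nowait() may still find it empty because the other thread won the race.

import queue
import threading

q = queue.Queue()
q.put(1)  # a single item, contended by two consumer threads

def consumer(name):
    # qsize() may report 1, yet the other thread can still take the
    # item before our get_nowait() runs
    if q.qsize() > 0:
        try:
            item = q.get_nowait()
            print(name, "got", item)
        except queue.Empty:
            print(name, "lost the race: queue was already empty")

threads = [threading.Thread(target=consumer, args=("T-%d" % i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()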

2 GIL

1 Introduction

GIL: the Global Interpreter Lock, a process-level lock.
The CPython interpreter process contains a lock called the GIL, the global interpreter lock.

The GIL guarantees that only one thread at a time executes Python code in a CPython process, even on multiple cores.

2 IO-intensive vs CPU-intensive

In CPython:
IO-intensive work: when a thread blocks on IO, other threads get scheduled.
CPU-intensive work: the current thread may keep reacquiring the GIL, so other threads barely get the CPU; waking another thread and preparing its data carries a high cost.

So IO-intensive programs call for a multithreaded solution, while CPU-intensive programs call for a multiprocess solution, which bypasses the GIL.

The vast majority of read and write operations on Python's built-in data structures are atomic.

Because of the GIL, Python's built-in data types look safe when programming with multiple threads, but they are not actually thread-safe themselves: compound read-modify-write operations can still interleave.
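A minimal sketch of that distinction (the counts are illustrative): a single list.append is effectively atomic under the GIL, but += on a shared variable is a read-modify-write and can lose updates.

import threading

counter = 0
items = []

def work():
    global counter
    for _ in range(100000):
        items.append(1)  # a single operation, effectively atomic under the GIL
        counter += 1     # read, add, store: another thread can interleave between steps

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(items))  # always 400000
print(counter)     # may be less than 400000, because += is not atomic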

3 Why the GIL is kept

Guido adheres to a simple philosophy: beginners should be able to use Python without advanced systems knowledge.
Removing the GIL would also reduce the efficiency of CPython's single-threaded execution.

4 Verifying the single-threaded behavior

A single-threaded example:

import datetime
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(threadName)s %(message)s")
start = datetime.datetime.now()

def calc():
    total = 0  # avoid shadowing the built-in sum()
    for _ in range(1000000000):
        total += 1

calc()
calc()
calc()
calc()
calc()
delta = (datetime.datetime.now() - start).total_seconds()
logging.info(delta)

The same calculation, run in five threads:

import datetime
import logging
import threading

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(threadName)s %(message)s")
start = datetime.datetime.now()

def calc():
    total = 0
    for _ in range(1000000000):
        total += 1

lst = []
for _ in range(5):
    t = threading.Thread(target=calc)
    t.start()
    lst.append(t)

for t in lst:
    t.join()

delta = (datetime.datetime.now() - start).total_seconds()
print(delta)

The results show that, because of the GIL, multithreading in CPython has no advantage here: the five threads take about as long as the five sequential calls.

Two: Processes

1 Concept

1 Multiprocessing described

Because of the GIL in CPython, multithreading is not the best choice for CPU-intensive programs.

Multiprocessing runs the program in completely separate processes and can make full use of multiple processors.

However, process isolation means data is not shared, and threads are much lighter-weight than processes.

Multiprocessing is another means of achieving concurrency.

2 Differences between processes and threads

Similarities:

Inter-process synchronization provides the same classes as thread synchronization, used the same way, but it is more expensive than thread synchronization and the underlying implementation differs.

multiprocessing also provides shared memory and server processes for sharing data, plus Queue queues and Pipe pipes for inter-process communication.

Differences:

Processes can be terminated; threads cannot be killed by command. A thread ends either by raising an exception or by running its function to completion.

The modes of communication differ:
1 Multiprocessing means running multiple interpreter processes, so inter-process communication must serialize and deserialize the data (see the sketch after this list).
2 The data-safety concerns differ.

Multiprocessing code is best run under if __name__ == '__main__'.
Threads already share the process's data and need no serialization; anything passed between processes must be serialized and deserialized.
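A minimal sketch of that serialization boundary, using only the standard library: the dict put on a multiprocessing.Queue is pickled in the child process and unpickled in the parent.

import multiprocessing

def producer(q):
    # this dict is pickled when put on the queue and unpickled by the receiver
    q.put({"answer": 42})

if __name__ == '__main__':
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=producer, args=(q,))
    p.start()
    print(q.get())  # {'answer': 42}
    p.join()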

3 Applications of processes

Remote calls, RPC, work across networks.

2 Parameters

The Process class in multiprocessing

The Process class follows the API of the Thread class, which reduces the learning curve.
Different processes can be scheduled to execute on different CPUs.

Multithreading is best for IO-intensive work; multiprocessing is best for CPU-intensive work.

Properties and methods provided by Process (a short sketch follows this list):

Name          Meaning
pid           the process ID
exitcode      the status code of the process's exit
terminate()   terminate the process
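A short sketch exercising these attributes (the endless worker loop is just a placeholder): terminate() forcibly ends the child, and exitcode then reports the signal that killed it.

import multiprocessing
import time

def run_forever():
    while True:
        time.sleep(1)

if __name__ == '__main__':
    p = multiprocessing.Process(target=run_forever, name="P-demo")
    p.start()
    print(p.name, p.pid)  # the process name and OS process ID
    p.terminate()         # forcibly end the child process
    p.join()
    print(p.exitcode)     # negative: killed by a signal (e.g. -15 on POSIX)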

3 Example

import datetime
import logging
import multiprocessing

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(threadName)s %(message)s")

def calc(i):
    total = 0
    for _ in range(1000000000):
        total += 1

if __name__ == '__main__':  # multiprocessing code should run under the main guard
    start = datetime.datetime.now()
    lst = []
    for i in range(5):
        p = multiprocessing.Process(target=calc, args=(i,), name="P-{}".format(i))
        p.start()
        lst.append(p)
    for p in lst:
        p.join()

    delta = (datetime.datetime.now() - start).total_seconds()
    print(delta)

The results show an obvious CPU gain from multiprocessing: the work actually runs on multiple cores (subject to OS scheduling across CPUs). Where the single-threaded and multithreaded versions take a long and nearly identical time, the multiprocess version finishes in a fraction of it. This is true parallelism.

4 process pool correlation

import  logging
import datetime
import multiprocessing
logging.basicConfig(level=logging.INFO,format="%(asctime)s  %(threadName)s %(message)s ")
start=datetime.datetime.now()

def calc(i):
    sum=0
    for _ in range(1000000000):
        sum+=1
    print (i,sum)
if  __name__=='__main__':
    start=datetime.datetime.now()
    p=multiprocessing.Pool(5)  # This is used to initialize the process pool, where resources are reusable
    for i in range(5):
        p.apply_async(calc,args=(i,))
    p.close()  # To perform a join below, you must close above
    p.join()
    delta=(datetime.datetime.now()-start).total_seconds()
    print (delta)

When many processes have to be created, a process pool is the better way to handle them, since the pooled worker processes are reused.

5 Choosing between multiprocessing and multithreading

1 Choice

1 CPU-intensive
CPython uses the GIL, so multiple threads compete for it and the multi-core advantage is lost; in Python, multiprocessing is more efficient here.

2 IO-intensive

Multithreading is suitable: it avoids the serialization overhead of multiprocessing, and while one thread waits on IO, execution switches to another thread, which is efficient. (Multiprocessing can also handle IO-intensive work.)

2 Applications

Request/response model: the common processing model in web applications.

The master starts multiple worker processes, typically as many as there are CPUs.
Each worker process starts multiple threads to improve concurrent processing; a worker handles user requests and often has to wait for data.
This is how nginx works; a rough sketch follows below.

Worker processes generally match the number of CPU cores because of CPU affinity: migrating a process between CPUs is costly.
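Here is a rough sketch of the master/worker model under stated assumptions (handle_requests is a hypothetical placeholder for real request handling): one process per CPU core, each running a few threads to overlap IO waits.

import multiprocessing
import os
import threading

def handle_requests(worker_id):
    # hypothetical stand-in for pulling requests off a socket or queue
    print("worker", worker_id, "thread", threading.current_thread().name)

def worker(worker_id, threads_per_worker=4):
    # each worker process runs several threads to overlap IO waits
    ts = [threading.Thread(target=handle_requests, args=(worker_id,))
          for _ in range(threads_per_worker)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()

if __name__ == '__main__':
    # one worker process per CPU core, as in the nginx-style model above
    workers = [multiprocessing.Process(target=worker, args=(i,))
               for i in range(os.cpu_count())]
    for p in workers:
        p.start()
    for p in workers:
        p.join()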

Three: The concurrent Package

1 Concept

concurrent.futures
A module introduced in version 3.2.
An asynchronous parallel task module that provides a convenient high-level interface for asynchronous execution.

It provides two pool executors:

ThreadPoolExecutor, an Executor that calls into a thread pool asynchronously
ProcessPoolExecutor, an Executor that calls into a process pool asynchronously

2 Parameters

Method                              Meaning
ThreadPoolExecutor(max_workers=1)   create a pool of at most max_workers threads executing asynchronously at the same time; returns an Executor instance
submit(fn, *args, **kwargs)         submit the function to execute along with its arguments; returns a Future instance
shutdown(wait=True)                 clean up the pool

The Future class (a short sketch follows this list):

Method                   Meaning
done()                   returns True if the call was successfully cancelled or has finished executing
cancelled()              returns True if the call was successfully cancelled
running()                returns True if the call is running and cannot be cancelled
cancel()                 attempts to cancel the call; returns False if it is already executing and cannot be cancelled, True otherwise
result(timeout=None)     gets the call's return value; if timeout is None it waits for the result, and if a timeout is set and expires it raises concurrent.futures.TimeoutError
exception(timeout=None)  gets the exception the call raised; if timeout is None it waits, and if a timeout is set and expires it raises concurrent.futures.TimeoutError
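A small sketch exercising these Future methods, assuming a single-worker pool so the second task stays queued long enough to be cancelled:

import time
from concurrent import futures

def slow(n):
    time.sleep(n)
    return n

with futures.ThreadPoolExecutor(max_workers=1) as executor:
    f1 = executor.submit(slow, 3)  # starts running immediately
    f2 = executor.submit(slow, 3)  # queued behind f1, not started yet
    print(f2.cancel())             # True: still queued, so it can be cancelled
    print(f2.cancelled())          # True
    print(f1.running())            # True: already executing, cannot be cancelled
    try:
        f1.result(timeout=1)       # still running after 1 second
    except futures.TimeoutError:
        print("result timed out")
    print(f1.result())             # blocks until done, then returns 3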

3 Thread pool example

import logging
import threading
import time
from concurrent import futures

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)-15s\t [%(processName)s:%(threadName)s,%(process)d:%(thread)8d] %(message)s")

def worker(n):  # the task to be executed in the future
    logging.info("begin to work{}".format(n))
    time.sleep(5)
    logging.info("finished{}".format(n))

# create a thread pool with a capacity of 3
executor = futures.ThreadPoolExecutor(max_workers=3)

fs = []
for i in range(3):
    f = executor.submit(worker, i)  # pass in the argument; returns a Future object
    fs.append(f)

for i in range(3, 6):
    f = executor.submit(worker, i)  # these queue up until pool threads become free
    fs.append(f)

while True:
    time.sleep(2)
    logging.info(threading.enumerate())  # the list of live threads
    flag = True
    for f in fs:
        logging.info(f.done())  # True if the call finished or was successfully cancelled
        flag = flag and f.done()  # True only once every Future is done
    if flag:
        executor.shutdown()  # all calls are done, so clean up the pool
        logging.info(threading.enumerate())
        break

The results show that the threads in the pool are reused continuously: once created, a pool thread does not change, and its name stays the same, which mostly just affects the log output.
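As an aside, the module also offers futures.as_completed (and futures.wait), which avoid the sleep-and-poll loop used above; a minimal sketch:

import time
from concurrent import futures

def worker(n):
    time.sleep(1)
    return n * n

with futures.ThreadPoolExecutor(max_workers=3) as executor:
    fs = [executor.submit(worker, i) for i in range(6)]
    # as_completed yields each Future as soon as it finishes,
    # with no need to poll done() on a timer
    for f in futures.as_completed(fs):
        print(f.result())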

4 Process pool example

import logging
import time
from concurrent import futures

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)-15s\t [%(processName)s:%(threadName)s,%(process)d:%(thread)8d] %(message)s")

def worker(n):  # the task to be executed in the future
    logging.info("begin to work{}".format(n))
    time.sleep(5)
    logging.info("finished{}".format(n))

if __name__ == '__main__':  # required: child processes re-import this module
    # create a process pool with a capacity of 3
    executor = futures.ProcessPoolExecutor(max_workers=3)

    fs = []
    for i in range(3):
        f = executor.submit(worker, i)  # pass in the argument; returns a Future object
        fs.append(f)

    for i in range(3, 6):
        f = executor.submit(worker, i)  # these queue up until worker processes become free
        fs.append(f)

    while True:
        time.sleep(2)
        flag = True
        for f in fs:
            logging.info(f.done())  # True if the call finished or was successfully cancelled
            flag = flag and f.done()  # True only once every Future is done
        if flag:
            executor.shutdown()  # all calls are done, so clean up the pool
            break

The results are similar to the thread pool version, except that the tasks run in separate worker processes.

5 Context management support

concurrent.futures.ProcessPoolExecutor inherits from concurrent.futures.base.Executor, and the parent class has __enter__ and __exit__ methods, so it supports context management and can be used with the with statement.

import logging
import time
from concurrent import futures

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)-15s\t [%(processName)s:%(threadName)s,%(process)d:%(thread)8d] %(message)s")

def worker(n):  # the task to be executed in the future
    logging.info("begin to work{}".format(n))
    time.sleep(5)
    logging.info("finished{}".format(n))

if __name__ == '__main__':
    fs = []
    with futures.ProcessPoolExecutor(max_workers=3) as executor:
        for i in range(6):
            f = executor.submit(worker, i)  # note: don't shadow the 'futures' module name here
            fs.append(f)
    # leaving the with block calls shutdown(wait=True), so every Future is done by now
    for f in fs:
        logging.info(f.done())  # True for all of them

The results are the same as in the previous example; exiting the with block waits for all the tasks to finish.

6 Summary

The unified thread pool and process pool interface simplifies programming, a reflection of Python's philosophy of simplicity.
One drawback: the names of the pool's threads cannot be set (but see the note below).
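One caveat on that last point: from Python 3.6 on, ThreadPoolExecutor does accept a thread_name_prefix argument, so the pool's thread names can at least be prefixed:

from concurrent import futures
import threading

def whoami(_):
    return threading.current_thread().name

with futures.ThreadPoolExecutor(max_workers=2, thread_name_prefix="calc") as executor:
    print(sorted(set(executor.map(whoami, range(4)))))  # e.g. ['calc_0', 'calc_1']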

Topics: Python, Programming, Network, Nginx