Learning Concurrent Programming in Python

Posted by ElectricRain on Tue, 27 Aug 2019 14:41:08 +0200

Contents

I. Concurrent Programming with Multiple Processes

1. Introduction to the multiprocessing module

Multithreading in Python cannot take advantage of multiple cores. To make full use of a multicore CPU's resources (use os.cpu_count() to check the core count), Python code in most cases needs multiple processes, which Python supports through the multiprocessing module.
The multiprocessing module is used to start subprocesses and run custom tasks (such as functions) in them; its programming interface is similar to threading's.

The multiprocessing module has many features: starting subprocesses, communicating and sharing data between them, performing different forms of synchronization, and providing components such as Process, Queue, Pipe, and Lock.

Unlike threads, processes have no shared state: a process's modifications to data stay within that process.

2. Introduction to the Process Class

Creating a process object

Process([group [, target [, name [, args [, kwargs]]]]]) - an object instantiated from this class represents a task to be run in a subprocess (not yet started).

Note:
1. Parameters should be specified by keyword.
2. args passes positional arguments to the target function as a tuple; a single-element tuple must have a trailing comma.

Parameters

group is unused and should always be None.
target is the callable object, i.e. the task the subprocess is to execute.
args is the tuple of positional arguments for the callable, e.g. args=(1, 2, 'egon',).
kwargs is the dictionary of keyword arguments for the callable, e.g. kwargs={'name': 'egon', 'age': 18}.
name is the subprocess's name.

Methods

p.start(): starts the process and calls p.run() in the subprocess.
p.run(): the method the process runs at startup; it calls the function specified by target. When subclassing Process, this is the method our custom class must implement.

p.terminate(): forcibly terminates process p without any cleanup. If p created subprocesses of its own, they become zombie processes, so use this method with special care. If p also holds a lock, the lock is never released, which can lead to deadlock.
p.is_alive(): returns True if p is still running.

p.join([timeout]): the main process waits for p to terminate (to be precise, the main process is blocked while p keeps running). timeout is an optional timeout in seconds. Note that p.join() can only wait on processes started with start(), not on run() called directly.
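
The timeout behavior of join can be sketched in a minimal, self-contained example (the function names here are illustrative, not part of the multiprocessing API):

```python
from multiprocessing import Process
import time

def work():
    time.sleep(1)          # the subprocess runs for about one second

def run_join_timeout():
    p = Process(target=work)
    p.start()
    p.join(timeout=0.2)    # give up waiting after 0.2 s; p keeps running
    still_alive = p.is_alive()
    p.join()               # now wait for the real termination
    return still_alive, p.exitcode

if __name__ == '__main__':
    print(run_join_timeout())
```

After the timed-out join the process is still alive; after the second, unbounded join it has exited cleanly with exit code 0.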

Attributes

p.daemon: defaults to False. If set to True, p is a daemon process running in the background: when p's parent process terminates, p terminates with it, and p is not allowed to create new processes of its own. It must be set before p.start().

p.name: The name of the process

p.pid: PID of process

p.exitcode: None while the process is running; if it is -N, the process was terminated by signal N.

p.authkey: the process's authentication key, by default a 32-character string generated by os.urandom(). The key provides security for low-level inter-process communication over network connections; such communication succeeds only when both sides hold the same key.
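
As a minimal sketch of the name, pid and exitcode attributes described above (the function names are illustrative):

```python
from multiprocessing import Process

def work():
    pass                       # exits immediately, so the exit code is 0

def run_attr_demo():
    p = Process(target=work, name='demo')
    before = p.pid             # None: the process has not been started yet
    p.start()
    p.join()
    return before, p.name, p.exitcode

if __name__ == '__main__':
    print(run_attr_demo())
```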

3. Using the Process Class

Note: on Windows, Process() must be called under if __name__ == '__main__':

Since Windows has no fork, the multiprocessing module starts a new Python process and imports the calling module.
If Process() were invoked at import time, it would start an infinite chain of new processes (or run until the machine is out of resources).
Hiding the Process() call under if __name__ == '__main__': ensures the statements in that block are not executed when the module is imported.

3.1 Two ways to create sub-processes

Mode 1

from multiprocessing import Process
import time

def task(name):
    print(f"{name} is running")
    time.sleep(2)
    print(f"{name} is gone")

if __name__ == '__main__':
# On Windows, processes must be started under if __name__ == '__main__':
    p = Process(target=task,args=("zbb",)) #Create a process object
    p.start()
# start() just sends a signal to the operating system to start a subprocess, then execution continues with the next line.
# When the OS receives the signal, it allocates memory space for the subprocess,
# copies the main process's data into it, and then schedules the CPU to execute it.
# Starting a subprocess is expensive.

    print("==Main Start")
    time.sleep(3)
    print("Main End")

# == Main Start
# zbb is running
# zbb is gone
# Main End

Mode 2 (for understanding; not often used)

from multiprocessing  import Process
import time

class MyProcess(Process):

    def __init__(self,name):
        super().__init__()
        self.name = name
    def run(self): # Must be named run: start() calls run(), and the parent class's run() does nothing here because target is None
        print(f"{self.name} is running")
        time.sleep(2)
        print(f"{self.name} is gone ")

if __name__ == '__main__':
    p = MyProcess("zbb")
    p.start()
    print("==main")    
# ==main
# zbb is running
# zbb is gone 

3.2 Getting process pid

The pid is the unique identifier of a running process in the operating system

For example, kill pid in linux

Getting it in code

from multiprocessing import Process
import os

def task(name):
    print(f'Subprocess: {os.getpid()}')
    print(f'Parent process: {os.getppid()}')



if __name__ == '__main__':
    p = Process(target=task,args=('zbb',))  # Create a process object
    p.start()
    print(f'====main{os.getpid()}')
    
# ====main13548
# Subprocess: 1832
# Parent process: 13548

Getting the pid on the Windows command line: tasklist | findstr python

Getting it on Linux: ps aux | grep python

3.3 Spatial isolation between validation processes

Subprocesses and main processes are in different spaces

from multiprocessing import Process
import time
name = 'Pursue dreams NAN'

def task():
    global name
    name = 'zbb'
    print(f'Subprocess: {name}')


if __name__ == '__main__':
    p = Process(target=task)  # Create a process object
    p.start()
    # print('== main start')
    time.sleep(3)
    print(f'main:{name}')

# Subprocess: zbb
# main: Pursue dreams NAN

3.4 join Method for Process Objects

join makes the main process wait until the subprocess finishes before executing the rest of the main process's code.

from multiprocessing import Process
import time

def task(name):
    print(f'{name} is running')
    time.sleep(2)
    print(f'{name} is gone')



if __name__ == '__main__':

    p = Process(target=task,args=('zbb',))  # Create a process object
    p.start()
    p.join()  # Block until the subprocess finishes
    print('==Main Start')

Multiple subprocesses use join

Verification 1

from multiprocessing import Process
import time


def task(name,sec):
    print(f'{name}is running')
    time.sleep(sec)
    print(f'{name} is gone')


if __name__ == '__main__':
    start_time = time.time()
    p1 = Process(target=task,args=('1',1))
    p2 = Process(target=task,args=('2',2))
    p3 = Process(target=task,args=('3',3))
    p1.start()
    p2.start()
    p3.start()
    p1.join()  # join blocks only the main process; the joins below add little extra waiting, since all three subprocesses have been running concurrently
    p2.join()
    p3.join()
    print(f'==main{time.time()-start_time}')
# 1is running
# 2is running
# 3is running
# 1 is gone
# 2 is gone
# 3 is gone
# ==main3.186117172241211

Verification 2

# Multiple subprocesses use join
from multiprocessing import Process
import time


def task(name,sec):
    print(f'{name}is running')
    time.sleep(sec)
    print(f'{name} is gone')


if __name__ == '__main__':
    start_time = time.time()
    p1 = Process(target=task,args=('1',3))
    p2 = Process(target=task,args=('2',2))
    p3 = Process(target=task,args=('3',1))
    p1.start()
    p2.start()
    p3.start()
    p1.join() # Blocks until p1 (3 s) finishes; by then p2 and p3 are already done
    print(f'==Main 1-{time.time() - start_time}')
    p2.join()
    print(f'==Main 2-{time.time() - start_time}')
    p3.join()
    print(f'==Main 3-{time.time()-start_time}')
# 1is running
# 2is running
# 3is running
# 3 is gone
# 2 is gone
# 1 is gone
# ==Main 1-3.152977705001831
# ==Main 2-3.152977705001831
# ==Main 3-3.152977705001831

Optimize the above code

from multiprocessing import Process
import time


def task(name,sec):
    print(f'{name}is running')
    time.sleep(sec)
    print(f'{name} is gone')


if __name__ == '__main__':
    start_time = time.time()

    l1 = []
    for i in range(1,4):
        p=Process(target=task,args=("zbb",1))
        l1.append(p)
        p.start()
    for i in l1:
        i.join()
    print(f'==main{time.time() - start_time}')
    print(l1)
# zbbis running
# zbbis running
# zbbis running
# zbb is gone
# zbb is gone
# zbb is gone
# ==main1.1665570735931396
# [<Process(Process-1, stopped)>, <Process(Process-2, stopped)>, <Process(Process-3, stopped)>]

join is blocking: when the main process calls join, the code after it does not run until the joined subprocess has finished executing.

3.5 Other Properties of Process Objects (for understanding)

from multiprocessing import Process
import time

def task(name):
    print(f'{name} is running')
    time.sleep(2)
    print(f'{name} is gone')



if __name__ == '__main__':
    p = Process(target=task,args=('cc',),name='aaa')  # Create a process object
    print(p.name) # Print the subprocess's name
    p.start()
    # time.sleep(1)
    p.terminate()  # Kill the subprocess
    time.sleep(0.5) # terminate() is asynchronous: without a short sleep, is_alive() may still report True
    print(p.is_alive()) # Check whether the subprocess is still alive
    # p.name = 'sb' #Name of the subprocess

    print('==Main Start')

3.6 Zombie and Orphan Processes (Understanding)

Unix-based environments (Linux, macOS)

Normally, the main process waits for its subprocesses to finish before it ends.

The main process monitors the running state of its subprocesses at all times, and reclaims each subprocess within some period after it finishes.

Why doesn't the main process reclaim a subprocess immediately after it finishes?

1. The main process and the subprocess run asynchronously; the main process cannot capture the exact moment the subprocess ends.
2. If all of the subprocess's resources in memory were released immediately when it ended, the main process would have no way to check the subprocess's status.

Unix provides a mechanism to solve this problem.

After a subprocess finishes, its open files and most of its data in memory are released immediately, but some information is retained: the process ID, end time, and running status, waiting for the main process to check and reclaim it.

Every subprocess that has finished enters the zombie state until it is reclaimed by the main process.

First: zombie processes (harmful)
Zombie process: a process uses fork to create a child; if the child exits and the parent has not called wait or waitpid to fetch the child's status, the child's process descriptor remains in the system. Such a process is called a zombie process.

If the parent never calls wait/waitpid on its zombie children, a large number of zombie processes accumulate, occupying memory and process IDs.

How do you resolve zombie processes?

If a parent process produces a large number of children but never reclaims them, many zombie processes form. The solution is to kill the parent process directly: all its zombies then become orphan processes and are reclaimed by init.
Second: orphan processes (harmless)

Orphan process: when a parent exits while one or more of its children are still running, those children become orphan processes. They are adopted by the init process (process ID 1), which collects their exit status.

4. Daemon process

A daemon process is a subprocess that "guards" the main process: as soon as the main process ends, the daemon subprocess ends too.

from multiprocessing import Process
import time

def task(name):
    print(f'{name} is running')
    time.sleep(2)
    print(f'{name} is gone')



if __name__ == '__main__':
    # On Windows, processes must be started under if __name__ == '__main__':
    p = Process(target=task,args=('Chang Xin',))  # Create a process object
    p.daemon = True  # Make p a daemon process; it ends as soon as the main process's code finishes. Must be set before start().
    p.start()
    time.sleep(1)
    print('===main')

5. Mutex Lock (Process Synchronization Control)

When multiple users contend for one resource, the first user to grab it locks it and uses it; only after the lock is released can the next user take it.

Problem

Three colleagues print with one printer at the same time.

Three processes simulate the three colleagues; the console output simulates the printer.

#Version 1:
#Concurrency puts efficiency first, but here our requirement is order first.
#When multiple processes contend for one resource, order must be guaranteed: serial execution, one at a time.

from multiprocessing import Process
import time
import random
import os

def task1(p):
    print(f"{p} starts printing")
    time.sleep(random.randint(1,3))
    print(f"{p} finishes printing")

def task2(p):
    print(f"{p} starts printing")
    time.sleep(random.randint(1,3))
    print(f"{p} finishes printing")


if __name__ == '__main__':

    p1 = Process(target=task1,args=('p1',))
    p2 = Process(target=task2,args=('p2',))


    p2.start()
    p2.join()
    p1.start()
    p1.join()

#join solves the serialization problem and guarantees order first, but the order is fixed in advance.
#That is unreasonable: when contending for the same resource it should be first come, first served, which guarantees fairness.
from multiprocessing import Process
from multiprocessing import Lock
import time
import random
import os

def task1(p,lock):
    '''
    A (non-recursive) Lock cannot be acquired twice in a row
    '''
    lock.acquire()
    print(f"{p} starts printing")
    time.sleep(random.randint(1,3))
    print(f"{p} finishes printing")
    lock.release()

def task2(p,lock):
    lock.acquire()
    print(f"{p} starts printing")
    time.sleep(random.randint(1,3))
    print(f"{p} finishes printing")
    lock.release()

if __name__ == '__main__':

    mutex = Lock()
    p1 = Process(target=task1,args=('p1',mutex))
    p2 = Process(target=task2,args=('p2',mutex))

    p2.start()
    p1.start()

The difference between lock and join:

In common: both can turn concurrency into serial execution, which guarantees order.

Difference: join fixes the order artificially; lock lets the processes contend for the order, which guarantees fairness.

6. Communication between processes

1. File-based Communication

# Ticket-grabbing system:
# First you can query the tickets. Checking the number of remaining tickets is concurrent.
# Purchasing sends a request to the server; the server receives it, decrements the ticket count on the back end, and replies to the front end. This part is serial.
# When multiple processes contend over a single piece of data, data safety requires serial access.
# To serialize the purchase step, we must lock it.


from multiprocessing import Process
from multiprocessing import Lock
import json
import time
import os
import random


def search():
    time.sleep(random.randint(1,3))  # Simulate network latency (query step)
    with open('ticket.json',encoding='utf-8') as f1:  # ticket.json must exist, e.g. {"count": 3}
        dic = json.load(f1)
        print(f'{os.getpid()} checked tickets, {dic["count"]} remaining')


def paid():
    with open('ticket.json', encoding='utf-8') as f1:

        dic = json.load(f1)
    if dic['count'] > 0:
        dic['count'] -= 1
        time.sleep(random.randint(1,3))  # Simulate network latency (purchase step)
        with open('ticket.json', encoding='utf-8',mode='w') as f1:
            json.dump(dic,f1)
        print(f'{os.getpid()} purchased successfully')


def task(lock):
    search()
    lock.acquire()
    paid()
    lock.release()

if __name__ == '__main__':
    mutex = Lock()
    for i in range(6):
        p = Process(target=task,args=(mutex,))
        p.start()


# When many processes contend over a single resource (piece of data), preserving order (data safety) requires serial execution.
# A mutex lock guarantees both fairness of order and safety of data.

# File-based inter-process communication:
    # Inefficient.
    # Managing locks yourself is troublesome and deadlock-prone.

2. Queue-based communication

Processes are isolated from each other. To achieve inter-process communication (IPC), the multiprocessing module supports two forms, queues and pipes, both based on message passing.

Queue: think of a queue as a container that holds data.

Queue characteristic: FIFO, first in first out.


The class that creates queues (implemented on top of pipes and locks):

Queue([maxsize]): creates a shared process queue. Queue is a multi-process-safe queue that can be used to transfer data between multiple processes.

Parameters:

maxsize is the maximum number of items allowed in the queue; if omitted, there is no size limit.

Main methods:

q.put() inserts data into the queue. It has two optional parameters, block and timeout. If block is True (the default) and timeout is a positive number, the method blocks for at most timeout seconds waiting for space to become available; if it times out, a queue.Full exception is raised. If block is False and the queue is full, queue.Full is raised immediately.
q.get() reads and removes one element from the queue.
Likewise, get has two optional parameters, block and timeout. If block is True (the default) and timeout is a positive number, and no element arrives within the wait time, a queue.Empty exception is raised.
If block is False there are two cases: if a value is available, it is returned immediately; otherwise, if the queue is empty, queue.Empty is raised immediately.
q.get_nowait(): same as q.get(False).
q.put_nowait(item): same as q.put(item, False).
q.empty(): returns True if q is empty at the moment of the call. The result is unreliable: for example, another process may add an item while True is being returned.
q.full(): returns True if q is full at the moment of the call. The result is equally unreliable: an item may be removed from the queue while True is being returned.
q.qsize(): returns the current number of items in the queue. Unreliable for the same reasons as q.empty() and q.full().

Other methods (for understanding):

q.cancel_join_thread(): do not automatically join the background thread when the process exits; prevents the join_thread() method from blocking.
q.close(): closes the queue, preventing more data from being added to it. When this is called, the background thread continues writing data that is already enqueued but not yet written, then shuts down as soon as it finishes. This method is called automatically if q is garbage collected. Closing the queue does not produce any kind of end-of-data signal or exception for queue consumers: if a consumer is blocked in a get() operation, closing the queue in the producer does not cause get() to return with an error.
q.join_thread(): joins the queue's background thread. It is used to wait for all queue items to be flushed, after q.close() has been called. By default it is called by all processes that are not the original creator of q; calling q.cancel_join_thread() disables this behavior.

from multiprocessing import Queue
q = Queue(3)  # maxsize=3

q.put(1)
q.put('alex')
q.put([1,2,3])
# q.put(5555)  # A fourth put on a full queue would block until space frees up.
# q.put(5555, block=False)  # With block=False, a full queue raises queue.Full immediately.

print(q.get())
print(q.get())
print(q.get())
# print(q.get(timeout=3))  # On the now-empty queue this blocks for 3 seconds, then raises queue.Empty.
# print(q.get(block=False))  # With block=False, an empty queue raises queue.Empty immediately.

# block=False turns any would-be blocking into an immediate exception.
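
The non-blocking variants above can be shown in a short self-contained sketch (the function name is illustrative; note that multiprocessing.Queue hands items to a background feeder thread, so a brief pause is added before reading):

```python
from multiprocessing import Queue
import queue   # provides the queue.Full / queue.Empty exceptions
import time

def run_queue_demo():
    q = Queue(2)                      # maxsize=2
    q.put('a')
    q.put('b')
    try:
        q.put_nowait('c')             # queue is full: raises queue.Full immediately
        full_raised = False
    except queue.Full:
        full_raised = True
    time.sleep(0.2)                   # give the feeder thread time to flush items into the pipe
    first = q.get(timeout=1)
    second = q.get(timeout=1)
    try:
        q.get_nowait()                # queue is empty: raises queue.Empty immediately
        empty_raised = False
    except queue.Empty:
        empty_raised = True
    return first, second, full_raised, empty_raised

if __name__ == '__main__':
    print(run_queue_demo())
```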

3. Pipe-based communication

Pipes have a problem: they can make data unsafe. The official documentation notes that data in a pipe may become corrupted if two processes use the same end of the pipe at the same time.
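
Although this article does not develop a Pipe example, a minimal sketch of two-way communication over a pipe might look like this (the function and variable names are illustrative, not prescribed by the API):

```python
from multiprocessing import Process, Pipe

def worker(conn):
    msg = conn.recv()          # block until the parent sends something
    conn.send(msg.upper())     # send a reply back through the same connection
    conn.close()

def run_pipe_demo():
    parent_conn, child_conn = Pipe()   # two connected Connection objects
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send('hello')
    reply = parent_conn.recv()
    p.join()
    return reply

if __name__ == '__main__':
    print(run_pipe_demo())
```

As long as each end of the pipe is used by only one process at a time, this is safe; the corruption risk arises when two processes read from or write to the same end simultaneously.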

7. Producers and consumers

Using the producer-consumer pattern in concurrent programming can solve most concurrency problems. The pattern improves a program's overall data-processing speed by balancing the working capacity of producer threads and consumer threads.

1. Why use the producer-consumer model

In the world of threads, producers are threads that produce data and consumers are threads that consume it. In multi-threaded development, if the producer is fast and the consumer slow, the producer must wait for the consumer to catch up before producing more data. Likewise, if the consumer's processing power exceeds the producer's, the consumer must wait for the producer. The producer-consumer pattern was introduced to solve this problem.

2. What is the producer-consumer model?

The producer-consumer pattern resolves the tight coupling between producer and consumer through a container. Producer and consumer do not communicate directly; they communicate through a blocking queue. After producing data, the producer does not wait for the consumer to process it, but throws it straight into the blocking queue; the consumer does not ask the producer for data, but takes it straight from the blocking queue. The blocking queue acts as a buffer, balancing the processing power of producers and consumers.

It serves as a buffer, balancing production and consumption and decoupling the two.

from multiprocessing import Process
from multiprocessing import Queue
import time
import random

def producer(q,name):
    for i in range(1,6):
        time.sleep(random.randint(1,2))
        res = f'steamed bun {i}'
        q.put(res)
        print(f'Producer {name} produced {res}')


def consumer(q,name):

    while 1:
        try:
            food = q.get(timeout=3)  # If nothing arrives within 3 s, assume production is over
            time.sleep(random.randint(1, 3))
            print(f'\033[31;0mConsumer {name} ate {food}\033[0m')
        except Exception:
            return


if __name__ == '__main__':
    q = Queue()

    p1 = Process(target=producer,args=(q,'Passerby A'))
    p2 = Process(target=consumer,args=(q,'Passerby B'))

    p1.start()
    p2.start()
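
The consumer above exits by timing out on an empty queue. A common alternative is a sentinel ("poison pill"): the producer enqueues a special end marker, and the consumer exits when it sees it. A minimal sketch, with names of my own choosing (a second queue is used here only to collect results for inspection):

```python
from multiprocessing import Process, Queue

STOP = 'STOP'                      # sentinel value; any unique marker works

def producer(q, n):
    for i in range(n):
        q.put(f'steamed bun {i}')
    q.put(STOP)                    # tell the consumer there is nothing more to produce

def consumer(q, out):
    while True:
        food = q.get()             # blocks until something arrives; no timeout needed
        if food == STOP:
            break
        out.put(f'ate {food}')

def run_sentinel_demo():
    q, out = Queue(), Queue()
    p1 = Process(target=producer, args=(q, 3))
    p2 = Process(target=consumer, args=(q, out))
    p1.start(); p2.start()
    p1.join(); p2.join()
    return [out.get() for _ in range(3)]

if __name__ == '__main__':
    print(run_sentinel_demo())
```

With one sentinel per consumer, termination is deterministic rather than dependent on a timeout guess.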

II. Concurrent Programming with Multiple Threads

1. Introduction to threading module

Multiprocessing module imitates the interface of threading module completely. They are similar in use level, so they are not introduced in detail.

Official explanation https://docs.python.org/3/library/threading.html?highlight=threading#

2. Two Ways to Open Threads

from threading import Thread
import time

def task(name):
    print(f"{name} is runing!")
    time.sleep(2)
    print(f"{name} is  stop")

t1 =Thread(target=task,args=("mysql",))
t1.start()
print('===Main thread')  # Threads have no parent/child hierarchy; "main thread" is just a convenient name

The second way (for understanding; not often used)

from threading import Thread 
import time

class MyThread(Thread):
    def __init__(self,name):
        super().__init__()
        self.name = name

    def run(self):
        print(f'{self.name} is running!')
        time.sleep(1)
        print(f'{self.name} is stop')


t1 = MyThread("mysql")
t1.start()
print('main')

3. Code comparison of thread vs process

1. The Difference in Startup Speed between Multithreading and Multiprocessing

Multiprocess

The main process's print executes first (starting a process is slow)

from multiprocessing import Process


def work():
    print('hello')

if __name__ == '__main__':
    #Open a subprocess under the main process
    t=Process(target=work)
    t.start()
    print('Main thread/Main process') 

Multithreading

The subthread's print executes first (starting a thread is fast)

from threading import Thread
import time

def task(name):
    print(f'{name} is running')
    time.sleep(1)
    print(f'{name} is gone')

if __name__ == '__main__':

    t1 = Thread(target=task,args=('A',))
    t1.start()
    print('===Main thread')

2. Comparing pids

process

from multiprocessing import Process
import time
import os
def task(name):
    print(f'Subprocess: {os.getpid()}')
    print(f'Parent process: {os.getppid()}')


if __name__ == '__main__':

    p1 = Process(target=task,args=('A',))  # Create a process object
    p2 = Process(target=task,args=('B',))  # Create a process object
    p1.start()
    p2.start()
    print(f'==main{os.getpid()}')
# ==main17444
# Subprocess: 8636
# Parent process: 17444
# Subprocess: 14200
# Parent process: 17444

thread

Threads share the process's resources (all report the same pid)

from threading import Thread
import os

def task():
    print(os.getpid())

if __name__ == '__main__':

    t1 = Thread(target=task)
    t2 = Thread(target=task)
    t1.start()
    t2.start()
    print(f'===Main thread{os.getpid()}')
# 18712
# 18712
# === Main thread 18712  

3. Threads in the same process share internal data

from threading import Thread
import os

x = 3
def task():
    global x
    x = 100

if __name__ == '__main__':

    t1 = Thread(target=task)
    t1.start()
    # t1.join()
    print(f'===Main thread{x}')

# Resource data within a process is shared among all of that process's threads.

4. Other methods of threading

Methods on Thread instance objects:

  # isAlive(): returns whether the thread is active (spelled is_alive() in current Python versions).
  # getName(): returns the thread's name.
  # setName(): sets the thread's name.

Some methods provided by the threading module:
  # threading.currentThread(): returns the current Thread object.
  # threading.enumerate(): returns a list of running threads - threads after start() and before termination, excluding those not yet started or already finished.
  # threading.activeCount(): returns the number of running threads; same result as len(threading.enumerate()).

from threading import Thread
from threading import currentThread
from threading import enumerate
from threading import activeCount
import os
import time

x = 3
def task():
    # print(currentThread())
    time.sleep(1)
    print('666')
if __name__ == '__main__':

    t1 = Thread(target=task,name='Thread 1')
    t2 = Thread(target=task,name='Thread 2')
    # name Sets Thread name
    t1.start()
    t2.start()
    # time.sleep(2)
    print(t1.isAlive())  # Whether the thread is alive (is_alive() in current versions)
    print(t1.getName())  # Get the thread's name
    t1.setName('Child Thread-1')
    print(t1.name)  # Get the thread name via the attribute (preferred)

    # threading method
    print(currentThread())  # Get the object of the current thread
    print(enumerate())  # Returns a list of all threaded objects
    print(activeCount())  # *** Returns the number of threads running.
    print(f'===Main thread{os.getpid()}')

5. Using join with threads

join: blocks the main thread until the subthread it is called on has finished executing.

from threading import Thread
import time

def task(name):
    print(f'{name} is running')
    time.sleep(1)
    print(f'{name} is gone')

if __name__ == '__main__':
    start_time = time.time()
    t1 = Thread(target=task,args=('A',))
    t2 = Thread(target=task,args=('B',))

    t1.start()
    t1.join()
    t2.start()
    t2.join()
    print(f'===Main thread{time.time() - start_time}')
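
The example above joins each thread right after starting it, so the tasks run one after another (about 2 s in total). Starting both threads first and only then joining lets them run concurrently, so the total wait is roughly the longest single task rather than the sum. A minimal sketch (function names are illustrative):

```python
from threading import Thread
import time

def task(sec):
    time.sleep(sec)

def run_concurrent():
    start = time.time()
    threads = [Thread(target=task, args=(0.5,)) for _ in range(3)]
    for t in threads:
        t.start()              # start all three first...
    for t in threads:
        t.join()               # ...then wait; the three sleeps overlap
    return time.time() - start

if __name__ == '__main__':
    print(run_concurrent())    # roughly 0.5 s, not 1.5 s
```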

6. Daemon threads

It must be stressed: finishing running is not the same as being terminated.

# 1. For the main process, "finished running" means the main process's code has finished.

# 2. For the main thread, "finished running" means all non-daemon threads in the main thread's process have finished; only then has the main thread finished.

If a daemon thread's lifetime is shorter than the other threads', it simply ends first; otherwise it waits until the other non-daemon threads and the main thread end, and is then terminated.

In detail:

# 1. The main process is finished once its own code finishes (daemon processes are reclaimed at that moment); it then waits for all non-daemon subprocesses to finish and reclaims their resources (otherwise zombie processes appear) before it truly ends.

# 2. The main thread is finished only after all other non-daemon threads finish (daemon threads are reclaimed at that moment). Because the main thread's end means the process's end, and the process's resources are then reclaimed as a whole, the process must ensure all non-daemon threads complete before it can end.

from threading import Thread
import time


def sayhi(name):
    print('hello!')
    time.sleep(2)
    print('%s say hello' % name)  # Never printed: the main thread ends first, killing the daemon

if __name__ == '__main__':
    t = Thread(target=sayhi,args=('A',))
    # t.setDaemon(True) #You must set it before t.start()
    t.daemon = True
    t.start()  # Threads start much faster than processes

    print('Main thread')
# hello!
# Main thread

When does the main thread end?

The main thread ends only after the non-daemon subthreads finish; the daemon thread is terminated at that point.

from threading import Thread
import time

def foo():
    print(123)  # 1
    time.sleep(3)  
    print("end123")   # Never printed: by the time 3 s pass, t2 and the main thread have finished and the daemon t1 is terminated

def bar():
    print(456)  # 2
    time.sleep(1)
    print("end456")  # 4


t1=Thread(target=foo)
t2=Thread(target=bar)

t1.daemon=True
t1.start()
t2.start()
print("main-------")  # 3

# 123
# 456
# main-------
# end456

7. Mutex Lock

Multi-thread mutex locks work on the same principle as multi-process ones: when multiple threads contend for the same data (resource), we must guarantee the data's safety and a reasonable order.

Multiple tasks contend for one piece of data; to guarantee data safety, access to it must be made serial.

from threading import Thread
from threading import Lock
import time
import random
x = 100

def task(lock):

    lock.acquire()
    # time.sleep(random.randint(1,2))
    global x
    temp = x
    time.sleep(0.01)
    temp = temp - 1
    x = temp
    lock.release()


if __name__ == '__main__':
    mutex = Lock()
    l1 = []
    for i in range(100):
        t = Thread(target=task,args=(mutex,))
        l1.append(t)
        t.start()

    time.sleep(3)
    print(f'Main thread{x}')

8. Deadlock and Recursive Locks

Processes also have deadlocks and recursive locks; they work the same way as thread deadlocks and recursive locks.

from threading import Thread
from threading import Lock
import time

lock_A = Lock()
lock_B = Lock()


class MyThread(Thread):

    def run(self):
        self.f1()
        self.f2()


    def f1(self):

        lock_A.acquire()
        print(f'{self.name} grabbed lock A')

        lock_B.acquire()
        print(f'{self.name} grabbed lock B')

        lock_B.release()

        lock_A.release()

    def f2(self):

        lock_B.acquire()
        print(f'{self.name} grabbed lock B')

        time.sleep(0.1)
        lock_A.acquire()
        print(f'{self.name} grabbed lock A')

        lock_A.release()

        lock_B.release()



if __name__ == '__main__':

    for i in range(3):
        t = MyThread()
        t.start()

A recursive lock has a counter: it starts at 0; each acquire adds 1 and each release subtracts 1.

As long as the recursive lock's count is not zero, no other thread can acquire it (but the thread that holds it can acquire it again).
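
The re-acquire counting behavior can be shown in isolation (a minimal sketch; the function name is illustrative):

```python
from threading import RLock

def run_rlock_demo():
    lock = RLock()
    lock.acquire()                       # count: 1
    lock.acquire()                       # same thread may re-acquire; count: 2 (a plain Lock would deadlock here)
    lock.release()                       # count: 1
    lock.release()                       # count: 0 -> the lock is free again
    free = lock.acquire(blocking=False)  # succeeds, because the count dropped back to 0
    if free:
        lock.release()
    return free

if __name__ == '__main__':
    print(run_rlock_demo())
```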

#Recursive locks solve the deadlock problem. When business logic needs multiple locks, consider a recursive lock first.
from threading import Thread
from threading import RLock
import time

lock_A = lock_B = RLock()  # One recursive lock bound to both names, so f1/f2 re-acquire the same lock


class MyThread(Thread):

    def run(self):
        self.f1()
        self.f2()

    def f1(self):

        lock_A.acquire()
        print(f'{self.name} grabbed lock A')

        lock_B.acquire()
        print(f'{self.name} grabbed lock B')

        lock_B.release()

        lock_A.release()

    def f2(self):

        lock_B.acquire()
        print(f'{self.name} grabbed lock B')

        lock_A.acquire()
        print(f'{self.name} grabbed lock A')

        lock_A.release()

        lock_B.release()

if __name__ == '__main__':

    for i in range(3):
        t = MyThread()
        t.start()

9. Semaphores

A semaphore is also a lock; it limits the number of concurrent holders.

It works the same way for processes.

Semaphore manages a built-in counter:
the counter is decremented by 1 whenever acquire() is called;
the counter is incremented by 1 whenever release() is called.
The counter can never drop below 0; when it is 0, acquire() blocks the calling thread until some other thread calls release().

Example (at most five threads can hold the semaphore at once, limiting concurrency to 5):

from threading import Thread, Semaphore, current_thread
import time
import random
sem = Semaphore(5)  # The restroom has only five stalls: one person leaves, the next one enters.

def task():
    sem.acquire()

    print(f'{current_thread().name} Toilet')
    time.sleep(random.randint(1,3))

    sem.release()


if __name__ == '__main__':
    for i in range(20):
        t = Thread(target=task,)
        t.start()

10.Python GIL

GIL Global Interpreter Lock

Many self-styled experts claim that the GIL is Python's fatal flaw and that Python cannot use multiple cores, cannot be concurrent, and so on.

This article gives a thorough analysis of the impact of GIL on python multithreading.

It is strongly recommended to take a look at: http://www.dabeaz.com/python/UnderstandingGIL.pdf

Why a lock at all?

  1. It was the single-core era, and cpus were very expensive.

  2. Without a global interpreter lock, the developers of the Cpython interpreter would have had to lock and unlock manually throughout the source code, which is tedious and prone to deadlock. To save effort, they simply put one lock on threads directly inside the interpreter.

    Advantage: it guarantees the safety of the Cpython interpreter's resource data.

    Disadvantage: multithreading within a single process cannot take advantage of multiple cores.

Jython has no GIL lock, and neither does IronPython. (PyPy's default build, contrary to a common claim, does keep a GIL.)

Now in the multi-core era, can I remove Cpython's GIL lock?

Because all the business logic of the Cpython interpreter is implemented around a single thread, removing the GIL lock is practically impossible.

Multi-threading of a single process can be concurrent, but it can't take advantage of multi-core and can't be parallel.

Multiple processes can be concurrent and parallel.

Two workload types matter when choosing between them:

IO-intensive

Compute-intensive

11. The Difference between GIL and lock Locks

Similarity: both are mutual-exclusion locks.

Difference:

The GIL is a global interpreter lock; it protects the safety of resource data inside the interpreter.

The GIL is acquired and released automatically, with no manual operation.

A mutex defined in your own code protects the safety of resource data inside your process.

A self-defined mutex must be acquired and released manually.

How they relate, in detail

The Python interpreter reclaims memory for you automatically and periodically; you can picture a separate garbage-collection thread inside the interpreter that wakes up now and then and does a global scan for memory that can be freed. That gc thread runs concurrently with the threads of your own program. Suppose one of your threads deletes a variable: while the interpreter's gc thread is in the middle of clearing that variable, another thread might reassign the memory that has not yet been cleared, and the newly assigned data could end up deleted. To solve this, the Python interpreter simply and crudely locks: while one thread runs, nothing else can touch the interpreter. This fixes the problem above, and can be seen as a legacy of early versions of Python.

12. Comparing the efficiency of compute-intensive and IO-intensive work

IO-intensive:

from threading import Thread
from multiprocessing import Process
import time
import random

def task():
    count = 0
    time.sleep(random.randint(1,3))
    count += 1

if __name__ == '__main__':

#Multi-process concurrency and parallelism
    start_time = time.time()
    l1 = []
    for i in range(50):
        p = Process(target=task,)
        l1.append(p)
        p.start()

    for p in l1:
        p.join()

    print(f'Elapsed time: {time.time() - start_time}')  # 4.41826057434082

#Multithread concurrency
    start_time = time.time()
    l1 = []
    for i in range(50):
        p = Thread(target=task,)
        l1.append(p)
        p.start()

    for p in l1:
        p.join()

    print(f'Elapsed time: {time.time() - start_time}')  # 3.0294392108917236

# Conclusion: for IO-intensive work, multithreading within a single process is more efficient.

Compute-intensive:

from threading import Thread
from multiprocessing import Process
import time


def task():
    count = 0
    for i in range(10000000):
        count += 1


if __name__ == '__main__':

    #Multi-process concurrency and parallelism
    start_time = time.time()
    l1 = []
    for i in range(4):
        p = Process(target=task,)
        l1.append(p)
        p.start()

    for p in l1:
        p.join()

    print(f'Elapsed time: {time.time() - start_time}')  # 1.1186981201171875

    #Multithread concurrency
    start_time = time.time()
    l1 = []
    for i in range(4):
        p = Thread(target=task,)
        l1.append(p)
        p.start()

    for p in l1:
        p.join()

    print(f'Elapsed time: {time.time() - start_time}')  # 2.729006767272949

# Summary: for compute-intensive work, multiprocess concurrency and parallelism are more efficient.

13. Multithreaded socket communication

Whether with multithreading or multiprocessing, the pattern is one thread per client request: each request that arrives gets its own thread.

In principle, within what your machine permits, the more threads you open, the more clients you can serve.

Server

import socket
from threading import Thread

def communicate(conn,addr):
    while 1:
        try:
            from_client_data = conn.recv(1024)
            print(f'Message from client {addr[1]}: {from_client_data.decode("utf-8")}')
            to_client_data = input('>>>').strip()
            conn.send(to_client_data.encode('utf-8'))
        except Exception:
            break
    conn.close()



def _accept():
    server = socket.socket()

    server.bind(('127.0.0.1', 8848))

    server.listen(5)

    while 1:
        conn, addr = server.accept()
        t = Thread(target=communicate,args=(conn,addr))
        t.start()

if __name__ == '__main__':
    _accept()

Client

import socket

client = socket.socket()

client.connect(('127.0.0.1',8848))

while 1:
    try:
        to_server_data = input('>>>').strip()
        client.send(to_server_data.encode('utf-8'))

        from_server_data = client.recv(1024)
        print(f'Messages from the server: {from_server_data.decode("utf-8")}')

    except Exception:
        break
client.close()

14. Process pool, thread pool

Why present the process pool and the thread pool together? Because their usage is unified: ThreadPoolExecutor and ProcessPoolExecutor expose the same interface, and as long as you import them from concurrent.futures you can use either one directly.

Thread pool: a container that caps the number of threads you open, for example at four.

At first it can only handle four tasks concurrently; as soon as one task finishes, the freed thread immediately picks up the next task.

It trades time for space.

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import os
import time
import random

def task(n):
    print(f'{os.getpid()} Receive visitors')
    time.sleep(random.randint(1,3))


if __name__ == '__main__':

    # Open the process pool (parallel + concurrent)
    p = ProcessPoolExecutor()  # By default, the number of processes in the pool equals the number of CPUs

    for i in range(20):  # Submit 20 tasks; a CPU-count's worth of processes works through them
        p.submit(task, i)

15. Blocking, non-blocking, synchronous, asynchronous

Perspective of execution

Blocking: while the program runs, it hits io; the program is suspended and the cpu is taken away from it.

Non-blocking: the program either encounters no io, or it does hit io but uses some means to keep the cpu running its code regardless.

Perspective of task submission

Synchronous: submit one task, and wait from the task's start until its end (possibly including io); only after its return value comes back do I submit the next task.

Asynchronous: submit several tasks at once, then go straight on to execute the next line of code.

16. Synchronous call, asynchronous call

Asynchronous call

from concurrent.futures import ProcessPoolExecutor,ThreadPoolExecutor
import time
import random
import os

def task(i):
    print(f'{os.getpid()} task started')
    time.sleep(random.randint(1,3))
    print(f'{os.getpid()} task finished')
    return i
if __name__ == '__main__':

    # Asynchronous call
    pool = ProcessPoolExecutor()
    for i in range(10):
        pool.submit(task,i)

    pool.shutdown(wait=True)
    # shutdown: make the main process wait until every process in the pool has
    # finished its tasks before continuing; a bit like join.
    # shutdown also refuses new task submissions until the pool's existing tasks are done.
    # A task is carried out by a function; the task's result is the function's return value.
    print('===main')

Synchronous call

from concurrent.futures import ProcessPoolExecutor,ThreadPoolExecutor
import time
import random
import os

def task(i):
    print(f'{os.getpid()} task started')
    time.sleep(random.randint(1,3))
    print(f'{os.getpid()} task finished')
    return i
if __name__ == '__main__':

    # Synchronous call
    pool = ProcessPoolExecutor()
    for i in range(10):
        obj = pool.submit(task,i)
        # obj is a Future object reflecting the task's current state, which may be
        # running, pending, or finished.
        # obj.result() blocks until the task completes and its result is returned,
        # so the next task is only submitted after that.
        print(f'Task results:{obj.result()}')

    pool.shutdown(wait=True)
    print('===main') 

17. Asynchronous call + callback function

A browser works by sending a request to a server; the server validates your request and, if it is legitimate, returns a file to your browser.
The browser receives the file and renders the code inside it into the nice-looking page you see.

What is a crawler?

  1. Use code to simulate a browser, run through the browser's workflow, and obtain a pile of source code.
  2. Clean up that source code to extract the data I want.

pip install requests

import requests
ret = requests.get('http://www.baidu.com')
if ret.status_code == 200:
    print(ret.text)

Building on asynchronous calls, we want to reclaim the results of all tasks, and reclaim each result as soon as it is ready.
Each concurrently executed task should only handle the io-blocking part; it cannot take on extra work.
Hence: asynchronous call + callback function.

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import requests


def task(url):
    '''Simulates crawling several pages' source code; bound to involve IO'''
    ret = requests.get(url)
    if ret.status_code == 200:
        return ret.text


def parse(obj):
    '''Simulates parsing the data; generally involves no IO'''
    print(len(obj.result()))


if __name__ == '__main__':

    # Open thread pool, concurrent and parallel execution
    url_list = [
        'http://www.baidu.com',
        'http://www.JD.com',
        'http://www.JD.com',
        'http://www.JD.com',
        'http://www.taobao.com',
        'https://www.cnblogs.com/jin-xin/articles/7459977.html',
        'https://www.luffycity.com/',
        'https://www.cnblogs.com/jin-xin/articles/9811379.html',
        'https://www.cnblogs.com/jin-xin/articles/11245654.html',
        'https://www.sina.com.cn/',

    ]
    pool = ThreadPoolExecutor(4)

    for url in url_list:
        obj = pool.submit(task, url)
        obj.add_done_callback(parse) #callback

    '''
    The thread pool sets up 4 threads and asynchronously launches 10 tasks; each
    task fetches a page's source code, and they execute concurrently.
    When a task completes, its parse (analysis) step is handed over via the
    callback, while idle threads go on processing other tasks.
    With a process pool + callback, the callback function runs in the main process.
    With a thread pool + callback, the callback function runs in an idle thread.
    '''

Are asynchronous calls and callbacks the same thing?
No. Asynchrony is about how tasks are published.
A callback is about how results are received: it picks up each task's result as the task completes and carries out the next step.

Asynchronous + callback:
the asynchronous part handles the io,
the callback handles the non-io work.

18. Thread queue (queue)

1. First in first out

import queue
# FIFO
q = queue.Queue(3)
q.put(1)
q.put(2)
q.put(3)

print(q.get())
print(q.get())
print(q.get())
# print(q.get(block=False))  # Would block; with block=False it raises queue.Empty immediately
# q.get(timeout=3)  # Blocks up to 3 seconds; if still empty, raises queue.Empty

2. Last in, first out, stack

q =queue.LifoQueue(4)
q.put(1)
q.put(2)
q.put(3)
q.put(4)
print(q.get())
print(q.get())
print(q.get())
print(q.get())

3. Priority queue

q = queue.PriorityQueue(4)
q.put((5,"z"))
q.put((0,"b"))
q.put((-1,"2"))
q.put((-1,"3"))
# Lower number = higher priority; equal priorities are ordered by the second element (e.g. ASCII/lexicographic order)
print(q.get())
print(q.get())
print(q.get())
print(q.get())
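Thread queues are also the usual channel between producer and consumer threads; a small sketch using a sentinel value to tell the consumer to stop:

```python
import queue
from threading import Thread

q = queue.Queue()
results = []

def producer():
    for i in range(5):
        q.put(i)
    q.put(None)          # sentinel: tell the consumer to stop

def consumer():
    while True:
        item = q.get()   # blocks until an item is available
        if item is None:
            break
        results.append(item * 2)

t1 = Thread(target=producer)
t2 = Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(results)   # [0, 2, 4, 6, 8]
```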

19. Event

Open two threads: one thread runs to a certain point and then triggers the other thread to execute. The event couples the two threads together.

from threading import  Thread,current_thread,Event
import time

event = Event()
def check():
    print(f'{current_thread().name} checking whether the server is up')
    time.sleep(3)
    # print(event.is_set())  # Check whether set() has been called yet
    event.set()
    # print(event.is_set())  # True once set() has been called
    print('The server is up')

def connect():
    print(f'{current_thread().name} waiting to connect..')
    event.wait()  # Block until event.set() has executed
    # event.wait(1)  # Block at most 1 second, then continue regardless
    print(f'{current_thread().name} connected successfully')

t1 = Thread(target=check,)
t2 = Thread(target=connect,)
t1.start()
t2.start()
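Event's flag can also be inspected with is_set() and waited on with a timeout, as the commented-out lines above hint. A standalone sketch, separate from the server example:

```python
from threading import Event

event = Event()
print(event.is_set())            # False: set() has not been called yet
print(event.wait(timeout=0.1))   # returns False after ~0.1s instead of blocking forever
event.set()
print(event.is_set())            # True
print(event.wait())              # returns True immediately once the flag is set
```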

20. Coroutines

A coroutine is concurrency within a single thread, also known as a microthread or fiber (English: Coroutine). In one sentence: a coroutine is a user-mode lightweight thread, i.e. one whose scheduling is controlled by the user program itself.

A single cpu, 10 tasks, to be executed concurrently:

1. Option 1: open multiple processes; the operating system does the switching + keeps the state.
2. Option 2: open multiple threads; the operating system does the switching + keeps the state.
3. Option 3: open coroutines; the program itself controls the cpu's switching among the tasks + keeps the state.

More on option 3: coroutine switching is so fast that it fools the operating system into thinking the cpu has been running your thread (coroutine) the whole time.

It should be emphasized that:

# 1. Python's threads are kernel-level, i.e. scheduled under operating-system control (e.g. a thread that hits io or runs too long is forced to give up the cpu so another thread can run).
# 2. Coroutines are opened within a single thread; whenever one hits io, switching is controlled at the application level (not by the operating system) to improve efficiency (!!! switching on non-io operations has nothing to do with efficiency).

In contrast to operating-system-controlled thread switching, the user controls coroutine switching within a single thread.

The advantages are as follows:

# 1. Coroutine switching overhead is small; it is program-level switching that the operating system cannot perceive at all, so it is far more lightweight.
# 2. Concurrency can be achieved within a single thread, making maximum use of the cpu.

The shortcomings are as follows:

# 1. A coroutine is essentially single-threaded and cannot use multiple cores. A program can, however, open multiple processes, multiple threads per process, and coroutines inside each thread.

Characteristics of coroutines, summarized:

  1. Concurrency is implemented within a single thread
  2. Shared data can be modified without locking
  3. The user program itself saves the context stacks of the multiple control flows

1.Greenlet

If we have 20 tasks in a single thread, switching among them with plain yield generators is too cumbersome (each generator must be initialized, then driven with send... very tedious), whereas the greenlet module lets us switch among these 20 tasks directly and easily.
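The cumbersome generator version that paragraph refers to might look like this: each yield hands control back, and the caller does the switching by hand, which is exactly the bookkeeping greenlet.switch() automates:

```python
log = []

def eat():
    log.append('eat 1')
    yield            # hand control back to the caller
    log.append('eat 2')
    yield

def play():
    log.append('play 1')
    yield
    log.append('play 2')
    yield

g1, g2 = eat(), play()
# Manually interleave the two tasks; with 20 tasks this driving loop
# becomes the tedious part that greenlet removes.
next(g1); next(g2); next(g1); next(g2)
print(log)   # ['eat 1', 'play 1', 'eat 2', 'play 2']
```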

#install
pip3 install greenlet
# Switching + keeping state (does NOT switch automatically on encountering IO)
# The real coroutine modules do their switching with greenlet
from greenlet import greenlet

def eat(name):
    print('%s eat 1' %name)  #2
    g2.switch('taibai')   #3
    print('%s eat 2' %name) #6
    g2.switch() #7
def play(name):
    print('%s play 1' %name) #4
    g1.switch()      #5
    print('%s play 2' %name) #8

g1=greenlet(eat)
g2=greenlet(play)

g1.switch('taibai')  # Parameters can be passed on the first switch; later switches don't need them.

In practice:

We generally combine processes + threads + coroutines to get the best concurrency. On a 4-core cpu, a typical setup is 5 processes, 20 threads per process (5 times the cpu count), and 500 coroutines started in each thread. When crawling pages at large scale, the time spent waiting on network latency can be used by coroutines to achieve concurrency. Total concurrency = 5 * 20 * 500 = 50,000, roughly the maximum for an ordinary 4-core machine. (For comparison, nginx under load balancing carries at most about 50,000 connections.)

The 20 tasks in a single thread usually mix computation and blocking. When task 1 blocks, we can use the blocking time to execute task 2; only then does efficiency improve. That is what the Gevent module is for.

2.Gevent

#install
pip3 install gevent

Gevent is a third-party library that makes it easy to write concurrent synchronous or asynchronous programs. Its main primitive is the Greenlet, a lightweight coroutine provided to Python as a C extension module. Greenlets all run inside the operating-system process of the main program, but they are scheduled cooperatively.

Usage

g1 = gevent.spawn(func, 1, 2, 3, x=4, y=5)
# Creates a coroutine object g1. The first argument to spawn is the function
# name, e.g. eat; any further arguments, positional or keyword, are passed
# through to that function. spawn submits the task asynchronously.

g2 = gevent.spawn(func2)

g1.join()  # Wait for g1 to finish

g2.join()  # Wait for g2 to finish. Even without this second join the coroutines
           # still switch, but if g2's task takes long and you never join it,
           # the program may exit before g2 finishes its remaining work.

# Or replace the two joins above with: gevent.joinall([g1, g2])

g1.value  # Get func's return value

Final version

import gevent
import time
from gevent import monkey
monkey.patch_all()  # Patch: make every blocking call below (e.g. time.sleep) cooperative
def eat(name):
    print('%s eat 1' %name)
    time.sleep(2)
    print('%s eat 2' %name)

def play(name):
    print('%s play 1' %name)
    time.sleep(1)
    print('%s play 2' %name)


g1 = gevent.spawn(eat,'egon')
g2 = gevent.spawn(play,name='egon')

# g1.join()
# g2.join()
gevent.joinall([g1,g2])
