On asynchronous programming: multithreading

Posted by kyberfabrikken on Thu, 04 Nov 2021 21:09:11 +0100

Multithreading in Python is done with the threading module from the standard library.

Threads are a standard operating-system topic, so I won't elaborate on the theory. Let's go straight to code.

How do I create a thread?

import threading
def fun():
    pass
t1 = threading.Thread(target=fun)
print(t1)

Output:

<Thread(Thread-1, initial)>

Note that, unlike with coroutines, creating a thread does not require a coroutine (async def) function: fun is an ordinary function.

The printed t1 is the Thread object. Thread-1 is the thread's name (generated automatically because we didn't pass one), and initial is its state: the thread has not been started yet, so it is still in its initial state.
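The state shown in that repr can also be checked directly with is_alive(). Here is a minimal sketch; the 0.2 s sleep is an arbitrary choice so that the thread is still running when we check it.

```python
import threading
import time

def fun():
    time.sleep(0.2)  # keep the thread busy for a moment

t1 = threading.Thread(target=fun)
before = t1.is_alive()  # False: created but not started yet ("initial")
t1.start()
during = t1.is_alive()  # True while fun() is running ("started")
t1.join()
after = t1.is_alive()   # False again once fun() has returned ("stopped")
print(before, during, after)  # False True False
```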

A thread is created by instantiating the threading.Thread class. Its constructor accepts several parameters; here is a brief introduction.

class Thread(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)

- group: rarely used (it should be None; it is reserved for a future extension), so I'll skip it.
- target: the function the thread will run. Pass the function itself (e.g. target=fun), not a call like fun().
- name: the thread's name, a string.
- args: the positional arguments for target, given as a tuple. A single argument needs a trailing comma, e.g. args=(18,).
- kwargs: the keyword arguments for target, given as a dict.
- daemon: a Boolean (True or False) that sets whether the thread is a daemon thread. It is keyword-only; we will talk about it in detail later.
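Why the trailing comma in args matters: Thread unpacks args into target's positional parameters, so ('Zhang San') (which is just a string, not a tuple) gets unpacked character by character. A small sketch; the record function is a hypothetical helper used only to capture what the thread receives.

```python
import threading

results = []

def record(*items):
    # Hypothetical helper: store whatever positional arguments arrived
    results.append(items)

# Correct: a one-element tuple needs a trailing comma
t = threading.Thread(target=record, args=('Zhang San',))
t.start()
t.join()

# Pitfall: ('Zhang San') is a plain string, so each character
# becomes a separate positional argument
t = threading.Thread(target=record, args=('Zhang San'))
t.start()
t.join()

print(results[0])  # ('Zhang San',)
print(results[1])  # ('Z', 'h', 'a', 'n', 'g', ' ', 'S', 'a', 'n')
```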

Next, we will enrich the code above.

import threading
def fun(age, **kwargs):
    print('My name is ', kwargs['name'], ",I'm ", age, ' years old', sep='')
t1 = threading.Thread(target=fun, args=(18,), name='t1', kwargs={'name': 'Zhang San'})
t1.start()
print(t1)

Output:

My name is Zhang San,I'm 18 years old
<Thread(t1, stopped 14028)>

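You can see that the thread's name has been changed to t1, and its state is stopped: by the time print(t1) ran, the short function had already finished. (This is timing-dependent; if the thread were still running, the state would show as started.)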

So what do we gain from threads? Concurrency: several tasks can make progress at the same time.

Example:

import threading
import time
def fun():
    print('hello')
    time.sleep(2)
    print('world')
def fun1():
    print('how are you?')
    time.sleep(2)
    print('fine')
t1 = threading.Thread(target=fun,name = 't1')
t2 = threading.Thread(target=fun1,name = 't2')
t1.start()
t2.start()

Output:

hello
how are you?
world
fine

OK, the result looks like what we wanted: the two functions run interleaved. From the output, the whole program seems to take about 2 s, not the 4 s that running them one after the other would take.

Let's have the program measure the elapsed time:

Example:

import threading
import time
def fun():
    print('hello')
    time.sleep(2)
    print('world')
def fun1():
    print('how are you?')
    time.sleep(2)
    print('fine')
start_time = time.time()
t1 = threading.Thread(target=fun,name = 't1')
t2 = threading.Thread(target=fun1,name = 't2')
t1.start()
t2.start()
end_time = time.time()
print(end_time-start_time)

Output:

hello
how are you?
0.001026153564453125
world
fine

Unfortunately, this is not what we expected: the elapsed time is printed in the wrong place, and its value is wrong too. Clearly the timing code ran right away instead of after the threads finished. Why?

Let's add one line that prints threading.enumerate(). This function returns a list of all Thread objects that are currently alive, so it shows how many threads there are, along with each one's name and state.

import threading
import time
def fun():
    print('hello')
    time.sleep(2)
    print('world')
def fun1():
    print('how are you?')
    time.sleep(2)
    print('fine')
#start_time = time.time()
t1 = threading.Thread(target=fun,name = 't1')
t2 = threading.Thread(target=fun1,name = 't2')
t1.start()
t2.start()
# end_time = time.time()
# print(end_time-start_time)
print(threading.enumerate())

Output:

hello
how are you?
[<_MainThread(MainThread, started 2404)>, <Thread(t1, started 4660)>, <Thread(t2, started 2952)>]
fine
world

The output contains three threads: besides t1 and t2 there is also the main thread (MainThread). In the code above, the main thread keeps running concurrently with t1 and t2; it does not wait for them to finish before measuring and printing the time, as we had assumed. So how do we make the main thread continue only after t1 and t2 have finished? Just add two lines of code that call join().

Example:

import threading
import time
def fun():
    print('hello')
    time.sleep(2)
    print('world')
def fun1():
    print('how are you?')
    time.sleep(2)
    print('fine')
start_time = time.time()
t1 = threading.Thread(target=fun,name = 't1')
t2 = threading.Thread(target=fun1,name = 't2')
t1.start()
t2.start()
t1.join()
t2.join()
end_time = time.time()
print(end_time-start_time)

Output:

hello
how are you?
fine
world
2.0018341541290283

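This time the result is correct. So what does Thread.join() do here? Simply put, it waits for the child thread to finish before the main thread continues. More precisely, when the main thread calls t.join(), it blocks until thread t terminates, then wakes up and carries on. join() also accepts an optional timeout parameter (in seconds): the main thread blocks for at most that long, and if the child thread still hasn't finished, it stops waiting and continues anyway. Because join() always returns None, call t.is_alive() afterwards to find out whether the wait timed out.

For example, change the two join() calls in the previous code to t1.join(timeout=0.3) and t2.join(timeout=0.4). The main thread now waits only about 0.3 + 0.4 = 0.7 s before printing the elapsed time, while t1 and t2 are still sleeping; they print their last lines afterwards.

Output: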

hello
how are you?
0.716256856918335
worldfine
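A minimal sketch of join(timeout=...) combined with is_alive() to detect the timeout; the function name slow and the 1 s / 0.2 s durations are arbitrary choices for illustration.

```python
import threading
import time

def slow():
    time.sleep(1)  # stands in for a task that takes about 1 s

t = threading.Thread(target=slow)
t.start()

start = time.perf_counter()
t.join(timeout=0.2)            # block for at most 0.2 s
waited = time.perf_counter() - start
timed_out = t.is_alive()       # join() returns None, so ask the thread itself
print(f'waited {waited:.1f}s, timed out: {timed_out}')

t.join()                       # no timeout: wait until slow() really returns
print('alive after full join:', t.is_alive())  # alive after full join: False
```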

====================================================================================================================================================================================================

Next topic: can multithreading really save time?

Let's look at two examples: one with two CPU-bound threads and one with two I/O-bound threads.

CPU-bound:

import threading
import time
def fun():
    for i in range(10000000):
        result = i*i*i*i  # pure computation, no I/O
def fun1():
    for i in range(10000000):
        result = i*i*i
start_time = time.time()
t1 = threading.Thread(target=fun,name = 't1')
t2 = threading.Thread(target=fun1,name = 't2')
t1.start()
t2.start()
t1.join()
t2.join()
end_time = time.time()
print('time',end_time-start_time)

Output:

time 2.053481340408325

For comparison, here are the same two functions run one after the other in a single thread:

import time
def fun():
    for i in range(10000000):
        result = i*i*i*i  # pure computation, no I/O
def fun1():
    for i in range(10000000):
        result = i*i*i
start_time = time.time()
fun()
fun1()
end_time = time.time()
print('time',end_time-start_time)

Output:

time 2.016674518585205

The threaded and non-threaded versions take about the same time; the threaded one is even slightly slower, because switching between threads has its own overhead.
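One CPython detail behind those switches: when several threads want to run Python code, the interpreter periodically asks the running thread to give up its turn, based on a configurable "switch interval". This short sketch just reads that setting; it does not measure the switching overhead itself.

```python
import sys

# Ideal length (in seconds) of the time slice a thread gets before
# CPython asks it to let another runnable thread go; 0.005 by default
interval = sys.getswitchinterval()
print(interval)
```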

I/O-bound:

import threading
import time
def fun():
    time.sleep(1)
def fun1():
    time.sleep(1)
start_time = time.time()
t1 = threading.Thread(target=fun,name = 't1')
t2 = threading.Thread(target=fun1,name = 't2')
t1.start()
t2.start()
t1.join()
t2.join()
end_time = time.time()
print('time',end_time-start_time)

Output:

time 1.0026021003723145

For comparison, run sequentially in a single thread:

import time
def fun():
    time.sleep(1)
def fun1():
    time.sleep(1)
start_time = time.time()
fun()
fun1()
end_time = time.time()
print('time',end_time-start_time)

Output:

time 2.0009641647338867

For I/O-bound tasks, the threaded version is about twice as fast. Here time.sleep() simulates I/O: during real I/O the CPU is mostly idle, waiting for the disk or the network, and during time.sleep() the CPU likewise has nothing to do, so the two behave the same way for our purposes.

My conclusion: if the threads' work is CPU-bound, threading does not reduce the running time and may even increase it slightly; if the work is I/O-bound, the running time drops substantially, here by half. Why? Because CPython has a global interpreter lock (GIL): even on a machine with multiple CPU cores, only one thread at a time can execute Python bytecode. A thread that is blocked on I/O (or in time.sleep()) releases the GIL, so other threads can run while it waits. This is also why multithreading does improve the efficiency of web crawlers to some extent: a crawler spends most of its time downloading data, which is I/O.
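The same idea scales past two threads. A minimal sketch, assuming each simulated "download" is just a 1 s time.sleep() (fake_download is a hypothetical name, not a real library call): five of them started in threads should finish in roughly 1 s rather than 5 s.

```python
import threading
import time

def fake_download(seconds):
    # time.sleep() stands in for network I/O, as in the examples above;
    # while a thread sleeps (or waits on I/O) it releases the GIL
    time.sleep(seconds)

start = time.perf_counter()
threads = [threading.Thread(target=fake_download, args=(1,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every "download" to finish
elapsed = time.perf_counter() - start
print(f'5 simulated downloads took {elapsed:.1f}s')  # roughly 1 s, not 5 s
```

Keeping the threads in a list and joining them in a second loop (rather than start-then-join inside one loop) is what lets them all run at the same time.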

====================================================================================================================================================================================================

Next topic. When we first introduced the Thread class, we mentioned daemon threads. So what is a daemon thread? Let's modify the example above to find out.

import threading
import time
def fun():
    print('hello')
    time.sleep(3)
    print('world')
def fun1():
    print('how are you?')
    time.sleep(3)
    print('fine')
start_time = time.time()
t1 = threading.Thread(target=fun,name = 't1',daemon=True)
t2 = threading.Thread(target=fun1,name = 't2',daemon=True)
t1.start()
t2.start()
t1.join(timeout=0.3)
t2.join(timeout=0.4)
end_time = time.time()
print(end_time-start_time)

Output:

hello
how are you?
0.7178466320037842

You can see that the last print in t1 and t2 never ran: the program ended first. When creating t1 and t2, the only new argument is daemon=True, which makes them daemon threads.

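In short, with daemon=False a thread is a non-daemon thread, and with daemon=True it is a daemon thread. (The default, daemon=None, means the new thread inherits the daemon status of the thread that creates it; the main thread is not a daemon, so threads created from it are non-daemon by default.) The Python program exits only when no non-daemon threads are left alive, and at that point any daemon threads that are still running are stopped abruptly. So, as the name suggests, a daemon thread "guards" the main thread and lives or dies with it: when the main thread (and any other non-daemon threads) finish, the daemon threads end too.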
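To see the difference side by side, this sketch runs the same tiny child program twice in a fresh interpreter (via subprocess, so we can watch it exit), once with daemon=False and once with daemon=True. The child script and its messages are made up for illustration.

```python
import subprocess
import sys

# Child program: the main thread finishes immediately, while a worker
# thread sleeps for 0.5 s and then prints. {daemon} is filled in below.
CHILD = '''
import threading, time

def work():
    time.sleep(0.5)
    print('worker finished')

threading.Thread(target=work, daemon={daemon}).start()
print('main thread done')
'''

results = {}
for daemon in (False, True):
    proc = subprocess.run([sys.executable, '-c', CHILD.format(daemon=daemon)],
                          capture_output=True, text=True)
    results[daemon] = proc.stdout.splitlines()
    print(f'daemon={daemon}:', results[daemon])
```

With daemon=False the interpreter waits for the worker, so both lines appear; with daemon=True the program exits as soon as the main thread is done and the worker's line is never printed.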

====================================================================================================================================================================================================

Next topic: how can we make two threads take turns strictly?

Example:

import threading
import time
# Define thread run functions
def ou():
    for i in range(0,10,2):
        print(i)
        time.sleep(0.5)
def ji():
    for i in range(1, 10, 2):
        print(i)
        time.sleep(0.5)
if __name__ == '__main__':
    th = threading.Thread(target=ji)
    th2 = threading.Thread(target=ou)
    th.start()
    th2.start()

Output:

0
1
23

5
4
76

89

The output is irregular: the two threads' prints can even end up on the same line. But I want the two threads to take strict turns, so that even and odd numbers alternate. What should I do?

You can refer to my previous case.
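For reference, here is one possible approach, which is not necessarily the one from my previous case: use two threading.Semaphore objects as "turn tokens". Each thread waits for its own token, prints one number, then hands the token to the other thread. The names even_turn, odd_turn and output are illustrative choices.

```python
import threading

even_turn = threading.Semaphore(1)  # the even thread goes first (prints 0)
odd_turn = threading.Semaphore(0)   # the odd thread must wait for a release
output = []

def ou():
    for i in range(0, 10, 2):
        even_turn.acquire()   # wait until it is the even thread's turn
        output.append(i)
        print(i)
        odd_turn.release()    # hand the turn to the odd thread

def ji():
    for i in range(1, 10, 2):
        odd_turn.acquire()    # wait until it is the odd thread's turn
        output.append(i)
        print(i)
        even_turn.release()   # hand the turn back to the even thread

th = threading.Thread(target=ji)
th2 = threading.Thread(target=ou)
th.start()
th2.start()
th.join()
th2.join()
```

Because each thread can only proceed after the other releases its semaphore, the numbers print as 0, 1, 2, ..., 9 no matter which thread is started first.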