Python multithreading: not "chicken ribs" after all

Posted by seangamer on Wed, 26 Jun 2019 18:09:13 +0200

When I first learned Python multithreading, almost everything I found online said that Python has no true multithreading and that its threads are "chicken ribs" (an idiom for something of little practical value). I did not fully understand why at the time, but I did pick up the concept of the GIL (global interpreter lock): only one thread can run at a time, and the lock is released and another thread switched in when an IO operation is encountered. So is Python multithreading really chicken ribs? To settle the question, I decided to test it myself.

After comparing multithreaded tests in Python and Java, I found that Python's multithreading is indeed less efficient than Java's, but not so poor that it deserves to be called chicken ribs. So how does it compare with other mechanisms?

Viewpoint: Use multiple processes instead of multiple threads

After reading many blog posts, I saw some people argue that Python should use multiple processes instead of multiple threads, because processes are not restricted by the GIL. So I started using multiple processes to solve some concurrency problems and ran into a few pitfalls along the way. Fortunately, most of them could be solved by searching, and I wrote up a brief summary in a separate post, Python multiprocess.
So can multiple processes completely replace multithreading? Don't rush to an answer; let's keep reading.

Viewpoint: Coroutines are the best solution

Coroutines are a hot topic at the moment. What makes a coroutine different from a thread is that it is not switched by the operating system but by the program's own code. Because the programmer controls where switching happens, the thread-safety problems caused by preemptive switching do not arise. Coroutines are a broad and deep topic; this article will not introduce them further, and I will write about them separately in the future.
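To make the idea of programmer-controlled switching concrete, here is a minimal sketch. It assumes Python 3 and the standard-library asyncio module rather than the gevent library used in the tests below, and the names worker and log are made up for illustration. Each coroutine runs uninterrupted until it reaches an explicit await, so the interleaving is decided by the code and not by the operating system.

```python
import asyncio

log = []  # shared list; safe here because switches happen only at `await`


async def worker(name, steps):
    for i in range(steps):
        log.append("%s%d" % (name, i))  # this line is never interrupted
        await asyncio.sleep(0)          # explicit switch point: let other coroutines run


async def main():
    # Run two coroutines concurrently on a single thread
    await asyncio.gather(worker("a", 3), worker("b", 3))


asyncio.run(main())
print(log)
```

Both workers make progress in turn even though only one thread exists, and no lock is needed around the shared list.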

Test data

So the prevailing view online is to use multiple processes or coroutines instead of multiple threads (short of switching to a different programming language or interpreter, of course). Let's test the performance of all three. A fair test should cover both IO-intensive and CPU-intensive workloads, so two sets of data are collected.

IO-intensive test

For the IO-intensive test I picked the most common case, a web crawler, and measured how long it takes the crawler to fetch the Bing home page. (Only multithreading and coroutines are tested here; single-threaded and multi-process runs are unnecessary for this comparison.)
Test code:

Python

# -*- coding: utf-8 -*-
# Python 2 code (urllib2, print statement); requires the gevent package

from gevent import monkey; monkey.patch_all()
import gevent
import time
import threading
import urllib2

def urllib2_(url):
    # Fetch one URL and discard the body; print any error instead of raising
    try:
        urllib2.urlopen(url, timeout=10).read()
    except Exception as e:
        print e

def gevent_(urls):
    # Coroutine version: one greenlet per URL
    jobs = [gevent.spawn(urllib2_, url) for url in urls]
    gevent.joinall(jobs, timeout=10)
    for i in jobs:
        i.join()

def thread_(urls):
    # Thread version: create all threads, start them all, then wait for all
    a = []
    for url in urls:
        t = threading.Thread(target=urllib2_, args=(url,))
        a.append(t)
    for i in a:
        i.start()
    for i in a:
        i.join()

if __name__ == "__main__":
    urls = ["https://www.bing.com/"] * 10
    t1 = time.time()
    gevent_(urls)
    t2 = time.time()
    print 'gevent-time:%s' % str(t2 - t1)
    thread_(urls)
    t4 = time.time()
    print 'thread-time:%s' % str(t4 - t2)

IO-intensive test results:

10 visits:
gevent-time:0.380326032639
thread-time:0.376606941223
50 visits:
gevent-time:1.3358900547
thread-time:1.59564089775
100 visits:
gevent-time:2.42984986305
thread-time:2.5669670105
300 visits:
gevent-time:6.66330099106
thread-time:10.7605059147

The results show that as the number of concurrent requests grows, coroutines are indeed more efficient than multithreading, but when the concurrency is not that high the difference is not significant.
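Threads can keep up in this kind of test because a thread that is waiting on IO gives up the GIL. The sketch below, written for Python 3, simulates that effect. The function fake_download and its 0.2-second time.sleep() are stand-ins for a real network request and are not part of the test above, so the time it prints is not one of the measurements.

```python
import threading
import time


def fake_download(delay):
    # time.sleep() blocks like a network read and releases the GIL meanwhile
    time.sleep(delay)


def timed_threads(count, delay):
    # Start `count` threads that each wait `delay` seconds; return the wall time
    threads = [threading.Thread(target=fake_download, args=(delay,))
               for _ in range(count)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


elapsed = timed_threads(10, 0.2)
# The ten waits overlap, so the total is far below the serial 2 seconds
print("10 threads took %.2fs" % elapsed)
```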

For the CPU-intensive test, I chose a piece of computation-heavy work and measured the time it requires. (Tested: single-threaded, multithreaded, coroutine and multi-process.)
Test code:

Python

# -*- coding: utf-8 -*-
# Python 2 code; requires the gevent package
# Timing comments read "N-Ts": N tasks took T seconds

from multiprocessing import Process as pro
from multiprocessing.dummy import Process as thr
from gevent import monkey; monkey.patch_all()
import gevent

def run(i):
    # CPU-bound work: build a list of i integers and de-duplicate it
    lists = range(i)
    list(set(lists))

if __name__ == "__main__":
    # Multi-process
    for i in range(30):      ##10-2.1s 20-3.8s 30-5.9s
        t = pro(target=run, args=(5000000,))
        t.start()

    # Multithreaded
    # for i in range(30):    ##10-3.8s  20-7.6s  30-11.4s
    #     t = thr(target=run, args=(5000000,))
    #     t.start()

    # Coroutine
    # jobs = [gevent.spawn(run, 5000000) for i in range(30)]  ##10-4.0s 20-7.7s 30-11.5s
    # gevent.joinall(jobs)
    # for i in jobs:
    #     i.join()

    # Single thread
    # for i in range(30):  ##10-3.5s  20-7.6s 30-11.3s
    #     run(5000000)

Test results:

10 concurrent tasks: [multi-process] 2.1s [multithreaded] 3.8s [coroutine] 4.0s [single-threaded] 3.5s
20 concurrent tasks: [multi-process] 3.8s [multithreaded] 7.6s [coroutine] 7.7s [single-threaded] 7.6s
30 concurrent tasks: [multi-process] 5.9s [multithreaded] 11.4s [coroutine] 11.5s [single-threaded] 11.3s

Under the CPU-intensive test, multiple processes are significantly faster than the other approaches, while multithreading, coroutines and a single thread perform about the same. This is because only the multi-process version uses the full computing power of the CPU. Watching the machine while the code runs confirms it: only the multi-process version saturates the CPU.
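For readers who want to repeat the comparison on Python 3 with only the standard library, here is a minimal sketch. It is not the code that produced the numbers above: cpu_task and timed_batch are hypothetical helpers, and the task size is kept small so the demo finishes quickly.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def cpu_task(n):
    # Same kind of CPU-bound work as the test above, returning a checkable value
    return len(set(range(n)))


def timed_batch(executor_cls, tasks, size, workers=4):
    """Run `tasks` copies of cpu_task(size) on the given executor class.

    Returns (elapsed_seconds, results).
    """
    start = time.time()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(cpu_task, [size] * tasks))
    return time.time() - start, results


elapsed, results = timed_batch(ThreadPoolExecutor, tasks=4, size=200000)
print("threads: %.2fs" % elapsed)
```

Passing concurrent.futures.ProcessPoolExecutor in place of ThreadPoolExecutor runs the same batch in separate processes. In that case the call should sit under an if __name__ == "__main__": guard so that worker processes can import the module safely. With threads the tasks take turns holding the GIL, so only the process version can spread this work across several cores.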

Conclusion of this article

From the two sets of data, it is not hard to see that Python multithreading is not that much of a chicken rib. If it were, why hasn't Python 3 removed the GIL? The Python community is split into two camps on this question, which I will not go into here. We should respect the decision of Python's creator.
As for when to use multithreading, when to use multiple processes and when to use coroutines, the answer should already be obvious.
When we need to write IO-intensive programs such as concurrent crawlers, we should use multithreading or coroutines (in my own tests the gap between the two was not particularly pronounced). When we need scientific computing or are designing CPU-intensive programs, we should use multiple processes. Of course, these conclusions assume a single server rather than a distributed setup.
The answer has been given, but does the article end here? Having discussed how useful Python multithreading is, let's now look at how to use it.

multiprocessing.dummy module

The multiprocessing.dummy module is used in the same way as the multi-process module multiprocessing; the only difference is that you add .dummy when importing the package.
For details, refer to the usage of multiprocessing.
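As a minimal sketch of that one-line difference, assuming Python 3: the Pool imported below is backed by threads, and removing .dummy from the import would give a process pool with the same interface. The function square is made up for illustration.

```python
from multiprocessing.dummy import Pool  # thread-backed; drop ".dummy" for processes


def square(x):
    return x * x


# Pool(4) here creates 4 worker threads behind the multiprocessing Pool API
with Pool(4) as pool:
    squares = pool.map(square, range(5))

print(squares)
```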

threading module

This is the threading module that ships with Python. There are two main ways to create threads: one subclasses threading.Thread, and the other calls threading.Thread() directly with a target function. Both are described next.

Usage[1]

Create threads using the threading.Thread() function.
Code:

Python

import threading

def run(i):
    print(i)

for i in range(10):
    # target is the function to run, args is the tuple of arguments for it
    t = threading.Thread(target=run, args=(i,))
    t.start()

Description: Thread() takes two main parameters. One is target, the function to be executed by the child thread; the other is args, a tuple of the arguments to pass to it. Creating the child thread returns a Thread object, and calling that object's start() method starts the thread.

Thread object methods:

start() starts thread execution
run() defines what the thread does (override it in a subclass)
join(timeout=None) blocks the caller until the thread ends; if timeout is given, it blocks for at most timeout seconds
getName() returns the name of the thread
setName() sets the name of the thread
isAlive() returns a Boolean indicating whether the thread is still running (spelled is_alive() in Python 3; isAlive() was removed in Python 3.9)
isDaemon() returns the daemon flag of the thread
setDaemon(daemonic) sets the daemon flag of the thread to daemonic (must be called before start())
t.setDaemon(True) marks thread t as a daemon thread, so it is terminated when the main thread exits.

Functions of the threading module:

threading.enumerate() returns a list of all currently alive Thread objects; len(threading.enumerate()) gives the number of running threads
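The methods above use the older Python 2 spellings. A short sketch of the same operations with the Python 3 names, under the assumption that you are on a current interpreter; the thread name "waiter" and the function wait_for_stop are made up for the example.

```python
import threading

stop = threading.Event()


def wait_for_stop():
    stop.wait()  # block until the main thread sets the event


t = threading.Thread(target=wait_for_stop, name="waiter")
t.daemon = True                      # attribute form of setDaemon(True); set before start()
t.start()

alive_before = t.is_alive()          # the thread is blocked on the event, so still alive
listed = t in threading.enumerate()  # enumerate() returns the live Thread objects
thread_name = t.name                 # attribute form of getName()

stop.set()                           # let the thread finish
t.join(timeout=5)                    # wait at most 5 seconds for it to end
alive_after = t.is_alive()

print(thread_name, alive_before, listed, alive_after)
```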

Usage[2]

Create threads by subclassing threading.Thread.
Code:

Python

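import threading

class test(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        # run() is what executes in the new thread
        try:
            print("code one")
        except Exception:
            pass

threads = []
for i in range(10):
    cur = test()
    cur.start()
    threads.append(cur)  # keep every thread so that each one can be joined

for cur in threads:
    cur.join()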

Description: This approach subclasses threading.Thread and overrides its run() method.

Getting the return value of a thread

Sometimes we need the return value of each child thread. However, the usual way of getting a return value by calling a function does not work across threads, so a different way is needed to get the result back from a child thread.
Code:

Python

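import threading

class test(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.tag = 0  # default, so get_result() is safe before run() has set it

    def run(self):
        self.tag = 1  # store the "return value" on the instance

    def get_result(self):
        return self.tag == 1

f = test()
f.start()
# Poll until the thread has finished (f.join() would block without spinning)
while f.is_alive():
    continue

print(f.get_result())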

Description: The first question when getting a return value from a thread is when the child thread ends, and therefore when the value can be read. The isAlive() method (is_alive() in Python 3) tells us whether a child thread is still alive.
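One way to avoid polling is to wait with join() and read the stored value afterwards. This is only a sketch for Python 3: ResultThread and add are hypothetical names, not part of the threading module.

```python
import threading


class ResultThread(threading.Thread):
    """Thread that keeps the return value of the function it runs (hypothetical helper)."""

    def __init__(self, func, *args):
        threading.Thread.__init__(self)
        self.func = func
        self.args = args
        self.result = None

    def run(self):
        self.result = self.func(*self.args)  # store the value for the caller


def add(a, b):
    return a + b


workers = [ResultThread(add, i, i) for i in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()  # blocks until this thread has finished, no busy loop needed

results = [w.result for w in workers]
print(results)
```

Because join() returns only after run() has finished, the result attribute is guaranteed to be set by the time it is read.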

Controlling the number of threads running

When there are many tasks to perform, we often need to limit the number of threads running at once. The threading module comes with a built-in way to do this.
Code:

Python

import threading

maxs = 10  # maximum number of threads running at the same time
threadLimiter = threading.BoundedSemaphore(maxs)

class test(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        threadLimiter.acquire()  # take a slot; blocks while all slots are in use
        try:
            print("code one")
        except Exception:
            pass
        finally:
            threadLimiter.release()  # give the slot back

threads = []
for i in range(100):
    cur = test()
    cur.start()
    threads.append(cur)  # keep every thread so that each one can be joined

for cur in threads:
    cur.join()

Description: The program above limits the number of threads doing work at the same time to 10. Threads beyond that number are still created, but they block in acquire() until a running thread calls release(); no exception is raised.
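To check that a semaphore really caps concurrency, the following Python 3 sketch records the highest number of tasks that were inside the guarded section at once. The limit of 3, the counters and the 0.05-second sleep are assumptions chosen for the demo.

```python
import threading
import time

LIMIT = 3                        # assumed limit for the demo
limiter = threading.BoundedSemaphore(LIMIT)
lock = threading.Lock()          # protects the two counters below
running = 0
peak = 0


def task():
    global running, peak
    with limiter:                # blocks (does not raise) while LIMIT tasks are active
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.05)         # stand-in for real work
        with lock:
            running -= 1


threads = [threading.Thread(target=task) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrency:", peak)
```

All ten threads finish, and the recorded peak never exceeds the limit.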

In addition to the built-in method, we can also design a solution of our own:

Python

import threading

# run(i) is the target function defined in Usage[1]
threads = []

# Create all threads
for i in range(10):
    t = threading.Thread(target=run, args=(i,))
    threads.append(t)

# Start the threads in the list
for t in threads:
    t.start()
    while True:
        # Count the live threads (threading.enumerate() includes the main thread).
        # If there are fewer than 5, leave the while loop so the for loop can start
        # the next thread; otherwise keep spinning here until a thread finishes.
        if len(threading.enumerate()) < 5:
            break

Either of the two approaches above works, but I prefer the thread-pool approach below, which uses the third-party threadpool package.

Python

import threadpool  # third-party package

def ThreadFun(device):
    pass

def main():
    device_list=[object1,object2,object3......,objectn]#Devices to process (placeholder list)
    task_pool = threadpool.ThreadPool(8)  # 8 is the number of threads in the thread pool
    request_list = []  # Stores the task list

    # First construct the task list
    for device in device_list:
        # makeRequests() returns a list of requests, so extend rather than append
        request_list.extend(threadpool.makeRequests(ThreadFun, [((device, ), {})]))

    # Put each task in the thread pool; the worker threads pick tasks up one by one
    for request in request_list:
        task_pool.putRequest(request)

    # Block until all tasks have completed
    task_pool.wait()

if __name__ == "__main__":
    main()
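If installing a third-party package is not an option, Python 3's standard library offers a comparable pool. The sketch below is an alternative to, not a reproduction of, the threadpool example above; handle_device and the device names are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_device(device):
    # Placeholder for the real per-device work
    return "done:%s" % device


devices = ["dev%d" % i for i in range(20)]  # made-up device names

# 8 worker threads, matching the pool size used above
with ThreadPoolExecutor(max_workers=8) as pool:
    # map() hands out the tasks and yields the results in input order
    outcomes = list(pool.map(handle_device, devices))

print(outcomes[:3])
```

Leaving the with block waits for all submitted tasks, so no separate wait call is needed.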

For multi-process questions, head over to the Python multiprocess post; other multithreading questions can be discussed in the comments below.
