When I first learned Python multithreading, almost everything I found online said the same thing: Python has no "true" multithreading, and Python threads are "chicken ribs" (a Chinese idiom for something of little value that you still hesitate to throw away). I didn't fully understand the claim at the time, but I did pick up the concept of the GIL, Python's global interpreter lock: only one thread can run at a time, and the lock is released, allowing a switch, whenever a thread hits an I/O operation. So is Python multithreading really that useless? To settle the confusion, I think you have to test it yourself.
After comparing Python's multithreading with Java's in tests, I found that Python's threads really are less efficient than Java's, but not nearly as useless as claimed. So how do they compare with Python's other concurrency mechanisms?
Viewpoint: replace multithreading with multiprocessing
After reading many blog posts, I saw a common opinion among commenters: use Python multiprocessing instead of multithreading, because multiple processes are not restricted by the GIL. So I started using multiprocessing to solve some concurrency problems, hit a few pitfalls along the way, resolved most of them by searching around, and then wrote up a brief summary in a separate post, Python multiprocess.
So can multiprocessing completely replace multithreading? Don't rush to a conclusion; keep reading.
Viewpoint: coroutines are the best solution
Coroutines are a hot concept right now. What distinguishes a coroutine from a thread is that it is not switched by the operating system but by the programmer's own code; since the programmer controls exactly where switches happen, the usual thread-safety issues disappear. Coroutines are a broad and deep topic that this article will not cover in detail; I will write about them separately later.
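To make the idea concrete, here is a minimal sketch (my own illustration, using gevent, which also appears in the tests below) where two coroutines hand control to each other at explicit switch points:
Python
#! -*- coding:utf-8 -*-
import gevent

def worker(name, count):
    for i in range(count):
        print '%s step %d' % (name, i)
        gevent.sleep(0)  # an explicit, programmer-chosen switch point

# The OS never preempts these greenlets; each runs until it reaches
# gevent.sleep(0), at which point control passes to the other one.
jobs = [gevent.spawn(worker, 'A', 3), gevent.spawn(worker, 'B', 3)]
gevent.joinall(jobs)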
Test data
So the view online is to use multiprocessing or coroutines instead of multithreading (leaving aside a change of language or interpreter, of course); let's test the performance of all three. A fair test should cover both I/O-bound and CPU-bound workloads, so there are two sets of data.
I/O-intensive test
For the I/O-bound test I chose the most common crawler task: timing how long it takes to fetch bing.com repeatedly. (Only multithreading and coroutines are tested; single-threaded and multiprocess runs are unnecessary for this case.)
Test code:
Python
#! -*- coding:utf-8 -*-
from gevent import monkey; monkey.patch_all()  # patch blocking I/O so gevent can switch on it
import gevent
import time
import threading
import urllib2

def urllib2_(url):
    try:
        urllib2.urlopen(url, timeout=10).read()
    except Exception, e:
        print e

def gevent_(urls):
    jobs = [gevent.spawn(urllib2_, url) for url in urls]
    gevent.joinall(jobs, timeout=10)
    for i in jobs:
        i.join()

def thread_(urls):
    a = []
    for url in urls:
        t = threading.Thread(target=urllib2_, args=(url,))
        a.append(t)
    for i in a:
        i.start()
    for i in a:
        i.join()

if __name__ == "__main__":
    urls = ["https://www.bing.com/"] * 10
    t1 = time.time()
    gevent_(urls)
    t2 = time.time()
    print 'gevent-time:%s' % str(t2 - t1)
    thread_(urls)
    t4 = time.time()
    print 'thread-time:%s' % str(t4 - t2)
I/O-intensive test results:
- 10 visits: gevent-time: 0.380326032639, thread-time: 0.376606941223
- 50 visits: gevent-time: 1.3358900547, thread-time: 1.59564089775
- 100 visits: gevent-time: 2.42984986305, thread-time: 2.5669670105
- 300 visits: gevent-time: 6.66330099106, thread-time: 10.7605059147
The results show that as concurrency grows, coroutines do become more efficient than threads, but when concurrency is modest the difference is not significant.
CPU-intensive test
For the CPU-bound test I chose some scientific-computing-style work and timed it. (This one covers single-threaded, multithreaded, coroutine, and multiprocess runs.)
Test code:
Python
#! -*- coding:utf-8 -*-
from multiprocessing import Process as pro
from multiprocessing.dummy import Process as thr
from gevent import monkey; monkey.patch_all()
import gevent

def run(i):
    lists = range(i)
    list(set(lists))

if __name__ == "__main__":
    # Multiprocess: 10 runs - 2.1s, 20 - 3.8s, 30 - 5.9s
    for i in range(30):
        t = pro(target=run, args=(5000000,))
        t.start()

    # Multithreaded: 10 runs - 3.8s, 20 - 7.6s, 30 - 11.4s
    # for i in range(30):
    #     t = thr(target=run, args=(5000000,))
    #     t.start()

    # Coroutine: 10 runs - 4.0s, 20 - 7.7s, 30 - 11.5s
    # jobs = [gevent.spawn(run, 5000000) for i in range(30)]
    # gevent.joinall(jobs)
    # for i in jobs:
    #     i.join()

    # Single thread: 10 runs - 3.5s, 20 - 7.6s, 30 - 11.3s
    # for i in range(30):
    #     run(5000000)
CPU-intensive test results:
- 10 concurrent tasks: [multiprocess] 2.1s, [multithreaded] 3.8s, [coroutine] 4.0s, [single-threaded] 3.5s
- 20 concurrent tasks: [multiprocess] 3.8s, [multithreaded] 7.6s, [coroutine] 7.7s, [single-threaded] 7.6s
- 30 concurrent tasks: [multiprocess] 5.9s, [multithreaded] 11.4s, [coroutine] 11.5s, [single-threaded] 11.3s
Under the CPU-bound test, multiprocessing clearly beats the others, while multithreading, coroutines, and the single-threaded run perform about the same. This is because only multiple processes can use the CPU's full computing power; watching CPU usage while the code runs confirms that only the multiprocess version saturates it.
Conclusion of this article
From these two sets of data it's not hard to see that Python multithreading is not as useless as claimed. If it were, why hasn't Python 3 removed the GIL? The Python community holds two schools of thought on that question, which I won't get into here; we should respect the decision of Python's creator.
As for when to use multithreading, when to use multiprocessing, and when to use coroutines, the answer should already be obvious.
When writing I/O-bound programs such as concurrent crawlers, use multithreading or coroutines (in my tests the gap between the two was not especially pronounced); when doing scientific computing or designing other CPU-bound programs, use multiprocessing. Of course, these conclusions assume everything runs on a single server rather than a distributed setup.
The answer has been given, but does the article end here? Having discussed whether Python multithreading is useful, let's now introduce how to use it.
multiprocessing.dummy module
The usage of multiprocessing.dummy is the same as that of the multiprocess module multiprocessing, except that you add .dummy when importing the package; it exposes the multiprocessing API but is backed by threads.
For usage, see the Multiprocessing usage reference.
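As a quick illustration (a minimal sketch of my own, not from the referenced post), the thread-backed pool can be swapped in by changing only the import:
Python
#! -*- coding:utf-8 -*-
# from multiprocessing import Pool       # process pool
from multiprocessing.dummy import Pool   # thread pool: same API, backed by threads

def run(i):
    return i * i

if __name__ == "__main__":
    pool = Pool(4)                  # 4 worker threads
    print pool.map(run, range(10))  # the same map() call works with either import
    pool.close()
    pool.join()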
threading module
This is threading, Python's built-in multithreading module. There are two main ways to create threads: one calls the threading.Thread() constructor with a target function, the other subclasses the threading.Thread class. Both are described next.
Usage[1]
Create threads using the threading.Thread() function.
Code:
Python
import threading

def run(i):
    print i

for i in range(10):
    t = threading.Thread(target=run, args=(i,))
    t.start()
Description: the Thread() constructor takes two main parameters: target, the function the child thread will execute, and args, the arguments to pass to it. Creating the thread returns an object; call that object's start() method to launch the child thread.
Thread object methods (a short example follows these lists):
- start(): starts the thread's execution
- run(): the method that defines the thread's behavior (override it when subclassing)
- join(timeout=None): blocks the caller until the thread ends; if timeout is given, blocks for at most timeout seconds
- getName(): returns the thread's name
- setName(name): sets the thread's name
- isAlive(): boolean flag indicating whether the thread is still running
- isDaemon(): returns the thread's daemon flag
- setDaemon(daemonic): sets the thread's daemon flag to daemonic (must be called before start())
- t.setDaemon(True) marks the child thread as a daemon thread, so it is killed automatically when the main thread exits.
Methods of the threading module:
- threading.enumerate(): returns a list of all currently alive threads, so len(threading.enumerate()) gives the number of running threads
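A short sketch (my own, with a trivial worker function) exercising several of the methods above:
Python
import threading
import time

def worker():
    time.sleep(1)

t = threading.Thread(target=worker)
t.setName("demo-thread")   # name the thread
t.setDaemon(True)          # must be called before start()
t.start()
print t.getName(), t.isAlive(), t.isDaemon()
print len(threading.enumerate())  # count of alive threads, including the main thread
t.join()                   # block until the worker finishes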
Usage[2]
Create threads by inheriting from the threading.Thread class.
Code:
Python
import threading

class test(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        try:
            print "code one"
        except:
            pass

threads = []
for i in range(10):
    cur = test()
    cur.start()
    threads.append(cur)
for cur in threads:  # join every thread, not just the last one
    cur.join()
Description: this approach subclasses threading.Thread and overrides its run() method.
Getting a thread's return value
Sometimes we need the return value of each child thread, but the usual way of collecting a return value from an ordinary function call does not work across threads, so we need another way to pass a result back from a child thread.
Code:
Python
import threading

class test(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        self.tag = 1  # store the result on the instance

    def get_result(self):
        if self.tag == 1:
            return True
        else:
            return False

f = test()
f.start()
while f.isAlive():  # busy-wait until the child thread finishes
    continue
print f.get_result()
Description: the first question when collecting return values from threads is: when does the child thread end, and when is it safe to read the value? The isAlive() method tells us whether the child thread is still running.
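Note that the busy-wait loop above burns CPU while it spins. A gentler variant (my own, reusing the test class from the block above) lets join() do the blocking instead:
Python
f = test()
f.start()
f.join()              # sleep until the child thread ends instead of spinning
print f.get_result()  # safe to read now: run() has completed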
Controlling the number of threads running
When there are many tasks to run, we often need to cap the number of threads working at once. The threading module ships with a mechanism for this: the semaphore.
Code:
Python
import threading

maxs = 10  # maximum number of concurrent threads
threadLimiter = threading.BoundedSemaphore(maxs)

class test(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        threadLimiter.acquire()  # take a slot; blocks while 10 threads hold one
        try:
            print "code one"
        except:
            pass
        finally:
            threadLimiter.release()  # free the slot

threads = []
for i in range(100):
    cur = test()
    cur.start()
    threads.append(cur)
for cur in threads:  # join every thread, not just the last one
    cur.join()
Description: the program above limits the number of threads doing work at any moment to 10; threads beyond that limit simply block in acquire() until a slot is released. (A BoundedSemaphore, unlike a plain Semaphore, raises an exception if release() is called more times than acquire().)
Besides this built-in mechanism, we can design a solution of our own:
Python
import threading

threads = []
# Create all the threads first (assumes the run(i) worker defined earlier)
for i in range(10):
    t = threading.Thread(target=run, args=(i,))
    threads.append(t)

# Start the threads in the list, limiting how many run at once
for t in threads:
    t.start()
    while True:
        # Check how many threads are running; if fewer than 5, break out of
        # the while loop so the for loop can start the next thread.
        # Otherwise keep spinning here.
        if len(threading.enumerate()) < 5:
            break
Thread pools are another approach; they can be built in more than one way, and I prefer the following.
Python
import threadpool

def ThreadFun(arg1, arg2):
    pass

def main():
    device_list = [object1, object2, object3, ......, objectn]  # devices to process
    task_pool = threadpool.ThreadPool(8)  # 8 is the number of threads in the pool
    request_list = []  # holds the task list
    # First build the task list (makeRequests returns a list of requests)
    for device in device_list:
        request_list += threadpool.makeRequests(ThreadFun, [((device,), {})])
    # Put each task into the pool; the pool's threads pick tasks up one by one
    # and process them. map() applies putRequest to every task in the list.
    map(task_pool.putRequest, request_list)
    # Block until every task in the pool has completed
    task_pool.wait()

if __name__ == "__main__":
    main()
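If you would rather avoid the third-party threadpool package, the multiprocessing.dummy module introduced earlier provides a thread pool with the same shape (a sketch; process_device and the range(100) data are hypothetical stand-ins):
Python
from multiprocessing.dummy import Pool as ThreadPool

def process_device(device):
    pass  # hypothetical worker, standing in for ThreadFun above

if __name__ == "__main__":
    device_list = range(100)  # stand-in for the real device objects
    pool = ThreadPool(8)      # 8 worker threads, matching the example above
    pool.map(process_device, device_list)  # blocks until every task is done
    pool.close()
    pool.join()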
For multiprocess questions, see the Python multiprocess post mentioned above; other multithreading questions can be discussed in the comments below.