Study notes from *Fluent Python*
1. futures.ThreadPoolExecutor
```python
import os
import sys
import time

import requests

POP20_CC = ('CN IN US ID BR PK NG BD RU JP '
            'MX PH VN ET EG DE IR TR CD FR').split()
BASE_URL = 'http://flupy.org/data/flags'
DEST_DIR = './'

def save_flag(img, filename):  # save the image bytes to disk
    path = os.path.join(DEST_DIR, filename)
    with open(path, 'wb') as fp:
        fp.write(img)

def get_flag(cc):  # download one flag image
    url = '{}/{cc}/{cc}.gif'.format(BASE_URL, cc=cc.lower())
    resp = requests.get(url)
    return resp.content

def show(text):  # print progress information
    print(text, end=' ')
    sys.stdout.flush()

def download_many(cc_list):  # sequential version
    for cc in sorted(cc_list):
        image = get_flag(cc)              # fetch
        show(cc)                          # print
        save_flag(image, cc.lower() + '.gif')  # save
    return len(cc_list)

def main(download_many):  # timing harness
    t0 = time.time()
    count = download_many(POP20_CC)
    elapsed = time.time() - t0
    msg = '\n{} flags downloaded in {:.2f}s'
    print(msg.format(count, elapsed))

# ---- Multi-threaded download with futures.ThreadPoolExecutor
from concurrent import futures

MAX_WORKERS = 20  # maximum number of worker threads

def download_one(cc):
    image = get_flag(cc)
    show(cc)
    save_flag(image, cc.lower() + '.gif')
    return cc

def download_many_1(cc_list):
    workers = min(MAX_WORKERS, len(cc_list))
    # Instantiate ThreadPoolExecutor with the number of worker threads;
    # executor.__exit__ calls executor.shutdown(wait=True), which blocks
    # until all threads have finished executing.
    with futures.ThreadPoolExecutor(workers) as executor:
        # download_one is called concurrently in multiple threads;
        # map returns a generator, so iterating over it yields the
        # value returned by each call
        res = executor.map(download_one, sorted(cc_list))
    return len(list(res))

if __name__ == '__main__':
    # main(download_many)   # about 24 seconds
    main(download_many_1)   # about 3 seconds
```
2. Futures
- Generally, you should not create futures yourself.
- Futures are meant to be instantiated only by a concurrency framework (concurrent.futures or asyncio). The reason: a future represents something that will eventually happen, and the only way to be sure it will happen is to schedule its execution. Therefore, concurrent.futures.Future instances are created only when the framework schedules work for execution. For example, Executor.submit() takes a callable object, schedules it to run, and returns a future representing the pending operation.
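As a quick illustration of the Future interface described above, here is a minimal sketch; the callable `work` is a hypothetical stand-in, not from the book:

```python
from concurrent.futures import ThreadPoolExecutor

def work(x):  # trivial stand-in for a real task
    return x * 2

with ThreadPoolExecutor(max_workers=1) as executor:
    fut = executor.submit(work, 21)  # schedules work(21), returns a Future immediately
    print(fut)                       # e.g. <Future ... state=running> or finished
    print(fut.result())              # blocks until the result is ready, then prints 42
    print(fut.done())                # True once result() has returned
```

`result()` is the blocking way to collect the outcome; `done()` is a non-blocking status check, and `futures.as_completed()` (used below) is the idiomatic way to collect many futures as they finish.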
```python
def download_many_2(cc_list):
    cc_list = cc_list[:5]  # demo with only the first five country codes
    with futures.ThreadPoolExecutor(max_workers=3) as executor:
        to_do = []
        for cc in sorted(cc_list):
            # executor.submit schedules the callable for execution and
            # returns a future representing the pending operation
            future = executor.submit(download_one, cc)
            to_do.append(future)  # store each future
            msg = 'Scheduled for {}: {}'
            print(msg.format(cc, future))
        results = []
        # as_completed yields each future as soon as it finishes
        for future in futures.as_completed(to_do):
            res = future.result()  # get the result of the future
            msg = '{} result: {!r}'
            print(msg.format(future, res))
            results.append(res)
    return len(results)
```
Output:

```
Scheduled for BR: <Future at 0x22da99d2d30 state=running>
Scheduled for CN: <Future at 0x22da99e1040 state=running>
Scheduled for ID: <Future at 0x22da99e1b20 state=running>
Scheduled for IN: <Future at 0x22da99ec520 state=pending>
Scheduled for US: <Future at 0x22da99ecd00 state=pending>
CN <Future at 0x22da99e1040 state=finished returned str> result: 'CN'
BR <Future at 0x22da99d2d30 state=finished returned str> result: 'BR'
ID <Future at 0x22da99e1b20 state=finished returned str> result: 'ID'
IN <Future at 0x22da99ec520 state=finished returned str> result: 'IN'
US <Future at 0x22da99ecd00 state=finished returned str> result: 'US'
5 flags downloaded in 3.20s
```
3. Blocking I/O and GIL
The CPython interpreter itself is not thread-safe, so it has a Global Interpreter Lock (GIL), which allows only one thread at a time to execute Python bytecode. As a result, a single Python process usually cannot use multiple CPU cores at once.
All standard-library functions that perform blocking I/O release the GIL while waiting for the operating system to return a result. This means multithreading can still pay off at the Python level, and I/O-intensive programs benefit from it: while one Python thread waits for a network response, the blocking I/O function releases the GIL so another thread can run (network downloads and file reading/writing are I/O-intensive).
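To see this effect without touching the network, here is a sketch (not from the book) that uses `time.sleep` as a stand-in for blocking I/O, since `sleep` also releases the GIL while waiting:

```python
import time
from concurrent import futures

def fake_io(n):
    time.sleep(0.2)  # simulated blocking I/O; the GIL is released while sleeping
    return n

t0 = time.time()
sequential = [fake_io(i) for i in range(5)]
seq_elapsed = time.time() - t0  # roughly 5 * 0.2s: the sleeps run one after another

t0 = time.time()
with futures.ThreadPoolExecutor(5) as executor:
    threaded = list(executor.map(fake_io, range(5)))
thr_elapsed = time.time() - t0  # roughly 0.2s: the five sleeps overlap

print('sequential: {:.2f}s, threaded: {:.2f}s'.format(seq_elapsed, thr_elapsed))
```

The threaded run finishes in roughly one-fifth the time, which is exactly the speedup the flag downloader gets: the waiting overlaps, even though only one thread executes bytecode at any instant.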
4. Launching processes with concurrent.futures
This module can achieve true parallel computing because its ProcessPoolExecutor class distributes work across multiple Python processes. Therefore, for CPU-intensive processing, using this module bypasses the GIL and puts all available CPU cores to work.
The concurrent.futures module makes it particularly easy to turn a thread-based scheme into a process-based one.
The value of ProcessPoolExecutor shows up in CPU-intensive jobs.