Handling concurrency in Python with futures

Posted by DavidP123 on Wed, 05 Jan 2022 15:48:38 +0100

Notes from Fluent Python

1. futures.ThreadPoolExecutor

import os
import time
import sys
import requests

POP20_CC = ('CN IN US ID BR PK NG BD RU JP ' 'MX PH VN ET EG DE IR TR CD FR').split()
BASE_URL = 'http://flupy.org/data/flags'
DEST_DIR = './'


def save_flag(img, filename):  # Save image
    path = os.path.join(DEST_DIR, filename)
    with open(path, 'wb') as fp:
        fp.write(img)


def get_flag(cc):  # Get image
    url = '{}/{cc}/{cc}.gif'.format(BASE_URL, cc=cc.lower())
    resp = requests.get(url)
    return resp.content


def show(text):  # Print information
    print(text, end=' ')
    sys.stdout.flush()


def download_many(cc_list):
    for cc in sorted(cc_list):
        image = get_flag(cc)  # download the image
        show(cc)  # print the country code
        save_flag(image, cc.lower() + '.gif')  # save to disk
    return len(cc_list)


def main(download_many):
    t0 = time.time()
    count = download_many(POP20_CC)
    elapsed = time.time() - t0
    msg = '\n{} flags downloaded in {:.2f}s'
    print(msg.format(count, elapsed))  # Timing Information 


# ---- Multi-threaded download with futures.ThreadPoolExecutor ----
from concurrent import futures

MAX_WORKERS = 20  # maximum number of worker threads


def download_one(cc):
    image = get_flag(cc)
    show(cc)
    save_flag(image, cc.lower() + '.gif')
    return cc


def download_many_1(cc_list):
    workers = min(MAX_WORKERS, len(cc_list))
    with futures.ThreadPoolExecutor(workers) as executor:
        # Instantiate ThreadPoolExecutor with the required number of worker
        # threads; executor.__exit__ calls executor.shutdown(wait=True),
        # which blocks until all threads are done
        res = executor.map(download_one, sorted(cc_list))
        # download_one is called concurrently from multiple threads;
        # map returns a generator, so iterating over it yields the value
        # returned by each call
    return len(list(res))


if __name__ == '__main__':
    # main(download_many) # 24 seconds
    main(download_many_1)  # 3 seconds

2. Futures

  • Generally, you should not create Future instances yourself
  • Futures are meant to be instantiated only by the concurrency framework (concurrent.futures or asyncio). Reason: a future represents something that will eventually happen, and the only way to be sure it will happen is to schedule its execution. Therefore, concurrent.futures.Future instances are created only when an Executor subclass schedules a callable. For example, Executor.submit() takes a callable, schedules it to run, and returns a Future.
def download_many_2(cc_list):
    cc_list = cc_list[:5]
    with futures.ThreadPoolExecutor(max_workers=3) as executor:
        to_do = []
        for cc in sorted(cc_list):
            future = executor.submit(download_one, cc)
            # executor.submit schedules the callable and returns a Future
            # representing the pending operation
            to_do.append(future)  # store each Future
            msg = 'Scheduled for {}: {}'
            print(msg.format(cc, future))
        results = []
        for future in futures.as_completed(to_do):
            # as_completed yields each Future as it finishes
            res = future.result()  # get the Future's result
            msg = '{} result: {!r}'
            print(msg.format(future, res))
            results.append(res)
    return len(results)
Output:
Scheduled for BR: <Future at 0x22da99d2d30 state=running>
Scheduled for CN: <Future at 0x22da99e1040 state=running>
Scheduled for ID: <Future at 0x22da99e1b20 state=running>
Scheduled for IN: <Future at 0x22da99ec520 state=pending>
Scheduled for US: <Future at 0x22da99ecd00 state=pending>
CN <Future at 0x22da99e1040 state=finished returned str> result: 'CN'
BR <Future at 0x22da99d2d30 state=finished returned str> result: 'BR'
ID <Future at 0x22da99e1b20 state=finished returned str> result: 'ID'
IN <Future at 0x22da99ec520 state=finished returned str> result: 'IN'
US <Future at 0x22da99ecd00 state=finished returned str> result: 'US'

5 flags downloaded in 3.20s

3. Blocking I/O and GIL

The CPython interpreter itself is not thread-safe, so it has a global interpreter lock (GIL), which allows only one thread at a time to execute Python bytecode. As a result, a single Python process usually cannot use multiple CPU cores at once.

However, all standard-library functions that perform blocking I/O release the GIL while waiting for the operating system to return a result. This means that I/O-bound Python programs can benefit from threads at the language level: while one Python thread waits for a network response, the blocking I/O function releases the GIL so another thread can run. (Network downloads and file reads/writes are I/O-intensive operations.)
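A minimal sketch of this effect, using time.sleep as a stand-in for a blocking I/O call (sleep also releases the GIL while waiting), so five "blocking" calls overlap in threads instead of running back to back:

```python
import time
from concurrent import futures


def blocking_io(n):
    # time.sleep releases the GIL while waiting, just as the
    # standard library's socket and file I/O functions do
    time.sleep(0.2)
    return n


# Run the five calls one after another
t0 = time.time()
for i in range(5):
    blocking_io(i)
sequential = time.time() - t0

# Run the same five calls in five threads; the waits overlap
t0 = time.time()
with futures.ThreadPoolExecutor(5) as executor:
    list(executor.map(blocking_io, range(5)))
threaded = time.time() - t0

print('sequential: {:.2f}s, threaded: {:.2f}s'.format(sequential, threaded))
```

The threaded version takes roughly the duration of one sleep rather than five, which is the same reason the flag downloads above drop from tens of seconds to a few seconds.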

4. Launching processes with the concurrent.futures module

This module supports true parallel computing because its ProcessPoolExecutor class distributes work across multiple Python processes. For CPU-intensive processing, it therefore bypasses the GIL and can use all available CPU cores.


The concurrent.futures module makes it particularly easy to turn a thread-based solution into a process-based one.

ProcessPoolExecutor shows its value on CPU-intensive jobs.
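A minimal sketch of that switch: ProcessPoolExecutor has the same interface as ThreadPoolExecutor, so a CPU-bound job (here a made-up sum-of-squares function, not from the flag example) can be parallelized by changing one class name. Left without an argument, the executor defaults to the number of available CPUs.

```python
from concurrent import futures


def cpu_bound(n):
    # Pure-bytecode arithmetic: this never releases the GIL,
    # so threads would not help here, but processes do
    return sum(i * i for i in range(n))


def main():
    numbers = [200_000] * 4
    # Same interface as ThreadPoolExecutor, but each worker is a
    # separate Python process with its own interpreter and GIL
    with futures.ProcessPoolExecutor() as executor:
        results = list(executor.map(cpu_bound, numbers))
    print(results)


if __name__ == '__main__':
    # The __main__ guard is required: worker processes import this
    # module, and the guard keeps them from re-running main()
    main()
```

Note that the function submitted to a process pool must be picklable (a module-level function, not a lambda or closure), since arguments and results cross process boundaries.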