Python advanced - coroutine

Posted by ady01 on Thu, 20 Jan 2022 09:11:01 +0100

Coroutines

A coroutine is a cooperative way for multiple routines to share work. The routine that is currently running voluntarily gives up control at some point and remembers its current state, so that when control returns it can resume execution from where it left off.

In short, the core idea of a coroutine is that the running routine voluntarily yields control and later resumes. In contrast to the preemptive scheduling of threads, coroutines are scheduled cooperatively.
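The "give up control, then resume" behaviour can be seen with an ordinary Python generator. The following is a minimal sketch; the function name `counter` and its argument are made up for illustration.

```python
def counter(limit):
    """Each `yield` hands control back to the caller and freezes the local state."""
    n = 0
    while n < limit:
        yield n      # give up control here; the value of n is remembered
        n += 1       # execution resumes on this line at the next next() call


gen = counter(3)
first = next(gen)    # runs the body until the first yield
second = next(gen)   # resumes right after the yield, with n preserved
rest = list(gen)     # drains whatever the generator still has to produce
print(first, second, rest)
```

The caller decides when the generator continues, and the generator decides where it pauses, which is the cooperative hand-off described above.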

Application scenarios for coroutines

Disadvantages of preemptive scheduling

In I/O-intensive scenarios, the usual solution under preemptive scheduling is the "asynchronous + callback" mechanism.

The problem is that in some scenarios the readability of the whole program becomes very poor. Take image downloading as an example. The image service provides an asynchronous interface that returns immediately after the caller makes a request, giving the caller a unique ID. After the image service completes the download, it places the result in a message queue (MQ). The caller then has to keep consuming this MQ to find out whether the download has completed.

As a result, the overall logic is split into several parts, each with its own state transitions, which makes this kind of code prone to bugs.
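To make the split concrete, here is a rough in-memory sketch of the image download flow. Everything in it is hypothetical: `request_download`, `on_queue_message`, the message format and the `download` coroutine are stand-ins, not a real image service or MQ client. The first half is the callback style, where state has to be carried between two functions; the second half is the same flow written as a single coroutine for contrast.

```python
import asyncio
import itertools

_ids = itertools.count(1)   # stand-in for the unique IDs issued by the service

# Callback style: the logic lives in two functions plus shared state.
pending = {}     # request ID -> URL, must survive between the two steps
completed = []   # URLs whose download has been confirmed


def request_download(url):
    """Step 1: submit the request and remember the ID handed back."""
    request_id = next(_ids)
    pending[request_id] = url
    return request_id


def on_queue_message(message):
    """Step 2: runs later, whenever a result message is consumed from the queue."""
    url = pending.pop(message["id"], None)
    if url is not None and message["ok"]:
        completed.append(url)


# Coroutine style: the same flow reads from top to bottom in one function.
async def download(url):
    request_id = next(_ids)
    await asyncio.sleep(0)   # stand-in for waiting on the service's result
    return {"id": request_id, "url": url, "ok": True}


rid = request_download("a.png")
on_queue_message({"id": rid, "ok": True})   # simulate the MQ delivering a result
result = asyncio.run(download("b.png"))
print(completed, result)
```

In the callback half, the `pending` dictionary is the state that has to be kept consistent across both steps; in the coroutine half, that state is just a local variable.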

Advantages of user-mode cooperative scheduling

With the development of network technology and growing demand for high concurrency, the advantages of the user-mode cooperative scheduling that coroutines provide have gradually been recognized in I/O-heavy scenarios such as network operations, file operations, database operations and message queue operations.

Coroutines hand the scheduling of I/O back from the operating system kernel to the user program itself. When a user-mode program performs I/O, it voluntarily gives up the CPU to other coroutines through yield. The coroutines stand in an equal, symmetric and cooperative relationship with one another.
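A toy user-mode scheduler shows what "giving up the CPU through yield" looks like. This is a sketch under simplifying assumptions: the `worker` generators and `run_round_robin` are invented for illustration, and a bare `yield` stands in for the point where real code would wait on I/O.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative task: does one unit of work, then yields to the scheduler."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield   # voluntarily hand control back


def run_round_robin(tasks):
    """Resume each task in turn until all of them have finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)            # run the task up to its next yield
        except StopIteration:
            continue              # task finished, do not reschedule it
        queue.append(task)        # still alive, goes to the back of the queue


log = []
run_round_robin([worker("A", 2, log), worker("B", 3, log)])
print(log)
```

No task is ever interrupted by the scheduler; switching only happens where a task itself yields, and all tasks are treated as peers.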

Characteristics of coroutines

First of all, note that coroutines by themselves cannot make use of multiple cores; they need to be combined with multiple processes to take advantage of a multi-core platform.
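One possible way to combine the two is to start several processes and let each one run its own event loop. The sketch below assumes the standard-library `concurrent.futures.ProcessPoolExecutor`; `fake_io`, `handle_batch`, `worker` and `run_on_processes` are hypothetical names, and `asyncio.sleep` stands in for real I/O.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


async def fake_io(n):
    await asyncio.sleep(0.01)   # placeholder for a real I/O wait
    return n * n


async def handle_batch(batch):
    # Within one process, the coroutines of a batch run concurrently.
    return await asyncio.gather(*(fake_io(n) for n in batch))


def worker(batch):
    # Entry point executed inside each process: one event loop per process.
    return asyncio.run(handle_batch(batch))


def run_on_processes(batches, max_workers=2):
    # Each batch is handed to a separate process, so batches can use separate cores.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, batches))
```

A script would typically call `run_on_processes([[1, 2, 3], [4, 5, 6]])` from under an `if __name__ == '__main__':` guard, since process pools may need to re-import the main module in the child processes.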

Coroutines in Python

Python's support for coroutines has evolved over several versions:

  1. Python 2.x has limited support for coroutines. Generators, built on the yield keyword, provide some coroutine functionality, but not all of it.
  2. The third-party library gevent offers better support for coroutines.
  3. Python 3.4 introduced the asyncio module.
  4. Python 3.5 introduced the async/await syntax.
  5. In Python 3.6 the asyncio module became more mature and stable.
  6. In Python 3.7, async and await became reserved keywords.
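For orientation, this is roughly what the modern syntax looks like on Python 3.7 or later. It is a small sketch: `fetch` and `main` are illustrative names, and `asyncio.sleep` stands in for real I/O.

```python
import asyncio
import inspect


async def fetch(name, delay, log):
    log.append(f"{name} start")
    await asyncio.sleep(delay)   # suspension point: control goes back to the event loop
    log.append(f"{name} done")
    return name


async def main(log):
    # Both coroutines are in flight at the same time on a single thread.
    return await asyncio.gather(fetch("slow", 0.05, log), fetch("fast", 0.01, log))


log = []
coro = fetch("probe", 0, log)          # calling an async function does not run its body
is_coro = inspect.iscoroutine(coro)    # it only creates a coroutine object
coro.close()                           # discard it without running it

results = asyncio.run(main(log))
print(is_coro, results, log)
```

`asyncio.gather` returns results in the order the coroutines were passed in, while the log shows the order in which they actually progressed.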

Sample program for async/await:

import asyncio
import logging
import os
from pathlib import Path
from time import time

import aiohttp
 

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


CODEFLEX_IMAGES_URLS = ['https://codeflex.co/wp-content/uploads/2021/01/pandas-dataframe-python-1024x512.png',
                        'https://codeflex.co/wp-content/uploads/2021/02/github-actions-deployment-to-eks-with-kustomize-1024x536.jpg',
                        'https://codeflex.co/wp-content/uploads/2021/02/boto3-s3-multipart-upload-1024x536.jpg',
                        'https://codeflex.co/wp-content/uploads/2018/02/kafka-cluster-architecture.jpg',
                        'https://codeflex.co/wp-content/uploads/2016/09/redis-cluster-topology.png']
 
 
async def download_image_async(session, target_dir, img_url):
    download_path = target_dir / os.path.basename(img_url)
    async with session.get(img_url) as response:
        with download_path.open('wb') as f:
            while True:
                # await suspends this coroutine until the next chunk arrives,
                # yielding control to the event loop so other downloads can proceed.
                chunk = await response.content.read(512)
                if not chunk:
                    break
                f.write(chunk)
    logger.info('Downloaded: %s', img_url)
 
 
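# The async keyword declares an asynchronous (coroutine) function.
# Calling it does not run the body immediately; it returns a coroutine object,
# which is later executed by the event loop.
async def main():
    images_dir = Path("codeflex_images")
    images_dir.mkdir(parents=False, exist_ok=True)

    async with aiohttp.ClientSession() as session:
        tasks = [download_image_async(session, images_dir, img_url) for img_url in CODEFLEX_IMAGES_URLS]
        # With return_exceptions=True, failures are returned as exception objects
        # instead of being raised, so they are logged here rather than silently dropped.
        results = await asyncio.gather(*tasks, return_exceptions=True)

    for img_url, result in zip(CODEFLEX_IMAGES_URLS, results):
        if isinstance(result, Exception):
            logger.error('Failed: %s (%r)', img_url, result)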
 
 
if __name__ == '__main__':
    start = time()

    # The event loop acts as the manager, switching control between coroutines.
    # asyncio.run() (Python 3.7+) creates the loop, runs main() to completion and closes the loop.
    asyncio.run(main())

    logger.info('Download time: %s seconds', time() - start)

Reference documents

https://mp.weixin.qq.com/s/LItTjy2uN6iJvN2MqNPQ7Q
