[Python basics] Python coroutines

Posted by whoisKeel on Wed, 15 Dec 2021 09:54:17 +0100

Conceptually, we all know about multiprocessing and multithreading, while coroutines achieve concurrency within a single thread. Syntactically, coroutines resemble generators: they are functions whose definition body contains the yield keyword. The difference is that in a coroutine, yield usually appears on the right side of an expression: datum = yield. This instantly unsettles beginners: wasn't yield simply supposed to suspend execution and return a value? How can its result land on the right side of an assignment?

From generator to coroutine
Let's look at what may be the simplest usage example of a coroutine:

>>> def simple_coroutine():
...     print("-> coroutine started")
...     x = yield
...     print("-> coroutine received:", x)
...     
>>> my_coro = simple_coroutine()
>>> my_coro
<generator object simple_coroutine at 0x0000019A681F27B0>
>>> next(my_coro)
-> coroutine started
>>> my_coro.send(42)
-> coroutine received: 42
Traceback (most recent call last):
  File "<input>", line 1, in <module>
StopIteration

The reason yield can appear on the right side of an assignment is that the coroutine receives the value the caller pushes in with send().

Not only can yield sit on the right of an assignment, the yield itself can also carry an expression, as the following example shows:

def simple_coro2(a):
    b = yield a
    c = yield a + b

my_coro2 = simple_coro2(14)
next(my_coro2)
my_coro2.send(28)
my_coro2.send(99)

The execution process is:

Call next(my_coro2): execution advances to yield a and yields 14.
Call my_coro2.send(28): 28 is assigned to b, execution advances to yield a + b, which yields 42.
Call my_coro2.send(99): 99 is assigned to c, the function body ends, and the coroutine terminates with StopIteration.

The takeaway: in the line b = yield a, the expression on the right of = is evaluated (and the coroutine suspends) before the assignment; b only gets its value when the caller reactivates the coroutine with send().
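A minimal run makes the walkthrough concrete, catching the StopIteration that ends the coroutine:

```python
def simple_coro2(a):
    b = yield a          # first activation yields a, then waits for b
    c = yield a + b      # second activation yields a + b, then waits for c

my_coro2 = simple_coro2(14)
print(next(my_coro2))     # 14
print(my_coro2.send(28))  # 42
try:
    my_coro2.send(99)     # assigns c, then the body ends
except StopIteration:
    print('coroutine terminated')
```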

In the first example, you had to call next(my_coro) to start the generator and pause it at the yield statement before you could send data. The reason lies in the four states a coroutine can be in:

'GEN_CREATED': waiting to start execution
'GEN_RUNNING': currently executing in the interpreter
'GEN_SUSPENDED': paused at a yield expression
'GEN_CLOSED': execution has finished

Data can only be sent while the coroutine is in the GEN_SUSPENDED state. The preliminary step that gets it there is called priming. Calling next(my_coro) and calling my_coro.send(None) both prime the coroutine, with the same effect.
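The states can be observed directly with inspect.getgeneratorstate; a quick sketch reusing the first example:

```python
from inspect import getgeneratorstate

def simple_coroutine():
    x = yield
    print('-> coroutine received:', x)

my_coro = simple_coroutine()
print(getgeneratorstate(my_coro))  # GEN_CREATED
next(my_coro)                      # priming
print(getgeneratorstate(my_coro))  # GEN_SUSPENDED
try:
    my_coro.send(42)
except StopIteration:
    pass
print(getgeneratorstate(my_coro))  # GEN_CLOSED
```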

Priming
A coroutine can only be used after it has been primed, that is, call next before send so the coroutine reaches the GEN_SUSPENDED state. This step is easy to forget. To guard against forgetting it, you can define a priming decorator, such as:

from functools import wraps

def coroutine(func):
    @wraps(func)
    def primer(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)
        return gen
    return primer
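A quick use of the decorator, applied to a hypothetical running_max coroutine invented for illustration (not from the original text): send works immediately, with no explicit next call.

```python
from functools import wraps

def coroutine(func):
    """Decorator: prime `func` by advancing it to the first yield."""
    @wraps(func)
    def primer(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)
        return gen
    return primer

@coroutine
def running_max():
    # hypothetical example coroutine: tracks the largest value seen so far
    current = None
    while True:
        value = yield current
        current = value if current is None else max(current, value)

m = running_max()   # already primed by the decorator
print(m.send(3))    # 3
print(m.send(1))    # 3
print(m.send(7))    # 7
```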

In fact, Python offers a more elegant construct called yield from, which primes the subgenerator automatically.

Note that the custom priming decorator and yield from are incompatible: yield from primes the subgenerator itself, so it must be handed an unprimed generator.

yield from
yield from plays roughly the role of the await keyword in other languages. When yield from subgen() is used inside a generator gen, subgen takes over and passes the values it yields straight to the caller of gen; that is, the caller drives subgen directly. Meanwhile, gen blocks, waiting for subgen to terminate.

yield from can replace a for loop that merely yields:

for c in "AB":
    yield c
yield from "AB"

The first thing the expression yield from x does with x is call iter(x) to obtain an iterator from it, so x can be any iterable.
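Both spellings produce the same sequence; a minimal check:

```python
def gen_for():
    # yield items one by one with an explicit loop
    for c in "AB":
        yield c

def gen_yield_from():
    # delegate iteration to the iterable itself
    yield from "AB"

print(list(gen_for()))         # ['A', 'B']
print(list(gen_yield_from()))  # ['A', 'B']
```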

But yield from does far more than that. Its more important role is to open a two-way channel between the outermost caller and the innermost subgenerator, as shown in the figure below:


This figure carries a lot of information and is not easy to digest at first.

First, understand three concepts: the caller, the delegating generator, and the subgenerator.

Caller
Put plainly, this is the client code that drives everything; in this example, the familiar main function, the program's entry point.

# the client code, a.k.a. the caller
def main(data):  # <8>
    results = {}
    for key, values in data.items():
        group = grouper(results, key)  # <9>
        next(group)  # <10>
        for value in values:
            group.send(value)  # <11>
        group.send(None)  # important! <12>
    # print(results)  # uncomment to debug
    report(results)

Delegating generator
This is a generator function that contains a yield from expression; that is, a coroutine.

# the delegating generator
def grouper(results, key):  # <5>
    while True:  # <6>
        results[key] = yield from averager()  # <7>

Subgenerator
This is the generator obtained from the expression on the right of yield from.

# the subgenerator
def averager():  # <1>
    total = 0.0
    count = 0
    average = None
    while True:
        term = yield  # <2>
        if term is None:  # <3>
            break
        total += term
        count += 1
        average = total/count
    return Result(count, average)  # <4>

The code is far friendlier than the terminology suggests.

Then there are the five channels in the figure: send, yield, throw, StopIteration, and close.

send
When a coroutine is suspended at a yield from expression, the caller can send data through the yield from channel straight to the subgenerator on its right.
yield
The subgenerator on the right of yield from sends the values it yields through the yield from channel straight to the caller.
throw
The caller can call .throw() on the delegating generator to raise an exception inside the subgenerator. In this example, the main function instead uses group.send(None): the None sentinel terminates the while loop of the subgenerator on the right of yield from, handing control back to the delegating generator so execution can continue; otherwise it would stay suspended at the yield from expression forever.
StopIteration
When the subgenerator on the right of yield from returns, the interpreter raises a StopIteration exception with the return value attached to the exception object, and the delegating generator resumes.
close
When the main function is done, it calls the close() method to shut the coroutine down.
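The three snippets above can be assembled into a runnable sketch. Result is a minimal namedtuple stand-in, report() is replaced by simply returning results, and the sample data is invented for illustration:

```python
from collections import namedtuple

Result = namedtuple('Result', 'count average')

def averager():
    # subgenerator: accumulates terms until it receives None
    total = 0.0
    count = 0
    average = None
    while True:
        term = yield
        if term is None:
            break
        total += term
        count += 1
        average = total / count
    return Result(count, average)

def grouper(results, key):
    # delegating generator: captures the subgenerator's return value
    while True:
        results[key] = yield from averager()

def main(data):
    # caller: primes each grouper, feeds values, then sends the sentinel
    results = {}
    for key, values in data.items():
        group = grouper(results, key)
        next(group)
        for value in values:
            group.send(value)
        group.send(None)  # ends averager; yield from captures its return
    return results

data = {'boys;kg': [39.0, 40.8, 43.2], 'girls;kg': [40.9, 38.5]}
print(main(data))
```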

With the overall flow clear, we won't dig into further technical details here; if you have time, learn more in the Python principles series later.

yield from is often used together with the @asyncio.coroutine decorator from the asyncio package, which entered the standard library in Python 3.4.

The coroutine as an accumulator
This is a common use of coroutines. The code is as follows:

def averager():
    total = 0.0
    count = 0
    average = None
    while True:  # <1>
        term = yield average  # <2>
        total += term
        count += 1
        average = total/count
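Driving this averager by hand, after priming with next:

```python
def averager():
    total = 0.0
    count = 0
    average = None
    while True:  # loop forever, yielding the running average
        term = yield average
        total += term
        count += 1
        average = total / count

avg = averager()
next(avg)            # prime: runs to the first yield, produces None
print(avg.send(10))  # 10.0
print(avg.send(30))  # 20.0
print(avg.send(5))   # 15.0
```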

Concurrency realized with coroutines
The example here is a little more involved.

The core code snippets are:

# BEGIN TAXI_PROCESS
def taxi_process(ident, trips, start_time=0):  # <1>
    """Yield to simulator issuing event at each state change"""
    time = yield Event(start_time, ident, 'leave garage')  # <2>
    for i in range(trips):  # <3>
        time = yield Event(time, ident, 'pick up passenger')  # <4>
        time = yield Event(time, ident, 'drop off passenger')  # <5>

    yield Event(time, ident, 'going home')  # <6>
    # end of taxi process # <7>
# END TAXI_PROCESS
def main(end_time=DEFAULT_END_TIME, num_taxis=DEFAULT_NUMBER_OF_TAXIS,
         seed=None):
    """Initialize random generator, build procs and run simulation"""
    if seed is not None:
        random.seed(seed)  # get reproducible results

    taxis = {i: taxi_process(i, (i+1)*2, i*DEPARTURE_INTERVAL)
             for i in range(num_taxis)}
    sim = Simulator(taxis)
    sim.run(end_time)

This example shows how to handle events in a main loop and how to drive coroutines by sending data to them. This is the basic idea underlying the asyncio package: concurrency achieved with coroutines instead of threads and callbacks.
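To see the driving mechanism without the Simulator class, one taxi can be driven by hand. Event is assumed here to be a namedtuple with time, proc, and action fields, and the time increments are invented for illustration:

```python
from collections import namedtuple

Event = namedtuple('Event', 'time proc action')

def taxi_process(ident, trips, start_time=0):
    """Yield to the simulator, issuing an event at each state change."""
    time = yield Event(start_time, ident, 'leave garage')
    for i in range(trips):
        time = yield Event(time, ident, 'pick up passenger')
        time = yield Event(time, ident, 'drop off passenger')
    yield Event(time, ident, 'going home')

taxi = taxi_process(ident=13, trips=1, start_time=0)
ev = next(taxi)               # Event(time=0, proc=13, action='leave garage')
print(ev)
ev = taxi.send(ev.time + 7)   # pick up passenger at t=7
print(ev)
ev = taxi.send(ev.time + 23)  # drop off passenger at t=30
print(ev)
ev = taxi.send(ev.time + 5)   # going home at t=35
print(ev)
```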

Topics: Python AI crawler