Python - [function decorator | closure] - Notes on reading (chao)

Posted by webwiese on Thu, 25 Jun 2020 10:36:45 +0200

This article is based on Chapter 7 of Fluent Python (first edition): "Function Decorators and Closures". Since I'm just learning Python, these notes are close to a full transcription; my enthusiasm faded toward the end, so the notes stop at section 7.8.2 (page 171), which I neither want nor need to continue past.

(helpless ~ ~)

Function decorators are used to "tag" functions in the source code to enhance the behavior of functions in some way. To master decorators, you must understand closures. In addition to being useful in decorators, closures are the foundation of asynchronous callback programming and functional programming styles.

This article explains how function decorators work, from the simplest registration decorator to more complex parameterized decorators.

1 Decorator basics

A decorator is a callable object whose argument is another function (the decorated function). The decorator may process the decorated function and return it, or replace it with another function or callable object.

If there is a decorator named decorate:

@decorate
def target():
    print('running target()')

The effect of the above code is the same as the following:

def target():
    print('running target()')

target = decorate(target)

Both forms produce the same result: after either snippet runs, the name target is not necessarily bound to the original target function, but to whatever decorate(target) returns.

To confirm that the decorated function is replaced, see the console session below.

# Decorators usually replace a function with another

>>> def deco(func):
...     def inner():
...         print('running inner()')
...     return inner # deco returns the inner function object
...
>>> @deco
... def target(): # Decorate target with deco
...     print('running target()')
...
>>> target() # Calling the decorated target will actually run the inner
running inner()
>>> target # Review the object and find that target is now a reference of inner
<function deco.<locals>.inner at 0x000001FB4640A550>

Strictly speaking, decorators are just syntactic sugar. As shown above, a decorator can always be called like a regular callable, passing another function as the argument. To sum up: the first key feature of decorators is that they can replace the decorated function with a different one. The second is that they execute immediately when the module is loaded.

2 When Python executes decorators

A key feature of decorators is that they run right after the decorated function is defined. That is usually at import time (i.e., when Python loads the module). Here is the sample code, registration.py:

registry = []

def register(func):
    print('running register(%s)' % func)
    registry.append(func)
    return func

@register
def f1():
    print('running f1()')

@register
def f2():
    print('running f2()')

def f3():
    print('running f3()')

def main():
    print('running main()')
    print('registry ->', registry)
    f1()
    f2()
    f3()

if __name__ == '__main__':
    main()

Running registration.py as a script produces the following output:

running register(<function f1 at 0x000002B32DA25430>)
running register(<function f2 at 0x000002B32DA254C0>)
running main()
registry -> [<function f1 at 0x000002B32DA25430>, <function f2 at 0x000002B32DA254C0>]
running f1()
running f2()
running f3()

Note that register runs (twice) before any other function in the module. When register is called, it receives the decorated function as its argument, e.g. <function f1 at 0x000002B32DA25430>. After the module is loaded, registry holds references to the two decorated functions: f1 and f2. The functions f1, f2, and f3 execute only when explicitly called by main.

If registration.py is imported as a module (not run as a script), the output is:

>>> import registration
running register(<function f1 at 0x00000238FDF2A550>)
running register(<function f2 at 0x00000238FDF2A5E0>)

Check the value of registry as follows:

>>> registration.registry
[<function f1 at 0x00000238FDF2A550>, <function f2 at 0x00000238FDF2A5E0>]

The point of registration.py is that function decorators execute as soon as the module is imported, while the decorated functions run only when explicitly invoked. This highlights what Pythonistas call the difference between import time and runtime.

Considering how decorators are commonly employed in real code, registration.py is unusual in two ways:

1. The decorator function is defined in the same module as the decorated functions. In practice, a decorator is usually defined in one module and applied to functions in other modules.

2. The register decorator returns the same function that was passed in. In practice, most decorators define an inner function and return it.
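
As a preview of that typical shape, here is a hypothetical toy decorator (not from the book) that defines an inner function and returns it, replacing the decorated function:

```python
def shout(func):
    # Typical decorator shape: define an inner function and return it.
    # The decorated function is replaced by `inner`.
    def inner(*args, **kwargs):
        result = func(*args, **kwargs)
        return result.upper()
    return inner

@shout
def greet(name):
    return 'hello, {}'.format(name)

print(greet('world'))   # HELLO, WORLD
print(greet.__name__)   # inner -- the replacement is visible here
```

Note that greet.__name__ reports 'inner': the original function object really is gone, a detail that section 7 below addresses with functools.wraps.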

3 Using a decorator to improve the Strategy pattern

Let's use a registration decorator to improve the e-commerce promotional discount example from Section 6.1 of Fluent Python.

The main problem with Example 6-6 is the repetition of the function names: each strategy appears both in its own definition and in the promos list that best_promo scans to find the largest available discount. This repetition is a problem because someone may add a new strategy function and forget to add it to promos, in which case best_promo silently ignores the new strategy without raising an error. The following code solves the problem with a register-style promotion decorator:

# Values in the promos list are populated with the promotion decorator

promos = [] # List of storage policies

def promotion(promo_func): # promotion appends promo_func to the promos list, then returns it unchanged
    promos.append(promo_func)
    return promo_func

@promotion
def fidelity(order):
    """5 for customers with points of 1000 or more%Discount?"""
    return order.total() * .05 if order.customer.fidelity >= 1000 else 0

@promotion
def bulk_item(order):
    """10 for 20 or more items%Discount?"""
    discount = 0
    for item in order.cart: # Traverse shopping cart items
        if item.quantity >= 20:
            discount += item.total() * .1
    return discount

@promotion
def large_order(order):
    """When there are 10 or more different products in the order, 7%Discount?"""
    distinct_items= {item.product for item in order.cart}
    if len(distinct_items) >= 10:
        return order.total() * .07
    return 0

def best_promo(order): # best_promo does not need to be modified because it depends on the promos list
    """Choose the best discount available"""
    return max(promo(order) for promo in promos)

The above code has several advantages:

  • The promotion strategy functions don't need special names (no _promo suffix required)
  • The @promotion decorator highlights the purpose of the decorated function, and makes it easy to temporarily disable a strategy: just comment out the decorator
  • Promotional discount strategies may be defined in other modules, anywhere in the system, as long as the @promotion decorator is applied to them
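
A minimal self-contained sketch of the same registration pattern, with toy strategies operating on a bare total instead of the book's Order objects (the strategies here are hypothetical, chosen only so the snippet runs on its own):

```python
promos = []  # strategies register themselves here

def promotion(promo_func):
    promos.append(promo_func)
    return promo_func  # returned unchanged, exactly like register

# Toy strategies (simplified stand-ins for the Order-based ones above)
@promotion
def ten_percent(total):
    return total * 0.10

@promotion
def flat_five(total):
    return 5 if total >= 50 else 0

def best_promo(total):
    # Never needs editing: it scans whatever has been registered
    return max(promo(total) for promo in promos)

print(best_promo(100))  # 10.0 (ten_percent wins)
print(best_promo(40))   # 4.0  (flat_five yields 0 below 50)
```

Adding a third strategy is just a matter of decorating it with @promotion; best_promo picks it up automatically.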

However, most decorators do change the decorated function. They usually do so by defining an inner function and returning it to replace the decorated function. Code that uses inner functions almost always depends on closures to operate correctly. To understand closures, we first need to look at Python's variable scope rules.

4 Variable scope rules

The following code defines and tests a function that reads two variables: a local variable a, defined as a function parameter, and a variable b that is not defined anywhere in the function.

>>> def f1(a):
...     print(a)
...     print(b)
...
>>> f1(3)
3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in f1
NameError: name 'b' is not defined

The error is expected. Note that if we assign the global variable b first and then call f1, no error occurs:

>>> b = 6
>>> f1(3)
3
6

Now a surprising example. The first two lines of f2 below are the same as in f1; f2 then assigns to b and prints its value. But the second print fails before the assignment is ever reached:

>>> # b is a local variable because it is assigned a value in the definition body of the function
>>> b = 6
>>> def f2(a):
...     print(a)
...     print(b)
...     b = 9
...
>>> f2(3)
3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in f2
UnboundLocalError: local variable 'b' referenced before assignment

Note that 3 is output first, which proves that print(a) executed. But print(b) never runs. At first glance it looks like it should print 6, because there is a global variable b and the local b is assigned only after print(b).

The fact is, when Python compiles the body of the function, it decides that b is a local variable because it is assigned within the function. The generated bytecode reflects this decision: Python will try to fetch b from the local scope. When f2(3) is called later, print(a) fetches and prints the value of the local variable a, but when trying to fetch the value of the local variable b, Python discovers that b is unbound.

This is not a flaw, but a design choice: Python does not require variables to be declared, but assumes that the variables assigned in the body of the function definition are local variables.
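
The compiler's decision can be verified by inspecting the bytecode. The sketch below (exact opcode names vary across CPython versions) shows that every reference to b inside f2 compiles to a local-variable opcode, never LOAD_GLOBAL:

```python
import dis

b = 6

def f2(a):
    print(a)
    print(b)  # compiled as a *local* load of b, despite the assignment coming later
    b = 9

# Collect every opcode that touches the name b inside f2.
ops = [ins.opname for ins in dis.get_instructions(f2) if ins.argval == 'b']
print(ops)  # e.g. ['LOAD_FAST', 'STORE_FAST'] (or LOAD_FAST_CHECK on newer CPython)
```

By contrast, the loads of the name print in the same function are LOAD_GLOBAL: the compiler treats names assigned in the body as local and everything else as global or built-in.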

If we want the interpreter to treat b as a global variable despite the assignment inside the function, we need the global declaration:

>>> b = 6
>>> def f3(a):
...     global b
...     print(a)
...     print(b)
...     b = 9
...
>>> f3(3)
3
6
>>> b
9
>>> f3(3)
3
9

5 Closures

A closure is a function with an extended scope that encompasses nonglobal variables referenced in the body of the function but not defined there. The key point is that a closure can access nonglobal variables that are defined outside of its body.

The concept is easier to grasp with an example. Suppose we have a function named avg that computes the mean of an ever-growing series of values; for example, the average closing price of a commodity over its entire history. New prices are added every day, so the average takes into account all prices so far.

At first, avg was used as follows:

>>> avg(10) # avg is not defined yet; this session shows the behavior we expect
10.0
>>> avg(11)
10.5
>>> avg(12)
11.0

Here is a class-based implementation, average_oo.py:

class Averager():
    def __init__(self):
        self.series = []
        
    def __call__(self, new_value):
        self.series.append(new_value)
        total = sum(self.series)
        return total/len(self.series)

An instance of Averager is a callable object:

>>> avg = Averager()
>>> avg(10)
10.0
>>> avg(11)
10.5
>>> avg(12)
11.0

Here is a functional implementation of the moving average, using the higher-order function make_averager (averager.py):

def make_averager(): # higher-order function that builds a moving-average function
    series = []
    def averager(new_value):
        series.append(new_value)
        total = sum(series)
        return total/len(series)
    return averager

When called, make_averager returns an averager function object. Each time averager is invoked, it appends the argument to series and computes the current mean. Testing averager.py:

>>> avg = make_averager()
>>> avg(10)
10.0
>>> avg(11)
10.5
>>> avg(12)
11.0

Note that averager.py and average_oo.py have something in common: calling Averager() or make_averager() gets a callable object, avg, that updates the historical series and computes the current mean. In average_oo.py, avg is an instance of Averager; in averager.py, it is the inner function averager. Either way, we just call avg(n) to add n to the series and get the updated mean.

The Averager instance avg keeps the history in self.series. But where does the avg function in the second example find the series?

Note that series is a local variable of the make_averager function, because it is initialized in the body of that function: series = []. However, by the time avg(10) is called, make_averager has already returned, and its local scope is long gone.

Within averager, series is a free variable. This is a technical term meaning a variable that is not bound in the local scope.

Inspecting the function returned by make_averager, we find the names of its local and free variables in the __code__ attribute, which represents the compiled body of the function:

>>> avg.__code__.co_varnames
('new_value', 'total')
>>> avg.__code__.co_freevars
('series',)

The binding for series is kept in the __closure__ attribute of avg. Each item in avg.__closure__ corresponds to a name in avg.__code__.co_freevars. These items are cell objects, whose cell_contents attribute holds the actual value:

>>> avg.__code__.co_freevars
('series',)
>>> avg.__closure__
(<cell at 0x000001E467125B80: list object at 0x000001E467141400>,)
>>> avg.__closure__[0].cell_contents
[10, 11, 12]

To summarize: a closure is a function that retains the bindings of the free variables that existed when the function was defined, so that those bindings can still be used later, when the function is invoked and the defining scope is no longer available.

Note that only functions nested in other functions may need to handle external variables that are not in the global scope.

6 The nonlocal declaration

Our previous averager.py stores all the values in a history list and calls sum every time averager is invoked. A better implementation would store only the running total and the count of items, and compute the mean from those two numbers.

The following implementation is broken; the defect is the point of the example:

def make_averager(): # averager1.py: broken attempt to compute the moving average without keeping the full history
    count = 0
    total = 0

    def averager(new_value):
        count += 1
        total += new_value
        return total/count
    return averager

Try to use the above function, and you will get the following results:

>>> avg = make_averager()
>>> avg(10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\other\Pycharm study\averager1.py", line 6, in averager
    count += 1
UnboundLocalError: local variable 'count' referenced before assignment

The problem is that count += 1 means the same as count = count + 1 when count is a number or any immutable type. So we are actually assigning to count in the body of averager, and that makes count a local variable. The same problem affects total.

We did not have this problem in averager.py because we never assigned to the series name; we only called series.append and passed series to sum and len. In other words, we took advantage of the fact that lists are mutable.

But with immutable types such as numbers, strings, and tuples, you can only read, never update. If you try to rebind them, as in count = count + 1, you are implicitly creating the local variable count. It is then no longer a free variable, and therefore it is not saved in the closure.

To work around this, Python 3 introduced the nonlocal declaration. It lets you flag a variable as a free variable even when it is assigned a new value within the function. If a nonlocal variable gets a new value, the binding stored in the closure is updated. The final, correct implementation of make_averager looks like this:

def make_averager():
    count = 0
    total = 0

    def averager(new_value):
        nonlocal count, total
        count += 1
        total += new_value
        return total / count
    return averager
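
A quick check (repeating the definition above so the snippet is self-contained) confirms that the nonlocal version behaves exactly like the earlier implementations, and that count and total are indeed free variables now:

```python
def make_averager():
    count = 0
    total = 0

    def averager(new_value):
        nonlocal count, total  # rebindings now update the closure cells
        count += 1
        total += new_value
        return total / count
    return averager

avg = make_averager()
print(avg(10))  # 10.0
print(avg(11))  # 10.5
print(avg(12))  # 11.0
print(avg.__code__.co_freevars)  # ('count', 'total')
```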

7 Implementing a simple decorator

The following code defines a decorator that clocks every invocation of the decorated function and prints the elapsed time, the arguments passed, and the result of the call (clockdeco.py):

import time

def clock(func):
    def clocked(*args): # inner function, accepts any number of positional arguments
        t0 = time.perf_counter()
        result = func(*args) # works because the closure for clocked encompasses the func free variable
        elapsed = time.perf_counter() - t0
        name = func.__name__
        arg_str = ', '.join(repr(arg) for arg in args)
        print('[%0.8fs] %s(%s) -> %r' % (elapsed, name, arg_str, result))
        return result
    return clocked # Returns the inner function, replacing the decorated function

The following code demonstrates the use of the clock decorator (clockdeco_demo.py):

# clockdeco_demo.py  Using the clock decorator

import time
from clockdeco import clock

@clock
def snooze(seconds):
    time.sleep(seconds)

@clock
def factorial(n):
    return 1 if n < 2 else n * factorial(n - 1)

if __name__ == '__main__':
    print('*' * 40, 'Calling snooze(.123)')
    snooze(.123)
    print('*' * 40, 'Calling factorial(6)')
    print('6! =', factorial(6))

The output result is:

**************************************** Calling snooze(.123)
[0.12318430s] snooze(0.123) -> None
**************************************** Calling factorial(6)
[0.00000200s] factorial(1) -> 1
[0.00004970s] factorial(2) -> 2
[0.00008730s] factorial(3) -> 6
[0.00011450s] factorial(4) -> 24
[0.00014130s] factorial(5) -> 120
[0.00017300s] factorial(6) -> 720
6! = 720

This is the typical behavior of a decorator: it replaces the decorated function with a new function that accepts the same arguments and (usually) returns whatever the decorated function was supposed to return, while also doing some extra processing.

The clock decorator implemented in clockdeco.py has a few shortcomings: it does not support keyword arguments, and it masks the __name__ and __doc__ of the decorated function. The following code (clockdeco2.py) uses the functools.wraps decorator to copy the relevant attributes from func to clocked; it also handles keyword arguments correctly:

import time
import functools
def clock(func):
    @functools.wraps(func) # Decorator copies related properties from func to clocked
    def clocked(*args, **kwargs):
        t0 = time.time()
        result = func(*args, **kwargs)
        elapsed = time.time() - t0
        name = func.__name__
        arg_lst = []
        if args:
            arg_lst.append(', '.join(repr(arg) for arg in args))
        if kwargs:
            pairs = ['%s=%r' % (k, w) for k, w in sorted(kwargs.items())]
            arg_lst.append(', '.join(pairs))
        arg_str = ', '.join(arg_lst)
        print('[%0.8fs] %s(%s) -> %r' % (elapsed, name, arg_str, result))
        return result
    return clocked
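
A side-by-side sketch with two trivial hypothetical decorators shows exactly what functools.wraps buys you:

```python
import functools

def plain(func):
    # No wraps: the wrapper's own metadata masks the decorated function's.
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def wrapped(func):
    @functools.wraps(func)  # copies __name__, __doc__, etc. from func to wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@plain
def f():
    """f's docstring"""

@wrapped
def g():
    """g's docstring"""

print(f.__name__, f.__doc__)  # wrapper None
print(g.__name__, g.__doc__)  # g g's docstring
```

Without wraps, tools that rely on introspection (help(), debuggers, doc generators) see the anonymous wrapper instead of the real function.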

8 Decorators in the standard library

Here are two notable decorators in the standard library.

8.1 Memoization with functools.lru_cache

functools.lru_cache is a very practical decorator that implements memoization: an optimization technique that saves the results of previous invocations of an expensive function, avoiding repeat computations on previously used arguments. LRU stands for Least Recently Used, meaning that the growth of the cache is limited and entries that have not been read for a while are discarded.

A good demonstration is the painfully slow recursive function that computes the nth Fibonacci number. The following code (fibo_demo.py) does not use the lru_cache decorator, so the recursion is extremely costly:

from clockdeco import clock

@clock
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-2) + fibonacci(n-1)

if __name__ == '__main__':
    print(fibonacci(6))

Running fibo_demo.py produces the output below. Except for the last line, all of it is generated by the clock decorator:

[0.00000040s] fibonacci(0) -> 0
[0.00000050s] fibonacci(1) -> 1
[0.00005350s] fibonacci(2) -> 1
[0.00000030s] fibonacci(1) -> 1
[0.00000030s] fibonacci(0) -> 0
[0.00000020s] fibonacci(1) -> 1
[0.00001230s] fibonacci(2) -> 1
[0.00002400s] fibonacci(3) -> 2
[0.00008970s] fibonacci(4) -> 3
[0.00000020s] fibonacci(1) -> 1
[0.00000020s] fibonacci(0) -> 0
[0.00000020s] fibonacci(1) -> 1
[0.00001140s] fibonacci(2) -> 1
[0.00002280s] fibonacci(3) -> 2
[0.00000020s] fibonacci(0) -> 0
[0.00000030s] fibonacci(1) -> 1
[0.00001150s] fibonacci(2) -> 1
[0.00000020s] fibonacci(1) -> 1
[0.00000030s] fibonacci(0) -> 0
[0.00000030s] fibonacci(1) -> 1
[0.00001210s] fibonacci(2) -> 1
[0.00002340s] fibonacci(3) -> 2
[0.00004610s] fibonacci(4) -> 3
[0.00008010s] fibonacci(5) -> 5
[0.00018170s] fibonacci(6) -> 8
8

The waste is obvious: fibonacci(1) is called eight times, fibonacci(2) five times, and so on. Adding the lru_cache decorator (fibo_demo2.py) improves performance dramatically:

import functools
from clockdeco import clock

@functools.lru_cache() # lru_cache must be invoked as a regular function: note the parentheses. They are there because lru_cache accepts configuration parameters, covered below
@clock # stacked decorators: @lru_cache() is applied on the function returned by @clock
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-2) + fibonacci(n-1)

if __name__ == '__main__':
    print(fibonacci(6))

The result: execution time is halved, and the function is called only once for each value of n:

[0.00000090s] fibonacci(0) -> 0
[0.00000140s] fibonacci(1) -> 1
[0.00641200s] fibonacci(2) -> 1
[0.00000090s] fibonacci(3) -> 2
[0.00643280s] fibonacci(4) -> 3
[0.00000070s] fibonacci(5) -> 5
[0.00645350s] fibonacci(6) -> 8
8

Besides optimizing recursive algorithms, lru_cache also shines in applications that fetch information from the web.

lru_cache can be configured with two optional parameters. Its signature (a signature specifies how a function is called: its parameters and their defaults) is:

functools.lru_cache(maxsize=128, typed=False)

The maxsize argument determines how many call results are stored. After the cache is full, older results are discarded to make room. For optimal performance, maxsize should be a power of 2. The typed argument, if set to True, stores results of different argument types separately, i.e. it distinguishes float and integer arguments that are normally considered equal, such as 1 and 1.0. Because lru_cache uses a dict to store the results, and the keys are built from the positional and keyword arguments used in the calls, all the arguments taken by the decorated function must be hashable (look up "hashable" if the term is unfamiliar).

Next up: functools.singledispatch.

8.2 Single-dispatch generic functions

Suppose we are developing a tool to debug web applications, and we want to generate HTML displays for different types of Python objects. We might start with a function like this:

import html

def htmlize(obj):
    content = html.escape(repr(obj))
    return '<pre>{}</pre>'.format(content)

That function works for any Python type. Now we want to extend it to display some types in special ways:

  • str: replace embedded newline characters with '<br>\n'; use <p> tags instead of <pre>.
  • int: show the number in decimal and hexadecimal.
  • list: output an HTML list, formatting each item according to its type.

The desired behavior of the extended htmlize is shown below:

>>> # These sessions show the desired output; the code that implements this behavior comes later
>>> htmlize({1, 2, 3}) # 1
'<pre>{1, 2, 3}</pre>'
>>> htmlize(abs)
'<pre>&lt;built-in function abs&gt;</pre>'
>>> htmlize('Heimlich & Co.\n- a game') # 2
'<p>Heimlich &amp; Co.<br>\n- a game</p>'
>>> htmlize(42) # 3
'<pre>42 (0x2a)</pre>'
>>> print(htmlize(['alpha', 66, {3, 2, 1}])) # 4
<ul>
<li><p>alpha</p></li>
<li><pre>66 (0x42)</pre></li>
<li><pre>{1, 2, 3}</pre></li>
</ul>

Notes:
1. By default, the HTML-escaped string representation of an object is shown enclosed in <pre></pre>.
2. str objects are also HTML-escaped, but placed in <p></p>, with <br> used for line breaks.
3. An int is shown in decimal and hexadecimal, inside <pre></pre>.
4. Each list item is formatted according to its type, and the whole sequence is rendered as an HTML list.
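
The post stops before showing the implementation. For completeness, here is a sketch in the spirit of the book's functools.singledispatch solution: the @singledispatch base function handles the generic case, and each specialized handler is registered with @htmlize.register, using the abstract base classes numbers.Integral and abc.MutableSequence so that subclasses are covered too.

```python
import html
import numbers
from collections import abc
from functools import singledispatch

@singledispatch
def htmlize(obj):  # base function: handles object, the generic case
    content = html.escape(repr(obj))
    return '<pre>{}</pre>'.format(content)

@htmlize.register(str)
def _(text):  # specialized handlers don't need meaningful names
    content = html.escape(text).replace('\n', '<br>\n')
    return '<p>{}</p>'.format(content)

@htmlize.register(numbers.Integral)
def _(n):
    return '<pre>{0} (0x{0:x})</pre>'.format(n)

@htmlize.register(tuple)           # decorators can be stacked to support
@htmlize.register(abc.MutableSequence)  # several types with one handler
def _(seq):
    inner = '</li>\n<li>'.join(htmlize(item) for item in seq)
    return '<ul>\n<li>' + inner + '</li>\n</ul>'

print(htmlize(42))
print(htmlize('Heimlich & Co.\n- a game'))
```

Dispatching on ABCs rather than concrete classes means user-defined subclasses of str, int, or list are handled correctly as well.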

