Python Operator Overloading, Part 2

Posted by newzub on Mon, 22 Nov 2021 14:43:58 +0100

Original text: https://zhuanlan.zhihu.com/p/358748722

Preface

Operator overloading has long been a controversial language feature. Because so many C++ programmers abused it, James Gosling, the creator of Java, decided not to support operator overloading in Java at all. On the other hand, used correctly, operator overloading can genuinely improve the readability and flexibility of code. Python therefore imposes some restrictions that balance flexibility, usability, and safety. The main ones are:

Operators of built-in types cannot be overloaded
You cannot create new operators; you can only overload existing ones
The is, and, or, and not operators cannot be overloaded (but the bitwise operators &, | and ~ can)

Operator overloading in Python is straightforward: you only need to implement the corresponding special methods. In the previous part, we overloaded the "+" and "==" operators for a Vector class, and the implementation was fairly simple. Next we consider a more complex case, a Vector class that is no longer limited to adding two-dimensional vectors, which lets us cover Python operator overloading more comprehensively.

Improved Vector

To support high-dimensional vectors, we should allow adding vectors of different dimensions, padding the missing components of the lower-dimensional vector with 0. This is also a common way of handling missing values in statistical analysis. The first consequence is that the Vector constructor can no longer take a fixed number of positional arguments; it has to accept a variable number of components.

In general, a Python function can accept a variable number of values in two ways. The first is variable-length positional arguments, i.e. *args, which lets us write Vector(1, 2) or Vector(1, 2, 3) to create vectors of different dimensions. The function then receives the arguments packed into the tuple args, which can of course be iterated over. Although this looks intuitive, a vector is functionally a sequence, and the constructors of Python's built-in sequence types generally take a single iterable argument. For consistency, we adopt this second form, and we also override __repr__ so that a vector is printed in a more intuitive, mathematical notation.
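To make the difference between the two constructor styles concrete, here is a minimal sketch. VarArgsVector and IterableVector are hypothetical names used only for this comparison; neither is the article's final class.

```python
from collections.abc import Iterable


class VarArgsVector:
    """Style 1: variable-length positional arguments, e.g. VarArgsVector(1, 2, 3)."""
    def __init__(self, *args):
        # *args arrives as a tuple, which is already iterable
        self._components = tuple(args)


class IterableVector:
    """Style 2: one iterable argument, like tuple(), list() or array()."""
    def __init__(self, components: Iterable):
        self._components = tuple(components)


a = VarArgsVector(1, 2, 3)
b = IterableVector([1, 2, 3])
c = IterableVector(range(3))  # any iterable works, including range objects and generators
```

The second style mirrors tuple(iterable) and list(iterable), which is the consistency argument made above.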

from array import array
from collections.abc import Iterable


class Vector:
    def __init__(self, components: Iterable):
        self._components = array('i', components)

    def __repr__(self):
        return str(tuple(self._components))

To make the components easy to process later, they are stored in an array. The first argument 'i' is the type code for signed integers, so every component must be an integer. Copying the components into an internal array also means that later changes to the object passed in do not affect the Vector, and because the class exposes no methods that modify the array, a Vector behaves as an immutable sequence, much like Python's built-in tuple. With this definition we can instantiate Vector as follows:

>>> from vector import Vector
>>> Vector([1, 2])
(1, 2)
>>> Vector((1, 2, 3))
(1, 2, 3)
>>> Vector(range(4))
(0, 1, 2, 3)
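The two properties of the array('i', ...) storage can be checked with a quick sketch: the components are copied out of the argument, and non-integer values are rejected. Note that the array object itself is mutable; a Vector only behaves as immutable because the class never exposes a way to change it.

```python
from array import array

source = [1, 2]
components = array('i', source)  # 'i' = signed int type code; the values are copied

source.append(3)                 # mutating the input list afterwards...
print(components)                # ...does not affect the copy: array('i', [1, 2])

try:
    array('i', [1.5])            # non-integer components are rejected
except TypeError as exc:
    print("TypeError:", exc)
```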

Because the Vector constructor accepts any iterable, and any class that implements __iter__ is recognized as a (virtual) subclass of Iterable, we can pass in list, tuple, range, and other iterable objects.
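The claim that implementing __iter__ is enough can be verified with a small sketch. Countdown is a hypothetical class that never inherits from Iterable; collections.abc.Iterable recognizes it anyway because its subclass hook checks for __iter__.

```python
from collections.abc import Iterable


class Countdown:
    """Hypothetical class that only defines __iter__; it does not inherit from Iterable."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


print(isinstance(Countdown(3), Iterable))   # True: the check looks for __iter__
print(issubclass(Countdown, Iterable))      # True
print(isinstance(42, Iterable))             # False: int has no __iter__
print(tuple(Countdown(3)))                  # (3, 2, 1)
```

The components: Iterable annotation in the Vector constructor is not enforced at runtime; passing a non-iterable simply makes array() raise TypeError.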

Next, we overload the plus operator of the Vector class. To pad lower-dimensional vectors with 0 as described above, we use the zip_longest function from the itertools module. It takes several iterables and groups their items into tuples, as in zip_longest(p, q, ...) --> (p[0], q[0]), (p[1], q[1]), ..., continuing until the longest input is exhausted; the keyword argument fillvalue specifies the value used in place of missing items. Because the arguments of zip_longest must be iterable, we also need to implement __iter__ for the Vector class first.

import itertools


class Vector:
    def __iter__(self):
        return iter(self._components)

    def __add__(self, other):
        pairs = itertools.zip_longest(self, other, fillvalue=0)
        return Vector(a + b for a, b in pairs)

The logic of __add__ is simple: it adds the two vectors component by component and returns a new Vector object. The new Vector is constructed from a generator expression, and since a generator is an iterator and therefore an Iterable, it satisfies the constructor's requirement.
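The zip_longest behaviour that __add__ relies on can be checked directly. In this short sketch p and q are arbitrary example inputs, and plain zip is shown alongside for contrast.

```python
from itertools import zip_longest

p = [1, 2]
q = (10, 20, 30)

print(list(zip(p, q)))                       # [(1, 10), (2, 20)]: stops at the shorter input
print(list(zip_longest(p, q)))               # [(1, 10), (2, 20), (None, 30)]: default fill is None
print(list(zip_longest(p, q, fillvalue=0)))  # [(1, 10), (2, 20), (0, 30)]
print([a + b for a, b in zip_longest(p, q, fillvalue=0)])  # [11, 22, 30]
```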

To verify the result, we also need to overload the == operator. Since two vectors may have different dimensions, we first compare their dimensions, that is, their number of components, which requires implementing __len__. Then we compare the components one by one; the built-in zip function lets us iterate over two iterables at the same time.

class Vector:
    def __len__(self):
        return len(self._components)

    def __eq__(self, other):
        return len(self) == len(other) and all(a == b for a, b in zip(self, other))

Best practice: use zip to iterate over two iterators in parallel. This is Item 11 of Effective Python. We often need to iterate over two sequences in parallel. The usual approach is to loop over one sequence, obtain the current index somehow, and use that index to access the corresponding element of the second sequence; a common way to get the index is enumerate, as in for index, item in enumerate(items). The built-in zip function offers a more elegant alternative: it wraps two or more iterators in a lazy iterator that, on each step, takes the next value from each input and yields them together as a tuple, which can then be unpacked, as in for a, b in zip(self, other) in the code above. This is clearly more readable. However, if the inputs have different lengths, zip stops as soon as the shortest one is exhausted, which can lead to unexpected results. So when you cannot be sure the sequences have the same length, consider the zip_longest function from the itertools module.
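The following sketch contrasts the index-based loop with zip and shows why Vector.__eq__ checks the lengths first. naive_equal and checked_equal are hypothetical helpers written only for this illustration.

```python
names = ["x", "y", "z"]
values = [1, 2, 3]

# Index-based: loop over one sequence and use the index to reach into the other
pairs_by_index = [(name, values[i]) for i, name in enumerate(names)]

# zip-based: take one item from each iterable per step and unpack the tuple
pairs_by_zip = [(name, value) for name, value in zip(names, values)]
assert pairs_by_index == pairs_by_zip


def naive_equal(u, v):
    """Hypothetical helper WITHOUT a length check: zip silently truncates."""
    return all(a == b for a, b in zip(u, v))


def checked_equal(u, v):
    """Same comparison with the length check used by Vector.__eq__."""
    return len(u) == len(v) and all(a == b for a, b in zip(u, v))


print(naive_equal([1, 2], [1, 2, 3]))    # True, although the lengths differ
print(checked_equal([1, 2], [1, 2, 3]))  # False
```

On Python 3.10 and later, zip(u, v, strict=True) is another option: it raises ValueError when the inputs turn out to have different lengths.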

So far, the overloaded "+" and "==" operators are in place, and we can write test cases to verify them. Since this is the first complete test class in this series, the full test code is included at the end of the article. Here, I will first demonstrate the overloaded operators in the console.

>>> v1 = Vector([1, 2])
>>> v1 == (1, 2)
True
>>> v1 + Vector((1, 1))
(2, 3)
>>> v1 + [1, 1]
(2, 3)
>>> v1 + (1, 1, 1)
(2, 3, 1)

Because the other argument of __add__ only needs to be iterable, with no type restriction, the overloaded plus operator can add not only two Vector instances but also a Vector and any iterable, whether it is a list, a tuple, or another iterable type. Note, however, that the iterable must be the second operand, that is, on the right of "+". This is easy to understand: we have only implemented Vector's __add__ method, and Python's built-in types do not know how to add a Vector, as the error message from the tuple class below shows.

>>> (1, 1) + v1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate tuple (not "Vector") to tuple

Reverse operators

Is there a way to make the overloaded plus operator support (1, 1) + v1 without overriding the __add__ method of the tuple class, which would obviously be unreasonable? The answer is yes. Before getting to it, we need to look at Python's operator dispatch mechanism.

Python provides a special dispatch mechanism for infix operators. For an expression a + b, the interpreter performs the following steps:

If a has an __add__ method and the call does not return NotImplemented, the result of a.__add__(b) is used;
If a has no __add__ method, or the call returns NotImplemented, the interpreter checks whether b (when it is of a different type) has an __radd__ method; if it does and the call does not return NotImplemented, the result of b.__radd__(a) is used;
If b has no __radd__ method, or that call also returns NotImplemented, TypeError is raised.

Note: NotImplemented is a special built-in singleton value. When an operator's special method cannot handle the given operands, it should return NotImplemented to the interpreter (this is unrelated to the NotImplementedError exception).
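The dispatch steps above can be modeled in plain Python. This is a simplified sketch, not the interpreter's actual code: dispatch_add and Money are hypothetical names, and the model omits the rule that a subclass's reflected method is tried before the left operand's forward method.

```python
def dispatch_add(a, b):
    """Simplified model of how the interpreter evaluates a + b (hypothetical helper)."""
    forward = getattr(type(a), "__add__", None)       # special methods are looked up on the type
    if forward is not None:
        result = forward(a, b)
        if result is not NotImplemented:
            return result
    reflected = getattr(type(b), "__radd__", None)
    if reflected is not None and type(a) is not type(b):  # __radd__ is only tried for different types
        result = reflected(b, a)
        if result is not NotImplemented:
            return result
    raise TypeError(
        f"unsupported operand type(s) for +: {type(a).__name__!r} and {type(b).__name__!r}"
    )


class Money:
    """Hypothetical type that can be added to Money or to an int."""
    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        if isinstance(other, Money):
            return Money(self.cents + other.cents)
        if isinstance(other, int):
            return Money(self.cents + other)
        return NotImplemented       # let the other operand try

    def __radd__(self, other):
        return self + other         # addition is commutative here


print(dispatch_add(Money(1), Money(2)).cents)  # 3: the forward __add__ succeeds
print(dispatch_add(1, Money(5)).cents)         # 6: int.__add__ returns NotImplemented, then Money.__radd__ runs
```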

If __add__ is called the forward method, then __radd__ can be called the reverse (reflected, or right-hand) version of __add__. It handles the operation when an instance of our class is the right operand. So to support (1, 1) + v1, we need to define the reverse method on the Vector class, and it only needs to delegate to the __add__ method we have already defined.

import itertools


class Vector:
    def __add__(self, other):
        try:
            pairs = itertools.zip_longest(self, other, fillvalue=0)
            return Vector(a + b for a, b in pairs)
        except TypeError:
            # other is not iterable (or its items cannot be added): let Python try other.__radd__
            return NotImplemented

    def __radd__(self, other):
        return self + other

__radd__ is usually this simple. When the interpreter calls b.__radd__(a), b is v1, a Vector instance that can be added to a tuple, so (1, 1) + v1 no longer raises an error. Delegating to self + other is valid because vector addition is commutative. We also modified __add__ to catch TypeError and return NotImplemented, which is a best practice when overloading infix operators. Raising an exception stops the operator dispatch, whereas returning NotImplemented lets the interpreter go on to try the other operand's reverse method, which may well succeed when the two operands are of different types.

Now verify the overloaded inverse operator:

>>> v1 = Vector([1, 2])
>>> (1, 1) + v1
(2, 3)
>>> [1, 1, 1] + v1
(2, 3, 1)
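To see why returning NotImplemented matters, here is a sketch with a hypothetical, non-iterable Offset type that knows how to add itself to a Vector through __radd__. The Vector below is a trimmed copy of the article's class. Because zip_longest raises TypeError for the non-iterable operand, __add__ returns NotImplemented and the interpreter moves on to Offset.__radd__; had the exception propagated, Offset.__radd__ would never be consulted.

```python
import itertools
from array import array


class Vector:
    def __init__(self, components):
        self._components = array('i', components)

    def __iter__(self):
        return iter(self._components)

    def __repr__(self):
        return str(tuple(self._components))

    def __add__(self, other):
        try:
            pairs = itertools.zip_longest(self, other, fillvalue=0)
            return Vector(a + b for a, b in pairs)
        except TypeError:
            return NotImplemented

    def __radd__(self, other):
        return self + other


class Offset:
    """Hypothetical non-iterable operand that shifts every component of a Vector."""
    def __init__(self, delta):
        self.delta = delta

    def __radd__(self, other):
        # Reached as Offset.__radd__(offset, vector) after Vector.__add__ returns NotImplemented
        return Vector(x + self.delta for x in other)


v = Vector([1, 2])
print(v + Offset(10))   # (11, 12)
try:
    v + 1               # neither Vector.__add__ nor int.__radd__ can handle it
except TypeError as exc:
    print("TypeError:", exc)
```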

Comparison operators

For comparison operators, forward and reverse calls use the same family of methods, with the arguments swapped. Note that it is the same family, not necessarily the same method. For example, for "==" the forward call is a.__eq__(b) and the reverse call is b.__eq__(a); for ">", the forward call is a.__gt__(b) and the reverse call is b.__lt__(a).

If the left operand's forward __eq__ returns NotImplemented, the interpreter calls the right operand's __eq__ in reverse. If that also returns NotImplemented, the interpreter does not raise TypeError for "=="; as a last resort it compares the identities of the two objects (a is b). The ordering operators such as ">" and "<" have no such fallback and raise TypeError instead.

Comparing a tuple with a Vector instance proceeds as follows:

Python calls tuple's __eq__ method; since tuple knows nothing about the Vector class, it returns NotImplemented;
Python then calls Vector's __eq__ method in reverse, which returns True.

>>> (1, 2) == Vector([1, 2])
True
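Both rules can be seen in a short sketch with two hypothetical classes: Meters defines only __lt__, so 5 > Meters(3) is answered by the reflected Meters.__lt__, and Opaque's __eq__ always returns NotImplemented, so "==" falls back to identity while "<" raises TypeError.

```python
class Meters:
    """Hypothetical type that only defines __lt__."""
    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        if isinstance(other, Meters):
            return self.value < other.value
        if isinstance(other, (int, float)):
            return self.value < other
        return NotImplemented


# 5 > Meters(3): int.__gt__ returns NotImplemented, so Python calls Meters(3).__lt__(5)
print(5 > Meters(3))    # True


class Opaque:
    """Hypothetical type whose __eq__ never knows how to compare."""
    def __eq__(self, other):
        return NotImplemented


o = Opaque()
print(o == o)           # True: both sides returned NotImplemented, so identity is compared
print(o == Opaque())    # False: two different objects
try:
    o < Opaque()        # ordering operators have no identity fallback
except TypeError as exc:
    print("TypeError:", exc)
```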

As for the "!=" operator, the best practice in Python 3 is to implement only __eq__ and not __ne__, because the __ne__ inherited from object inverts the result of __eq__ (passing NotImplemented through unchanged). Python 2 was different: when overloading "==", you also had to overload "!=". Guido van Rossum, the creator of Python, has described this as a design flaw in Python 2 that was fixed in Python 3.
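A quick sketch of the inherited __ne__ at work; Point is a hypothetical class that defines only __eq__.

```python
class Point:
    """Hypothetical class that defines only __eq__."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


print(Point(1, 2) != Point(1, 2))  # False: object.__ne__ inverts __eq__
print(Point(1, 2) != Point(3, 4))  # True
print(Point(1, 2) != "p")          # True: NotImplemented is passed through, then identity is compared
```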

In-place operators

Augmented assignment operators such as "+=" are also called in-place operators, and they work in one of two ways. For immutable types, a += b has exactly the same effect as a = a + b: augmented assignment does not modify the immutable target, but creates a new instance and rebinds the name, so a refers to different objects before and after the operation. For immutable types, this is the expected behavior.

For mutable types that implement an in-place special method such as __iadd__, a += b calls that method to modify the left operand in place instead of creating a new object. Python's built-in immutable tuple and mutable list illustrate this well.

>>> t = (1, 2)
>>> id(t)
4359598592
>>> t += (3,)
>>> id(t)
4359584960
>>> l = [1, 2]
>>> id(l)
4360054336
>>> l += [3, 4]
>>> id(l)
4360054336

If you look at the source code, you will find that the list class implements __iadd__ and the tuple class does not. For a list, the "+=" operator behaves like the extend() method, appending the elements of an iterable to the end of the current list one by one. For a tuple, which does not define __iadd__, "+=" falls back to __add__ and returns a new tuple object.
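The difference is easiest to see through a second name bound to the same object. In this sketch, the alias shares the list that "+=" modifies, while the tuple alias keeps the original tuple. The last lines show that "+=" on a list accepts any iterable, just like extend(), whereas plain "+" does not.

```python
print(hasattr(list, "__iadd__"), hasattr(tuple, "__iadd__"))  # True False

a = [1, 2]
alias = a
a += (3, 4)             # list.__iadd__ behaves like extend(): any iterable is accepted
print(alias)            # [1, 2, 3, 4]: the shared list was modified in place

t = (1, 2)
t_alias = t
t += (3,)               # no tuple.__iadd__, so this falls back to t = t + (3,)
print(t_alias)          # (1, 2): the original tuple is untouched

try:
    [1, 2] + (3, 4)     # plain + on a list requires another list
except TypeError as exc:
    print("TypeError:", exc)
```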

By design, Vector should be consistent with tuple and be an immutable type, meaning that every operation on a Vector produces a new Vector. From a functional programming perspective, this design has no side effects (a function does not modify the state of the arguments passed to it), which avoids some hard-to-predict problems. Therefore, immutable types should not implement in-place special methods. Using "+=" on a Vector calls the existing __add__ method to create a new Vector instance, so v1 += (1, 1) behaves the same as v1 = v1 + (1, 1).

>>> v1 = Vector([1, 2])
>>> id(v1)
4360163280
>>> v1 += (1, 1)
>>> v1
(2, 3)
>>> id(v1)
4359691376
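To show the side effect that this design avoids, the sketch below contrasts a trimmed copy of the immutable Vector with a hypothetical MutableVector that does implement __iadd__. MutableVector is for contrast only and is not part of the article's design.

```python
import itertools
from array import array


class Vector:
    """Trimmed-down copy of the article's immutable Vector (no __iadd__)."""
    def __init__(self, components):
        self._components = array('i', components)

    def __iter__(self):
        return iter(self._components)

    def __add__(self, other):
        pairs = itertools.zip_longest(self, other, fillvalue=0)
        return Vector(a + b for a, b in pairs)


class MutableVector(Vector):
    """Hypothetical mutable variant, shown only for contrast: it defines __iadd__."""
    def __iadd__(self, other):
        pairs = itertools.zip_longest(self, other, fillvalue=0)
        # Rebuild the component array but keep the same MutableVector object
        self._components = array('i', (a + b for a, b in pairs))
        return self  # += binds the left-hand name to whatever __iadd__ returns


v = Vector([1, 2])
v_alias = v
v += (1, 1)             # no __iadd__: falls back to v = v + (1, 1)

m = MutableVector([1, 2])
m_alias = m
m += (1, 1)             # __iadd__ modifies m in place

print(list(v), list(v_alias), v is v_alias)  # [2, 3] [1, 2] False
print(list(m), list(m_alias), m is m_alias)  # [2, 3] [2, 3] True
```

With the immutable design, code that still holds the old reference (v_alias) never observes the change, which is the "no side effects" property described above.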

Appendix: Code

vector.py

import itertools
from array import array
from collections.abc import Iterable


class Vector:
    def __init__(self, components: Iterable):
        self._components = array('i', components)

    def __iter__(self):
        return iter(self._components)

    def __len__(self):
        return len(self._components)

    def __repr__(self):
        return str(tuple(self._components))

    def __eq__(self, other):
        return len(self) == len(other) and all(a == b for a, b in zip(self, other))

    def __add__(self, other):
        try:
            pairs = itertools.zip_longest(self, other, fillvalue=0)
            return Vector(a + b for a, b in pairs)
        except TypeError:
            return NotImplemented

    def __radd__(self, other):
        return self + other

vector_test.py

from vector import Vector


class TestVector:
    def test_should_compare_two_vectors_with_override_compare_operators(self):
        v1 = Vector([1, 2])
        v2 = Vector((1, 2))
        v3 = Vector([2, 3])
        v4 = Vector([2, 3, 4])

        assert v1 == v2
        assert v3 != v2
        assert v4 != v3
        assert (1, 2) == v2
        assert v2 == [1, 2]

    def test_should_add_two_same_dimension_vectors_with_override_add_operator(self):
        v1 = Vector([1, 2])
        v2 = Vector((1, 3))
        result = Vector([2, 5])

        assert result == v1 + v2

    def test_should_add_two_different_dimension_vectors_with_override_add_operator(self):
        v1 = Vector([1, 2])
        v2 = Vector((1, 1, 1))
        result = Vector([2, 3, 1])

        assert result == v1 + v2

    def test_should_add_vector_and_iterable_with_override_add_operator(self):
        v1 = Vector([1, 2])

        assert v1 + (1, 1) == (2, 3)
        assert v1 + [1, 1, 1] == (2, 3, 1)

    def test_should_add_iterable_and_vector_with_override_radd_method(self):
        v1 = Vector([1, 2])

        assert (1, 1) + v1 == (2, 3)
        assert [1, 1, 1] + v1 == (2, 3, 1)

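    def test_should_create_new_vector_when_use_incremental_add_operator(self):
        # Body reconstructed from the in-place operator section; the original listing ends at the def line.
        v1 = Vector([1, 2])
        v2 = v1
        v1 += (1, 1)

        assert v1 == (2, 3)
        assert v2 == (1, 2)
        assert v1 is not v2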
