Python Operator Overloading (Part 2)

Posted by newzub on Mon, 22 Nov 2021 14:43:58 +0100

Original text: https://zhuanlan.zhihu.com/p/358748722

Preface

Operator overloading has long been a controversial language feature. Because so many C++ programmers abused it, James Gosling, the father of Java, decided not to support operator overloading in Java at all. On the other hand, used correctly, operator overloading can genuinely improve the readability and flexibility of code. Python therefore imposes some restrictions that balance flexibility, usability, and safety. The main ones are:

  • Operators of built-in types cannot be overloaded
  • You cannot create new operators; you can only overload existing ones
  • The is, and, or, and not operators cannot be overloaded (but the bitwise operators &, | and ~ can)
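A quick sketch of these limits, using a hypothetical Bits class that is not part of the article: it can define __and__, __or__ and __invert__ to customize &, | and ~, but the keywords and, or and not still operate on truthiness (here supplied by __bool__) and cannot be customized. Likewise, trying to give the built-in int type a new __add__ is rejected with TypeError (the exact message varies between Python versions).

```python
class Bits:
    """Hypothetical wrapper around an int, used only to illustrate overloading."""

    def __init__(self, value: int):
        self.value = value

    def __and__(self, other):      # overloads &
        return Bits(self.value & other.value)

    def __or__(self, other):       # overloads |
        return Bits(self.value | other.value)

    def __invert__(self):          # overloads ~
        return Bits(~self.value)

    def __bool__(self):            # consulted by `and`, `or`, `not`, which cannot be overloaded
        return self.value != 0

    def __repr__(self):
        return f"Bits({self.value})"


a, b = Bits(0b1100), Bits(0b1010)
print(a & b, a | b, ~a)   # Bits(8) Bits(14) Bits(-13)
print(a and b)            # Bits(10): `and` simply returns b because a is truthy
print(not Bits(0))        # True

# Built-in types cannot be given new operator behavior.
rejected = False
try:
    int.__add__ = lambda x, y: 0
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)
```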

Overloading operators in Python is straightforward: you only need to implement the corresponding special methods. In the previous part, we showed how to overload the + and == operators of a Vector class, and the implementation was fairly simple. Next, we consider a more complex case, a Vector class that is no longer limited to adding two-dimensional vectors, which lets us cover Python operator overloading more comprehensively.

Improved Vector

To handle high-dimensional vectors, we should support adding vectors of different dimensions, padding the missing components of the lower-dimensional vector with 0. This is also a common way of handling missing values in statistical applications. It follows that the Vector constructor can no longer accept a fixed number of positional arguments; it must accept a variable number of components.

In general, a Python function can accept a variable number of values in two ways. The first is variable-length positional arguments, i.e. *args, so that vectors of different dimensions can be created with calls like Vector(1, 2) or Vector(1, 2, 3). In this case the function packs the arguments into the tuple args, which can of course be iterated. Although this approach looks intuitive, a vector is functionally a sequence, and the constructors of Python's built-in sequence types generally take a single iterable as their argument. For consistency, we take that second approach and accept an iterable. We also override __repr__ to print vectors in a more intuitive, mathematical form.
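To make the two options concrete, here is a minimal sketch with two hypothetical classes, ArgsVector (the *args style) and IterVector (the single-iterable style the article adopts); neither name comes from the article. The iterable style mirrors how tuple() and list() are called, so existing sequences can be passed directly, while the *args style forces callers to unpack them with *.

```python
from array import array


class ArgsVector:
    """Hypothetical: components passed as separate positional arguments."""

    def __init__(self, *components):
        # components arrives as a tuple, e.g. (1, 2, 3)
        self._components = array('i', components)

    def __repr__(self):
        return str(tuple(self._components))


class IterVector:
    """Hypothetical: components passed as one iterable, like tuple(iterable)."""

    def __init__(self, components):
        self._components = array('i', components)

    def __repr__(self):
        return str(tuple(self._components))


print(ArgsVector(1, 2, 3))      # (1, 2, 3)
print(ArgsVector(*range(3)))    # (0, 1, 2): an existing iterable must be unpacked
print(IterVector([1, 2, 3]))    # (1, 2, 3)
print(IterVector(range(3)))     # (0, 1, 2): same calling convention as tuple(range(3))
```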

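from array import array
from collections.abc import Iterable


class Vector:
    def __init__(self, components: Iterable):
        # Store the components in a compact array of C ints ('i' typecode)
        self._components = array('i', components)

    def __repr__(self):
        # Show the vector like a tuple, e.g. (1, 2, 3)
        return str(tuple(self._components))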

To make the components easy to process later, they are stored in an array. The first argument, 'i', is the typecode for a signed integer array. Because the array lives in the private _components attribute and the class exposes no method that modifies it, Vector behaves as an immutable sequence, much like Python's built-in tuple (the array type itself is mutable). With this definition, we can instantiate the Vector class as follows:

>>> from vector import Vector
>>> Vector([1, 2])
(1, 2)
>>> Vector((1, 2, 3))
(1, 2, 3)
>>> Vector(range(4))
(0, 1, 2, 3)
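The 'i' typecode used in the constructor has two practical consequences, checked in the sketch below using only the standard array module: the array accepts only integers, so a float component raises TypeError, and the array object itself is mutable, so Vector's immutability rests entirely on never exposing an operation that modifies _components.

```python
from array import array

ints = array('i', [1, 2, 3])
print(ints)            # array('i', [1, 2, 3])
print(ints.itemsize)   # bytes per element; platform dependent (commonly 4)

# Non-integer values are rejected by the 'i' typecode.
float_rejected = False
try:
    array('i', [1, 2.5])
except TypeError as exc:
    float_rejected = True
    print("TypeError:", exc)

# The array type is mutable; only Vector's interface keeps it read-only.
ints[0] = 10
print(ints)            # array('i', [10, 2, 3])
```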

Because the Vector constructor accepts any iterable, and any class that implements __iter__ is recognized as a (virtual) subclass of Iterable, we can pass in list, tuple, range, and other iterable objects. Strictly speaking, the Iterable annotation is not enforced at run time; what matters is that array() can iterate over the argument.
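The claim that __iter__ alone is enough can be checked directly. The sketch below uses a hypothetical Countdown class that defines __iter__ without inheriting from Iterable; collections.abc.Iterable still recognizes it through its __subclasshook__, and a trimmed copy of the article's Vector accepts it, as well as a generator expression.

```python
from array import array
from collections.abc import Iterable


class Vector:  # trimmed copy of the article's class
    def __init__(self, components: Iterable):
        self._components = array('i', components)

    def __repr__(self):
        return str(tuple(self._components))


class Countdown:
    """Hypothetical iterable: defines __iter__ but does not subclass Iterable."""

    def __init__(self, start: int):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


print(issubclass(Countdown, Iterable))     # True, recognized by __iter__ alone
print(isinstance(Countdown(3), Iterable))  # True
print(Vector(Countdown(3)))                # (3, 2, 1)
print(Vector(x * x for x in range(4)))     # (0, 1, 4, 9): generators are iterable too
```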

Next, we overload the + operator of the Vector class. To pad lower-dimensional vectors with 0 as described earlier, we use zip_longest from the itertools module. It accepts several iterables and packs their items into tuples, as in zip_longest(p, q, ...) --> (p[0], q[0]), (p[1], q[1]), ..., and its keyword argument fillvalue specifies the value used for missing items. Before that, however, because the arguments of zip_longest must be iterable, we also need to implement __iter__ for the Vector class.

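import itertools


class Vector:
    # __init__ and __repr__ as defined above

    def __iter__(self):
        # Delegate iteration to the underlying array
        return iter(self._components)

    def __add__(self, other):
        # Pair up components, padding the shorter operand with 0
        pairs = itertools.zip_longest(self, other, fillvalue=0)
        return Vector(a + b for a, b in pairs)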

The logic of __add__ is simple: it adds the components element-wise and returns a new Vector. The new Vector is built from a generator expression, and since a generator is Iterable, it satisfies the constructor's requirement.
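Seen on its own (standard library only), zip_longest shows exactly the padding that __add__ relies on, and why fillvalue matters: without it, missing items are filled with None, and adding None to an int would fail.

```python
from itertools import zip_longest

short, longer = [1, 2], (1, 1, 1)

print(list(zip_longest(short, longer, fillvalue=0)))
# [(1, 1), (2, 1), (0, 1)]
print(list(zip_longest(short, longer)))
# [(1, 1), (2, 1), (None, 1)]  (the default fillvalue is None)
print([a + b for a, b in zip_longest(short, longer, fillvalue=0)])
# [2, 3, 1]
```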

To verify the result, we also need to overload the == operator. Since the two vectors may have different dimensions, we first compare the dimensions, i.e. the number of components, which requires implementing __len__. Then we compare the components one by one; the built-in zip function lets us traverse two iterables at the same time.

class Vector:
    def __len__(self):
        return len(self._components)

    def __eq__(self, other):
        return len(self) == len(other) and all(a == b for a, b in zip(self, other))
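The length check in __eq__ is not redundant. Because zip stops at the shorter input, comparing components alone would make a shorter vector look equal to a longer one with the same leading components, as this small standalone check with plain lists shows:

```python
short, longer = [1, 2], [1, 2, 3]

# zip stops after two pairs, so the extra component 3 is never compared.
components_match = all(a == b for a, b in zip(short, longer))
# Same expression as Vector.__eq__: compare lengths first.
vectors_equal = len(short) == len(longer) and all(a == b for a, b in zip(short, longer))

print(components_match)  # True (misleading on its own)
print(vectors_equal)     # False
```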

Best practice: use zip to iterate over two iterators in parallel. This is Item 11 of Effective Python. In Python we often need to iterate over two sequences in parallel. The usual approach is to write a for loop over one sequence, obtain the current index somehow, and use it to access the corresponding element of the second sequence; a common way to get the index is enumerate, as in for index, item in enumerate(items). There is a more elegant way: the built-in zip function wraps two or more iterators into a lazy iterator that, on each step, takes the next value from each of them and yields them together as a tuple. Combined with tuple unpacking, this retrieves values in parallel, as in for a, b in zip(self, other) in the code above. This is clearly more readable. However, if the iterables have different lengths, zip stops as soon as the shortest one is exhausted, which may lead to unexpected results. So when you are not sure the sequences have equal length, consider the zip_longest function from the itertools module instead.
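The sketch below contrasts the index-based loop with zip, then shows the truncation issue and the zip_longest alternative; names and counts are made-up sample data. (On Python 3.10 and later, zip(..., strict=True) is another option: it raises ValueError when the lengths differ.)

```python
from itertools import zip_longest

names = ["x", "y", "z"]
counts = [3, 1, 4]

# Index-based style: loop over one sequence and index into the other.
by_index = []
for i, name in enumerate(names):
    by_index.append((name, counts[i]))

# zip style: each step yields one item from each iterable, unpacked directly.
by_zip = [(name, count) for name, count in zip(names, counts)]
print(by_index == by_zip)                             # True

# With unequal lengths, zip silently stops at the shortest input...
print(list(zip(names, [3, 1])))                       # [('x', 3), ('y', 1)]
# ...while zip_longest keeps going and pads the missing values.
print(list(zip_longest(names, [3, 1], fillvalue=0)))  # [('x', 3), ('y', 1), ('z', 0)]
```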

At this point the overloaded + and == operators are basically complete, and we can write test cases to verify them. Since this is the first comprehensive test class in this series, I will post the complete test code at the end of the article. Here, I will first demonstrate the overloaded operators in the console.

>>> v1 = Vector([1, 2])
>>> v1 == (1, 2)
True
>>> v1 + Vector((1, 1))
(2, 3)
>>> v1 + [1, 1]
(2, 3)
>>> v1 + (1, 1, 1)
(2, 3, 1)

Because the other argument of __add__ only needs to be iterable, with no type restriction, the overloaded + operator can add not only two Vector instances but also a Vector and any iterable, whether a list, a tuple, or another iterable type. Note, however, that the iterable must be the second operand, i.e. the operand to the right of +. This is easy to understand: we have only implemented Vector's __add__ method, and Python's built-in types do not know how to add a Vector, as the following error from tuple shows.

>>> (1, 1) + v1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate tuple (not "Vector") to tuple

Reverse operators

Is there a way to make the overloaded + operator support (1, 1) + v1 without overriding tuple's __add__ method, which would obviously be unreasonable? The answer is yes. To explain how, we first need to look at Python's operator dispatch mechanism.

Python provides a special dispatch mechanism for infix operators. For the expression a + b, the interpreter performs the following steps:

  1. If a has an __add__ method and calling it does not return NotImplemented, the result of a.__add__(b) is used;
  2. If a has no __add__ method, or calling it returns NotImplemented, check whether b has an __radd__ method. If it does and it does not return NotImplemented, the result of b.__radd__(a) is used;
  3. If b has no __radd__ method, or calling it returns NotImplemented, raise TypeError.

Note: NotImplemented is a special built-in singleton value in Python. An operator special method should return it to the interpreter when it cannot handle the given operands.
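The three steps can be modeled in plain Python. The helper dispatch_add and the Meters class below are hypothetical and for illustration only; this is a simplified model, not how CPython is actually implemented (the real logic works on C-level slots, and a right operand whose type is a subclass of the left operand's type gets its __radd__ tried first). Following the real rules, the model skips __radd__ when both operands have the same type.

```python
def dispatch_add(a, b):
    """Hypothetical, simplified model of how the interpreter evaluates a + b."""
    # Step 1: try the forward method of the left operand.
    add = getattr(type(a), "__add__", None)
    if add is not None:
        result = add(a, b)
        if result is not NotImplemented:
            return result
    # Step 2: try the reverse method of the right operand (different types only).
    radd = getattr(type(b), "__radd__", None)
    if radd is not None and type(a) is not type(b):
        result = radd(b, a)
        if result is not NotImplemented:
            return result
    # Step 3: neither operand could handle the operation.
    raise TypeError(
        f"unsupported operand type(s) for +: "
        f"'{type(a).__name__}' and '{type(b).__name__}'"
    )


class Meters:
    """Hypothetical length type that can be added to numbers from either side."""

    def __init__(self, n):
        self.n = n

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.n + other.n)
        if isinstance(other, (int, float)):
            return Meters(self.n + other)
        return NotImplemented

    def __radd__(self, other):
        return self + other  # addition is commutative here


print(dispatch_add(Meters(1), Meters(2)).n)  # 3 (step 1)
print(dispatch_add(5, Meters(2)).n)          # 7 (int.__add__ gives up, step 2)
try:
    dispatch_add(Meters(1), "x")             # step 3
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # unsupported operand type(s) for +: 'Meters' and 'str'
```

For the Meters cases, the model agrees with the real interpreter: 5 + Meters(2) also ends up in Meters.__radd__.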

If __add__ is called the forward method, then __radd__ can be called the reverse (reflected, or right-hand) version of __add__. It handles the case where our object is the right-hand operand. Therefore, to support (1, 1) + v1, we need to define the reverse method for the Vector class. The reverse method only needs to delegate to the already defined __add__ method.

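import itertools


class Vector:
    def __add__(self, other):
        try:
            pairs = itertools.zip_longest(self, other, fillvalue=0)
            return Vector(a + b for a, b in pairs)
        except TypeError:
            # other is not iterable or its items can't be added:
            # let the interpreter try other.__radd__ instead
            return NotImplemented

    def __radd__(self, other):
        # Component-wise addition is commutative, so reuse __add__
        return self + other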

__radd__ is usually this simple. When the interpreter calls b.__radd__(a), b (here v1) is a Vector instance, which can be added to a tuple, so (1, 1) + v1 no longer raises an error. At the same time, __add__ has been modified to catch TypeError and return NotImplemented. This is another best practice when overloading infix operators: raising an exception stops the operator dispatch, whereas returning NotImplemented lets the interpreter go on to try the other operand's reverse method, which may well succeed when the two operands are of different types.

Now verify the overloaded reverse operator:

>>> v1 = Vector([1, 2])
>>> (1, 1) + v1
(2, 3)
>>> [1, 1, 1] + v1
(2, 3, 1)
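To see why returning NotImplemented matters, the sketch below pairs a condensed copy of the article's Vector with a hypothetical, non-iterable Offset class that knows how to add itself to a Vector from the right. Vector.__add__ cannot iterate over an Offset, so it returns NotImplemented and the interpreter falls through to Offset.__radd__; had __add__ let the TypeError propagate, Offset would never get that chance. When no operand can handle the operation, the interpreter still raises TypeError.

```python
import itertools
from array import array


class Vector:  # condensed copy of the article's class
    def __init__(self, components):
        self._components = array('i', components)

    def __iter__(self):
        return iter(self._components)

    def __repr__(self):
        return str(tuple(self._components))

    def __add__(self, other):
        try:
            pairs = itertools.zip_longest(self, other, fillvalue=0)
            return Vector(a + b for a, b in pairs)
        except TypeError:
            return NotImplemented

    def __radd__(self, other):
        return self + other


class Offset:
    """Hypothetical non-iterable type that can add itself to a Vector."""

    def __init__(self, delta):
        self.delta = delta

    def __radd__(self, other):
        # Reached as Offset.__radd__(offset, vector) after Vector.__add__ gives up
        if isinstance(other, Vector):
            return Vector(x + self.delta for x in other)
        return NotImplemented


print(Vector([1, 2]) + Offset(10))   # (11, 12): handled by Offset.__radd__

int_error = None
try:
    Vector([1, 2]) + 5
except TypeError as exc:
    int_error = str(exc)
print(int_error)  # unsupported operand type(s) for +: 'Vector' and 'int'
```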

Comparison operators

For comparison operators, forward and reverse calls use the same family of methods with the arguments swapped. Note that it is the same family, not necessarily the same method. For example, for ==, the forward call is a.__eq__(b) and the reverse call is b.__eq__(a); for >, the forward call a.__gt__(b) has the reverse call b.__lt__(a).

If the forward call to the left operand's __eq__ returns NotImplemented, the Python interpreter tries the reverse call, the right operand's __eq__. If that also returns NotImplemented, the interpreter does not raise TypeError; as a last resort, it compares the object IDs (identity). (For the ordering operators <, >, <= and >=, by contrast, TypeError is raised in this case.)
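The fallback order can be observed with a hypothetical Probe class whose comparison methods record each call and then return NotImplemented. For ==, both __eq__ calls give up and identity decides; for >, a.__gt__ and then b.__lt__ are tried, and since ordering comparisons have no identity fallback, TypeError is raised.

```python
class Probe:
    """Hypothetical class whose comparison methods log the call and give up."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __eq__(self, other):
        self.log.append(f"{self.name}.__eq__")
        return NotImplemented

    def __gt__(self, other):
        self.log.append(f"{self.name}.__gt__")
        return NotImplemented

    def __lt__(self, other):
        self.log.append(f"{self.name}.__lt__")
        return NotImplemented


log = []
a, b = Probe("a", log), Probe("b", log)

eq_result = a == b
eq_calls = list(log)
print(eq_result, eq_calls)   # False ['a.__eq__', 'b.__eq__']
same_obj = a == a            # True: same object, so the identity fallback succeeds

log.clear()
gt_error = None
try:
    a > b
except TypeError as exc:
    gt_error = str(exc)
gt_calls = list(log)
print(gt_calls)              # ['a.__gt__', 'b.__lt__']
print(gt_error)              # '>' not supported between instances of 'Probe' and 'Probe'
```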

The comparison between a tuple and a Vector instance proceeds as follows:

  1. Python calls tuple's __eq__ method. Since tuple does not know about the Vector class, it returns NotImplemented;
  2. Python then calls Vector's __eq__ method, which returns True.

>>> (1, 2) == Vector([1, 2])
True

In addition, for the != operator, the best practice in Python 3 is to implement only __eq__ and not __ne__, because the __ne__ method inherited from object inverts the result returned by __eq__. Python 2 was different: when overloading ==, you also had to overload !=. Guido van Rossum, the father of Python, once called this a design flaw in Python 2 that has been fixed in Python 3.
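A minimal check of that behavior, using a hypothetical Point class that defines only __eq__: != still works because the inherited object.__ne__ inverts the result of __eq__ (and passes NotImplemented through unchanged, so the usual fallback applies).

```python
class Point:
    """Hypothetical value type that defines __eq__ but not __ne__."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


p, q, r = Point(1, 2), Point(1, 2), Point(1, 3)
print(p != q)              # False: inherited __ne__ inverts __eq__'s True
print(p != r)              # True
print(p != "not a point")  # True: both sides give up, so identity decides
```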

In-place operators

Augmented assignment operators such as +=, also called in-place operators, work in one of two ways. For immutable types, a += b behaves exactly like a = a + b: augmented assignment does not modify the immutable target but creates a new instance and rebinds the name to it, so a refers to a different object before and after the operation. For immutable types, this is the expected behavior.

For mutable types that implement an in-place special method such as __iadd__, a += b calls that method to modify the left operand in place instead of creating a new object. Python's built-in immutable tuple and mutable list illustrate this well.

>>> t = (1, 2)
>>> id(t)
4359598592
>>> t += (3,)
>>> id(t)
4359584960
>>> l = [1, 2]
>>> id(l)
4360054336
>>> l += [3, 4]
>>> id(l)
4360054336

If you read the source code, you will find that list implements __iadd__ but tuple does not. For a list, += works like the extend() method, appending the items of an iterable to the end of the current list one by one. For a tuple, which has no __iadd__, += falls back to __add__ and returns a new tuple object.
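For contrast, the sketch below shows how a mutable type could implement __iadd__. MutableVector is a hypothetical list-backed class, not part of the article's design (which deliberately keeps Vector immutable). Note that __iadd__ must return the object that the target name will be rebound to, normally self.

```python
from itertools import zip_longest


class MutableVector:
    """Hypothetical mutable counterpart of Vector, backed by a list."""

    def __init__(self, components):
        self._components = list(components)

    def __iter__(self):
        return iter(self._components)

    def __repr__(self):
        return str(tuple(self._components))

    def __iadd__(self, other):
        try:
            summed = [a + b for a, b in zip_longest(self, other, fillvalue=0)]
        except TypeError:
            return NotImplemented
        # Replace the contents of the existing list instead of creating a new object.
        self._components[:] = summed
        return self  # the name on the left of += is rebound to this object


mv = MutableVector([1, 2])
before = id(mv)
mv += (1, 1, 1)
same_object = id(mv) == before
print(mv, same_object)                                        # (2, 3, 1) True
print(hasattr(list, "__iadd__"), hasattr(tuple, "__iadd__"))  # True False
```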

In terms of design, Vector should be consistent with tuple and be immutable: every operation on a Vector produces a new Vector. From a functional programming perspective, this design has no side effects (a function does not modify the state of the arguments passed to it), which avoids some hard-to-predict problems. Therefore, immutable types should not implement in-place special methods. Using += on a Vector then calls the existing __add__ method to create a new Vector instance, so v1 += (1, 1) behaves exactly like v1 = v1 + (1, 1).

>>> v1 = Vector([1, 2])
>>> id(v1)
4360163280
>>> v1 += (1, 1)
>>> v1
(2, 3)
>>> id(v1)
4359691376
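The practical payoff of the immutable design shows up when a second name refers to the same object. In this sketch (a condensed copy of the article's Vector plus a plain list for comparison), += rebinds v1 to a new Vector and leaves the alias untouched, whereas list's in-place += changes the single object both names share.

```python
import itertools
from array import array


class Vector:  # condensed copy of the article's class (no __iadd__ on purpose)
    def __init__(self, components):
        self._components = array('i', components)

    def __iter__(self):
        return iter(self._components)

    def __repr__(self):
        return str(tuple(self._components))

    def __add__(self, other):
        try:
            pairs = itertools.zip_longest(self, other, fillvalue=0)
            return Vector(a + b for a, b in pairs)
        except TypeError:
            return NotImplemented


v1 = Vector([1, 2])
alias = v1
v1 += (1, 1)          # no __iadd__, so this runs v1 = v1 + (1, 1)
print(v1, alias)      # (2, 3) (1, 2)

l1 = [1, 2]
l_alias = l1
l1 += [3]             # list.__iadd__ extends the shared list in place
print(l1, l_alias)    # [1, 2, 3] [1, 2, 3]
```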

Appendix: Code

vector.py

import itertools
from array import array
from collections.abc import Iterable


class Vector:
    def __init__(self, components: Iterable):
        self._components = array('i', components)

    def __iter__(self):
        return iter(self._components)

    def __len__(self):
        return len(self._components)

    def __repr__(self):
        return str(tuple(self._components))

    def __eq__(self, other):
        return len(self) == len(other) and all(a == b for a, b in zip(self, other))

    def __add__(self, other):
        try:
            pairs = itertools.zip_longest(self, other, fillvalue=0)
            return Vector(a + b for a, b in pairs)
        except TypeError:
            return NotImplemented

    def __radd__(self, other):
        return self + other

vector_test.py

from vector import Vector


class TestVector:
    def test_should_compare_two_vectors_with_override_compare_operators(self):
        v1 = Vector([1, 2])
        v2 = Vector((1, 2))
        v3 = Vector([2, 3])
        v4 = Vector([2, 3, 4])

        assert v1 == v2
        assert v3 != v2
        assert v4 != v3
        assert (1, 2) == v2
        assert v2 == [1, 2]

    def test_should_add_two_same_dimension_vectors_with_override_add_operator(self):
        v1 = Vector([1, 2])
        v2 = Vector((1, 3))
        result = Vector([2, 5])

        assert result == v1 + v2

    def test_should_add_two_different_dimension_vectors_with_override_add_operator(self):
        v1 = Vector([1, 2])
        v2 = Vector((1, 1, 1))
        result = Vector([2, 3, 1])

        assert result == v1 + v2

    def test_should_add_vector_and_iterable_with_override_add_operator(self):
        v1 = Vector([1, 2])

        assert v1 + (1, 1) == (2, 3)
        assert v1 + [1, 1, 1] == (2, 3, 1)

    def test_should_add_iterable_and_vector_with_override_radd_method(self):
        v1 = Vector([1, 2])

        assert (1, 1) + v1 == (2, 3)
        assert [1, 1, 1] + v1 == (2, 3, 1)

    def test_should_create_new_vector_when_use_incremental_add_operator(self):