NumPy: array-oriented programming

Posted by nemo on Wed, 08 Dec 2021 11:27:31 +0100

With NumPy arrays, many kinds of data-processing tasks can be written as concise array expressions instead of large numbers of loops (avoiding loops is the core idea). Replacing explicit loops with array expressions is called vectorization. Vectorized array operations are generally one or two orders of magnitude faster than their pure-Python equivalents, and the difference matters most in numerical computation.

1. Demonstrating the power of array programming

The power of array programming is best shown with an example:

Compute the function f = sqrt(x^2 + y^2), where x and y are one-dimensional arrays.

Pure Python approach: write a double for loop; the time complexity is O(n^2).
Array-oriented approach: expand the two arrays into two-dimensional matrices, then compute element by element on the matrices, avoiding explicit loops.

[example 1] Comparing the running time of the two approaches

# When the arrays have only a few elements
In [406]: x
Out[406]: [0, 1, 2, 3, 4]

In [407]: y
Out[407]: [0, 1, 2, 3, 4, 5, 6]

In [408]: %load test.py  # magic command: load the code written earlier

In [409]: # %load test.py
     ...: import math
     ...:
     ...: def f(x,y):
     ...:     temp = []
     ...:     for i in x:
     ...:         for j in y:
     ...:             temp.append(math.sqrt(math.pow(i,2) + math.pow(j,2)))
     ...:     return temp
 

In [410]: %time f(x,y)
CPU times: user 73 µs, sys: 59 µs, total: 132 µs
Wall time: 136 µs

# np refers to NumPy, assumed to be imported earlier in the session (import numpy as np)
In [411]: xs,ys = np.meshgrid(x,y)
In [413]: %time np.sqrt(xs ** 2 + ys ** 2)
CPU times: user 71 µs, sys: 33 µs, total: 104 µs
Wall time: 109 µs
# When the arrays have about a thousand elements each
In [415]: x = list(range(999))
In [416]: y = list(range(1001))

In [417]: %time f(x,y)
CPU times: user 384 ms, sys: 11 ms, total: 395 ms
Wall time: 394 ms

In [418]: xs, ys = np.meshgrid(x,y)
In [419]: %time np.sqrt(xs ** 2 + ys ** 2)
CPU times: user 10.1 ms, sys: 11.5 ms, total: 21.5 ms
Wall time: 22.6 ms

First, note that the two methods compute the same values; the output is too long to post here. Only the layout differs: f returns a flat list, while the array expression returns a two-dimensional array of shape (len(y), len(x)). The time comparison is as follows:

Approach	Data scale	Total CPU time
Python	Single digits	132 µs
Array	Single digits	104 µs
Python	Thousands	395 ms
Array	Thousands	21.5 ms

As the data grows, the efficiency advantage of the array approach becomes very clear: at about a thousand elements per array it is roughly 18 times faster (395 ms versus 21.5 ms). Real data processing often involves hundreds of thousands or millions of values, and at that scale the advantage of array-oriented programming is hard to ignore.
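The comparison can be reproduced outside IPython. The following is a minimal sketch, assuming the standard timeit module in place of the %time magic; the names f_loop and f_array are chosen here for illustration and are not from the original test.py. It also checks that the two approaches produce the same values once the different layouts are accounted for.

```python
import math
import timeit

import numpy as np


def f_loop(x, y):
    """Pure-Python version: one value per (i, j) pair, with x in the outer loop."""
    return [math.sqrt(i ** 2 + j ** 2) for i in x for j in y]


def f_array(x, y):
    """Vectorized version: build the grids, then compute element by element."""
    xs, ys = np.meshgrid(x, y)
    return np.sqrt(xs ** 2 + ys ** 2)


x = list(range(999))
y = list(range(1001))

# Reshape the flat loop output so that loop_result[i, j] belongs to (x[i], y[j]).
loop_result = np.array(f_loop(x, y)).reshape(len(x), len(y))
array_result = f_array(x, y)  # shape (len(y), len(x))

# The loop iterates over x first, so its grid is the transpose of the meshgrid one.
same_values = np.allclose(loop_result, array_result.T)

# Best of three single runs for each version, in seconds.
loop_seconds = min(timeit.repeat(lambda: f_loop(x, y), number=1, repeat=3))
array_seconds = min(timeit.repeat(lambda: f_array(x, y), number=1, repeat=3))

print(same_values, loop_seconds, array_seconds)
```

The exact timings depend on the machine and the NumPy version, so they will not match the figures in the table exactly.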

Explanation of the meshgrid function: f is a function of two variables, so in three-dimensional space x and y span a plane. meshgrid represents the points of that plane with two two-dimensional matrices: the elements of xs and ys at the same position together form one (x, y) point, so evaluating the expression on xs and ys covers the same pairs as the double for loop. As the following example shows, xs repeats the original x array as each row, and ys repeats the original y array as each column. The first column of xs and the first column of ys correspond to the points (0,0), (0,1), (0,2), (0,3), (0,4), (0,5) on the plane, and so on. Taken together, the two matrices enumerate every pair of values from x and y, so computing on them is equivalent to the double for loop.

[example 2] The meshgrid function

In [42]: x = np.arange(5)

In [43]: y = np.arange(6)

In [44]: xs, ys = np.meshgrid(x,y)

In [45]: xs
Out[45]:
array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])

In [46]: ys
Out[46]:
array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4],
       [5, 5, 5, 5, 5]])
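The correspondence between grid positions and loop pairs can be checked directly. This is a small sketch using the same x and y as the example; the variable names pairs_from_grid, pairs_from_loops and first_column are introduced here for illustration.

```python
import numpy as np

x = np.arange(5)
y = np.arange(6)
xs, ys = np.meshgrid(x, y)

# Both grids have one row per y value and one column per x value.
shapes_match = xs.shape == ys.shape == (len(y), len(x))

# Position (i, j) holds the pair (x[j], y[i]).
pairs_from_grid = {(int(xs[i, j]), int(ys[i, j]))
                   for i in range(len(y)) for j in range(len(x))}

# The double for loop visits exactly the same pairs.
pairs_from_loops = {(int(a), int(b)) for a in x for b in y}

# The first column of both grids pairs x[0] with every y value.
first_column = list(zip(xs[:, 0].tolist(), ys[:, 0].tolist()))

print(shapes_match, pairs_from_grid == pairs_from_loops)
print(first_column)
```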

Note: the time taken by meshgrid itself is not included in the timings of example 1; it is very small and can be ignored. If x and y are ndarrays rather than lists, the computation can be about 40% faster.

Besides meshgrid, NumPy provides many other functions that support array-oriented programming. They fall roughly into the following categories: statistical methods, sorting, Boolean array processing, unique values and set operations, and where, the array version of the ternary expression.

2. Statistical methods (sum, mean, etc.)

Method	Description	Method	Description
sum	Sum of all elements, or of the elements along an axis; the sum of a zero-length array is 0	mean	Arithmetic mean; the mean of a zero-length array is NaN
min, max	Minimum and maximum	argmin, argmax	Positions of the minimum and maximum values
cumsum	Cumulative sum of the elements, starting from 0	cumprod	Cumulative product of the elements, starting from 1
std, var	Standard deviation and variance; the degrees of freedom can be adjusted (the default denominator is n)

Note: the methods in the table are array methods and can be called directly on an array.

[example] Some of the statistical methods

In [450]: arr = np.arange(12).reshape(4,3)

In [451]: arr
Out[451]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
# Sum of all elements
In [453]: arr.sum()
Out[453]: 66
# Sum along axis 0 (one sum per column)
In [454]: arr.sum(0)
Out[454]: array([18, 22, 26])
# Sum along axis 1 (one sum per row)
In [455]: arr.sum(1)
Out[455]: array([ 3, 12, 21, 30])

In [458]: arr.cumsum(0)
Out[458]:
array([[ 0,  1,  2],
       [ 3,  5,  7],
       [ 9, 12, 15],
       [18, 22, 26]])

In [459]: arr.cumsum(1)
Out[459]:
array([[ 0,  1,  3],
       [ 3,  7, 12],
       [ 6, 13, 21],
       [ 9, 19, 30]])
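The remaining methods from the table work the same way. The following sketch applies them to the same 4x3 array; the variable names are chosen here for illustration, and ddof is the NumPy parameter that adjusts the degrees of freedom mentioned in the table.

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)

overall_mean = arr.mean()        # mean of all 12 elements
column_means = arr.mean(axis=0)  # one mean per column

# argmin and argmax return positions in the flattened array unless an axis is given.
smallest_at = arr.argmin()
largest_at = arr.argmax()
largest_per_row = arr.argmax(axis=1)

# var and std divide by n by default; ddof=1 changes the denominator to n - 1.
population_var = arr.var()
sample_var = arr.var(ddof=1)

# cumprod is the running product, the multiplicative counterpart of cumsum.
running_product = np.array([1, 2, 3, 4]).cumprod()

print(overall_mean, column_means)
print(smallest_at, largest_at, largest_per_row)
print(population_var, sample_var, running_product)
```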

3. Sorting

NumPy arrays can be sorted in place with the sort method. For multidimensional arrays, pass an axis to choose the direction to sort along (by default the last axis is used). Sorting can be done in two ways: one changes the original array, and the other creates a copy.

arr.sort()    # sorts in place, changing the order of the original array's elements
np.sort(arr)  # returns a sorted copy, leaving the original array unchanged

[example] Using the two forms of sort

In [467]: arr
Out[467]: array([0.19008705, 0.46913444, 0.33954087, 0.7044972 , 0.2260952 ])

In [468]: np.sort(arr)
Out[468]: array([0.19008705, 0.2260952 , 0.33954087, 0.46913444, 0.7044972 ])
#After using np.sort, the order of the original array elements remains unchanged
In [469]: arr
Out[469]: array([0.19008705, 0.46913444, 0.33954087, 0.7044972 , 0.2260952 ])

In [470]: arr.sort()
#After using arr.sort, the order of the original array elements is changed
In [471]: arr
Out[471]: array([0.19008705, 0.2260952 , 0.33954087, 0.46913444, 0.7044972 ])
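For a multidimensional array, the axis argument decides what gets sorted. The sketch below uses a small hand-written 3x3 array (not taken from the session above) to show the three common choices.

```python
import numpy as np

arr = np.array([[3, 1, 2],
                [9, 7, 8],
                [6, 4, 5]])

sorted_rows = np.sort(arr)             # default axis=-1: sort within each row
sorted_columns = np.sort(arr, axis=0)  # sort within each column
sorted_flat = np.sort(arr, axis=None)  # flatten first, then sort

# np.sort returned copies, so arr is still in its original order here.
in_place = arr.copy()
in_place.sort(axis=0)                  # the method form modifies in_place itself

print(sorted_rows)
print(sorted_columns)
print(sorted_flat)
```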

4. Boolean array processing

NumPy also provides functions for processing Boolean arrays. Common requirements and the corresponding functions are as follows.

Operation	Function
Count the True values	arr.sum(); Boolean values are coerced to 1 (True) or 0 (False), so they can be counted by summing
Are all values True	arr.all()
Is at least one value True	arr.any()

[example] Boolean array operation example

In [476]: arr = np.random.randint(-10,10,size=(3,3))

In [477]: arr
Out[477]:
array([[-5,  5,  6],
       [ 0, -2, -5],
       [-2,  7,  9]])

In [478]: (arr>0).sum()
Out[478]: 4

In [479]: bools = arr>0

In [480]: bools
Out[480]:
array([[False,  True,  True],
       [False, False, False],
       [False,  True,  True]])

In [481]: bools.any()
Out[481]: True

In [482]: bools.all()
Out[482]: False
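Like the statistical methods, sum, any and all accept an axis. The following sketch rebuilds the array from the session above by hand (the original came from np.random.randint, so it cannot be regenerated exactly) and applies the three methods per row and per column.

```python
import numpy as np

arr = np.array([[-5, 5, 6],
                [0, -2, -5],
                [-2, 7, 9]])
bools = arr > 0

positives_per_row = bools.sum(axis=1)     # number of True values in each row
any_positive_per_row = bools.any(axis=1)  # does each row contain a True?
all_positive_per_col = bools.all(axis=0)  # is each column entirely True?

# any and all also work on non-Boolean arrays: non-zero values count as True.
has_nonzero = arr.any()
all_nonzero = arr.all()  # False here because arr contains a 0

print(positives_per_row, any_positive_per_row, all_positive_per_col)
print(has_nonzero, all_nonzero)
```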

5. Unique values and set operations (on one-dimensional arrays)

NumPy contains some basic set operations for one-dimensional ndarrays.

Method	Description
unique(x)	Compute the sorted unique values of x
intersect1d(x,y)	Compute the sorted intersection of x and y
union1d(x,y)	Compute the sorted union of x and y
in1d(x,y)	Test whether each element of x is contained in y; returns a Boolean array
setdiff1d(x,y)	Set difference: the elements that are in x but not in y
setxor1d(x,y)	Symmetric difference (exclusive or): the elements that are in x or y but not in both

Note: x and y in the methods above are one-dimensional arrays.

[example] Processing one-dimensional arrays

In [484]: names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will'])

In [485]: np.unique(names)
Out[485]: array(['Bob', 'Joe', 'Will'], dtype='<U4')

In [486]: values = np.array([6,0,0,3,2,5,6])

In [487]: np.in1d(values, [2,3,6])
Out[487]: array([ True, False, False,  True,  True, False,  True])
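The other set operations in the table can be sketched in the same way. The array y below is made up for illustration. The last line uses np.isin, which the NumPy documentation recommends over in1d for new code; it is shown here as an alternative, not as what the session above used.

```python
import numpy as np

x = np.array([6, 0, 0, 3, 2, 5, 6])
y = np.array([2, 3, 6, 7])  # illustrative values, not from the session above

common = np.intersect1d(x, y)       # in both x and y
combined = np.union1d(x, y)         # in x or y
only_in_x = np.setdiff1d(x, y)      # in x but not in y
in_exactly_one = np.setxor1d(x, y)  # in x or y, but not in both

# Element-wise membership test, equivalent to in1d for one-dimensional input.
membership = np.isin(x, y)

print(common, combined, only_in_x, in_exactly_one)
print(membership)
```

All four set functions return sorted arrays with duplicates removed, while the membership test keeps the shape and order of x.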

6. where: the array version of the ternary expression

The where function is a vectorized version of the Python ternary expression x if condition else y. With where, we can replace values in an array based on a condition, or perform other conditional operations. Example 1 below illustrates the syntax, and example 2 shows a simple application.

[example 1] where syntax

In [489]: xarr = np.array([1.1,1.2,1.3,1.4,1.5])

In [490]: yarr = np.array([2.1,2.2,2.3,2.4,2.5])

In [491]: cond = np.array([True,False,True,True,False])

In [492]: result = np.where(cond, xarr, yarr)

In [493]: result
Out[493]: array([1.1, 2.2, 1.3, 1.4, 2.5])
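To make the link with the ternary expression concrete, the sketch below computes the same result twice: once with a pure-Python list comprehension that applies x if c else y element by element, and once with np.where. The names result_loop and result_where are introduced here for illustration.

```python
import numpy as np

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

# Pure Python: apply the ternary expression to one element at a time.
result_loop = [float(x) if c else float(y)
               for x, y, c in zip(xarr, yarr, cond)]

# NumPy: the same selection in a single vectorized call.
result_where = np.where(cond, xarr, yarr)

print(result_loop)
print(result_where)
```

The list comprehension runs a Python-level loop and only handles one-dimensional input, whereas np.where works on arrays of any shape.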

[example 2] Applying where

In [495]: arr = np.random.randn(4,4)

In [496]: arr
Out[496]:
array([[ 0.49042258, -0.15916412,  0.50222044, -0.28018826],
       [ 1.698109  ,  0.72016023, -0.19839624, -0.36279838],
       [ 0.0680141 ,  0.74892178, -0.97592457,  0.58013547],
       [ 0.23533146,  0.01091129,  0.14336871,  1.35089938]])

In [497]: arr > 0
Out[497]:
array([[ True, False,  True, False],
       [ True,  True, False, False],
       [ True,  True, False,  True],
       [ True,  True,  True,  True]])

In [498]: np.where(arr > 0, 2, -2)
Out[498]:
array([[ 2, -2,  2, -2],
       [ 2,  2, -2, -2],
       [ 2,  2, -2,  2],
       [ 2,  2,  2,  2]])

In [499]: np.where(arr > 0, 2, arr)
Out[499]:
array([[ 2.        , -0.15916412,  2.        , -0.28018826],
       [ 2.        ,  2.        , -0.19839624, -0.36279838],
       [ 2.        ,  2.        , -0.97592457,  2.        ],
       [ 2.        ,  2.        ,  2.        ,  2.        ]])

