Using NumPy arrays lets us express many data-processing tasks as concise array expressions instead of writing large numbers of loops (avoiding loops is the core idea). Replacing explicit loops with array expressions is called vectorization. Vectorized array operations are generally one or two orders of magnitude faster than their pure-Python equivalents, and the gain is largest in numerical computation.
1. The power of array programming
An example illustrates the power of array programming:
Compute the function f = sqrt(x^2 + y^2), where x and y are one-dimensional arrays.
Pure-Python approach: compute the result with a double for loop; the time complexity is O(n^2).
Array-oriented approach: expand the two arrays into two-dimensional matrices, then compute element by element on those matrices, which avoids explicit loops.
[example 1] Comparing the running time of the two approaches
    # When the arrays have only a few elements
    In [406]: x
    Out[406]: [0, 1, 2, 3, 4]

    In [407]: y
    Out[407]: [0, 1, 2, 3, 4, 5, 6]

    In [408]: %load test.py  # magic command: load the code written beforehand

    In [409]: # %load test.py
         ...: import math
         ...:
         ...: def f(x,y):
         ...:     temp = []
         ...:     for i in x:
         ...:         for j in y:
         ...:             temp.append(math.sqrt(math.pow(i,2) + math.pow(j,2)))
         ...:     return temp

    In [410]: %time f(x,y)
    CPU times: user 73 µs, sys: 59 µs, total: 132 µs
    Wall time: 136 µs

    In [411]: xs,ys = np.meshgrid(x,y)

    In [413]: %time np.sqrt(xs ** 2 + ys ** 2)
    CPU times: user 71 µs, sys: 33 µs, total: 104 µs
    Wall time: 109 µs

    # When the arrays have about a thousand elements
    In [415]: x = list(_ for _ in range(999))

    In [416]: y = list(_ for _ in range(1001))

    In [417]: %time f(x,y)
    CPU times: user 384 ms, sys: 11 ms, total: 395 ms
    Wall time: 394 ms

    In [418]: xs, ys = np.meshgrid(x,y)

    In [419]: %time np.sqrt(xs ** 2 + ys ** 2)
    CPU times: user 10.1 ms, sys: 11.5 ms, total: 21.5 ms
    Wall time: 22.6 ms
Note first that the two methods compute the same values (the loop returns them as a flat list, the array version as a two-dimensional array); the results are too long to post here. The running times compare as follows:
Approach | Data scale | Total CPU time |
---|---|---|
Python loop | Single digits | 132 µs |
Array | Single digits | 104 µs |
Python loop | Thousands | 395 ms |
Array | Thousands | 21.5 ms |
As the data scale grows, the efficiency advantage of the array approach becomes very clear. Real data-processing work often involves hundreds of thousands or millions of records, and at that scale the advantage of array-oriented programming matters.
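The comparison can also be reproduced outside IPython. The following is a minimal sketch, assuming NumPy is installed; the names `f_loop` and `f_vectorized` and the array sizes are illustrative choices, not part of the original test. It checks that both approaches yield the same values and times them with the standard `timeit` module. Actual timings depend on the machine, so none are quoted here.

```python
import math
import timeit

import numpy as np


def f_loop(x, y):
    """Pure-Python version: evaluate sqrt(i^2 + j^2) for every pair (i, j)."""
    return [math.sqrt(i * i + j * j) for i in x for j in y]


def f_vectorized(x, y):
    """Array version: build the grids once, then compute element by element."""
    xs, ys = np.meshgrid(x, y)
    return np.sqrt(xs ** 2 + ys ** 2)


x = list(range(200))
y = list(range(300))

# The loop walks x in the outer loop, so its values form a (len(x), len(y)) grid.
loop_result = np.array(f_loop(x, y)).reshape(len(x), len(y))
# meshgrid lays the grid out as (len(y), len(x)), i.e. the transpose.
vec_result = f_vectorized(x, y)

same_values = np.allclose(loop_result, vec_result.T)
print(same_values)

loop_time = timeit.timeit(lambda: f_loop(x, y), number=3)
vec_time = timeit.timeit(lambda: f_vectorized(x, y), number=3)
print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
```

Note that the two results differ in layout: the loop produces the values in x-major order, while the array version stores them with one row per element of y, which is why the check compares against the transpose.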
Explanation of the meshgrid function: the function in this example takes two variables, so if we picture it in three-dimensional space, x and y span a plane. meshgrid represents the points of that plane as two-dimensional matrices: the elements of xs and ys at the same position form one (x, y) pair on the plane, which is why the result is the same as that of the for loop. As the following example shows, xs repeats the original x array as its rows, and ys repeats the original y array as its columns. The first column of xs together with the first column of ys corresponds to the points (0,0), (0,1), (0,2), (0,3), (0,4), (0,5) on the plane, and so on. Taken together, the two matrices contain every pairing of an element of x with an element of y, so computing on them element by element is equivalent to the double for loop.
[example 2] explain the meshgrid function
    In [42]: x = np.arange(5)

    In [43]: y = np.arange(6)

    In [44]: xs, ys = np.meshgrid(x,y)

    In [45]: xs
    Out[45]:
    array([[0, 1, 2, 3, 4],
           [0, 1, 2, 3, 4],
           [0, 1, 2, 3, 4],
           [0, 1, 2, 3, 4],
           [0, 1, 2, 3, 4],
           [0, 1, 2, 3, 4]])

    In [46]: ys
    Out[46]:
    array([[0, 0, 0, 0, 0],
           [1, 1, 1, 1, 1],
           [2, 2, 2, 2, 2],
           [3, 3, 3, 3, 3],
           [4, 4, 4, 4, 4],
           [5, 5, 5, 5, 5]])
Note: the time spent in meshgrid is not included in the measurements above; it is very small and can be ignored. If x and y are ndarrays rather than lists, efficiency improves by about 40%.
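To make the pairing concrete, here is a small sketch (assuming NumPy is installed; the variable names `pair`, `first_column_points`, `grid_result` and `loop_result` are illustrative) that reads individual point pairs out of the grids and checks the element-by-element result against an explicit double loop.

```python
import math

import numpy as np

x = np.arange(5)
y = np.arange(6)
xs, ys = np.meshgrid(x, y)

# Both grids have one row per element of y and one column per element of x.
print(xs.shape, ys.shape)  # (6, 5) (6, 5)

# Position (row, col) holds the pair (x[col], y[row]).
row, col = 3, 2
pair = (int(xs[row, col]), int(ys[row, col]))
print(pair)  # (2, 3)

# The first columns of xs and ys give the points (0, 0) ... (0, 5).
first_column_points = [(int(a), int(b)) for a, b in zip(xs[:, 0], ys[:, 0])]
print(first_column_points)

# Computing on the grids matches a double loop laid out the same way.
grid_result = np.sqrt(xs ** 2 + ys ** 2)
loop_result = np.array([[math.sqrt(i ** 2 + j ** 2) for i in x] for j in y])
print(np.allclose(grid_result, loop_result))  # True
```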
Besides meshgrid, NumPy provides many other functions that support array-oriented programming. They fall roughly into the following categories: statistical methods, sorting, Boolean array processing, unique values and set operations, and where, the array version of the ternary expression.
2. Statistical methods (sum, mean, etc.)
Method | Description | Method | Description |
---|---|---|---|
sum | Sum of all elements, or of the elements along an axis; the sum of a zero-length array is 0 | mean | Arithmetic mean; the mean of a zero-length array is NaN |
min, max | Minimum and maximum | argmin, argmax | Positions (indices) of the minimum and maximum values |
cumsum | Cumulative sum of the elements, starting from 0 | cumprod | Cumulative product of the elements, starting from 1 |
std, var | Standard deviation and variance; the degrees of freedom can be adjusted (the default denominator is n) | | |
Note: the methods in the table are array methods, which can be called directly through the array.
[example] examples of some statistical methods
    In [450]: arr = np.arange(12).reshape(4,3)

    In [451]: arr
    Out[451]:
    array([[ 0,  1,  2],
           [ 3,  4,  5],
           [ 6,  7,  8],
           [ 9, 10, 11]])

    # Sum of all elements
    In [453]: arr.sum()
    Out[453]: 66

    # Sum along axis 0 (one result per column)
    In [454]: arr.sum(0)
    Out[454]: array([18, 22, 26])

    # Sum along axis 1 (one result per row)
    In [455]: arr.sum(1)
    Out[455]: array([ 3, 12, 21, 30])

    In [458]: arr.cumsum(0)
    Out[458]:
    array([[ 0,  1,  2],
           [ 3,  5,  7],
           [ 9, 12, 15],
           [18, 22, 26]])

    In [459]: arr.cumsum(1)
    Out[459]:
    array([[ 0,  1,  3],
           [ 3,  7, 12],
           [ 6, 13, 21],
           [ 9, 19, 30]])
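The remaining methods in the table follow the same pattern. A short sketch, assuming NumPy is installed (the variable names are illustrative), showing mean, argmax, the `ddof` argument that adjusts the degrees of freedom for var, and cumprod:

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)

total = arr.sum()                  # sum over every element: 66
col_means = arr.mean(axis=0)       # one mean per column: [4.5, 5.5, 6.5]
row_max_pos = arr.argmax(axis=1)   # column index of each row's maximum: [2, 2, 2, 2]

# var (and std) divide by n by default; ddof=1 changes the denominator to n - 1.
population_var = arr.var()
sample_var = arr.var(ddof=1)       # 13.0 for the values 0..11

# cumprod multiplies as it goes, so the running product starts from 1.
running_product = np.array([1, 2, 3, 4]).cumprod()  # [1, 2, 6, 24]

print(total, col_means, row_max_pos)
print(population_var, sample_var)
print(running_product)
```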
3. Sorting
NumPy arrays can be sorted with sort. For multidimensional arrays you can pass an axis value to choose the axis to sort along (by default the last axis is used). Sorting is available in two forms: one sorts the original array in place, and the other returns a sorted copy.
    arr.sort()    # sorts in place: changes the order of the original array's elements
    np.sort(arr)  # returns a sorted copy; the original array is left unchanged
[example] use example of sort method
    In [467]: arr
    Out[467]: array([0.19008705, 0.46913444, 0.33954087, 0.7044972 , 0.2260952 ])

    In [468]: np.sort(arr)
    Out[468]: array([0.19008705, 0.2260952 , 0.33954087, 0.46913444, 0.7044972 ])

    # After np.sort, the order of the original array's elements is unchanged
    In [469]: arr
    Out[469]: array([0.19008705, 0.46913444, 0.33954087, 0.7044972 , 0.2260952 ])

    In [470]: arr.sort()

    # After arr.sort, the order of the original array's elements has changed
    In [471]: arr
    Out[471]: array([0.19008705, 0.2260952 , 0.33954087, 0.46913444, 0.7044972 ])
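The example above uses a one-dimensional array. For the multidimensional case, the following sketch (assuming NumPy is installed; the sample values are made up for illustration) shows how the axis argument selects the direction of the sort and how the copy and in-place forms differ.

```python
import numpy as np

arr = np.array([[3, 1, 2],
                [9, 7, 8],
                [6, 4, 5]])

sorted_rows = np.sort(arr)           # default axis=-1: sort within each row
sorted_cols = np.sort(arr, axis=0)   # sort within each column

# np.sort returned copies, so arr still has its original order at this point.
first_row_before = arr[0].tolist()   # [3, 1, 2]

arr.sort(axis=0)                     # in place: arr itself is now column-sorted

print(sorted_rows)
print(sorted_cols)
print(arr)
```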
4. Boolean array processing
NumPy also provides methods for processing Boolean arrays. The common operations and the corresponding methods are listed below.
Operation | Method |
---|---|
Count the True values | arr.sum(); Boolean values are coerced to 1 (True) or 0 (False), so summing them counts the True values |
Check whether all values are True | arr.all() |
Check whether at least one value is True | arr.any() |
[example] Boolean array operation example
    In [476]: arr = np.random.randint(-10,10,size=(3,3))

    In [477]: arr
    Out[477]:
    array([[-5,  5,  6],
           [ 0, -2, -5],
           [-2,  7,  9]])

    In [478]: (arr>0).sum()
    Out[478]: 4

    In [479]: bools = arr>0

    In [480]: bools
    Out[480]:
    array([[False,  True,  True],
           [False, False, False],
           [False,  True,  True]])

    In [481]: bools.any()
    Out[481]: True

    In [482]: bools.all()
    Out[482]: False
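Like the statistical methods, sum, any and all accept an axis argument. The sketch below assumes NumPy is installed and reuses the values printed above as a fixed array (rather than a fresh random draw) so that the results are reproducible.

```python
import numpy as np

arr = np.array([[-5, 5, 6],
                [0, -2, -5],
                [-2, 7, 9]])
bools = arr > 0

positives_per_row = bools.sum(axis=1)      # count of True in each row: [2, 0, 2]
any_positive_per_col = bools.any(axis=0)   # [False, True, True]
all_positive_per_col = bools.all(axis=0)   # [False, False, False]

print(positives_per_row)
print(any_positive_per_col)
print(all_positive_per_col)
```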
5. Unique values and set operations (on one-dimensional arrays)
numpy contains some basic set operations for one-dimensional ndarray.
Method | Description |
---|---|
unique(x) | Compute the sorted unique values of x |
intersect1d(x,y) | Compute the sorted intersection of x and y |
union1d(x,y) | Compute the sorted union of x and y |
in1d(x,y) | Test whether each element of x is contained in y; returns an array of Boolean values |
setdiff1d(x,y) | Set difference: the elements that are in x but not in y |
setxor1d(x,y) | Symmetric difference (exclusive or): the elements that are in x or in y but not in both |
Note: x and y in the above methods are one-dimensional arrays.
[example] one dimensional array processing example
    In [484]: names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will'])

    In [485]: np.unique(names)
    Out[485]: array(['Bob', 'Joe', 'Will'], dtype='<U4')

    In [486]: values = np.array([6,0,0,3,2,5,6])

    In [487]: np.in1d(values, [2,3,6])
    Out[487]: array([ True, False, False,  True,  True, False,  True])
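The other set functions in the table can be sketched the same way. This example assumes NumPy is installed and uses made-up sample arrays. For the membership test it calls `np.isin`, which the NumPy documentation recommends over `in1d` for new code and which gives the same result for one-dimensional input.

```python
import numpy as np

x = np.array([1, 2, 2, 3, 4])
y = np.array([3, 4, 4, 5])

common = np.intersect1d(x, y)        # in both: [3, 4]
combined = np.union1d(x, y)          # in either: [1, 2, 3, 4, 5]
only_in_x = np.setdiff1d(x, y)       # in x but not in y: [1, 2]
in_exactly_one = np.setxor1d(x, y)   # in x or y but not both: [1, 2, 5]
membership = np.isin(x, y)           # per-element test: is x[i] contained in y?

print(common, combined, only_in_x, in_exactly_one)
print(membership)
```

All of these return sorted results with duplicates removed, except the membership test, which returns one Boolean per element of x.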
6. where: the array version of the ternary expression
The where function is a vectorized implementation of the Python ternary expression x if condition else y. With where we can replace values in an array according to a condition, or perform other conditional operations. Example 1 below illustrates the syntax, and example 2 shows a simple application.
[example 1] where syntax example
    In [489]: xarr = np.array([1.1,1.2,1.3,1.4,1.5])

    In [490]: yarr = np.array([2.1,2.2,2.3,2.4,2.5])

    In [491]: cond = np.array([True,False,True,True,False])

    In [492]: result = np.where(cond, xarr, yarr)

    In [493]: result
    Out[493]: array([1.1, 2.2, 1.3, 1.4, 2.5])
[example 2] where application example
    In [495]: arr = np.random.randn(4,4)

    In [496]: arr
    Out[496]:
    array([[ 0.49042258, -0.15916412,  0.50222044, -0.28018826],
           [ 1.698109  ,  0.72016023, -0.19839624, -0.36279838],
           [ 0.0680141 ,  0.74892178, -0.97592457,  0.58013547],
           [ 0.23533146,  0.01091129,  0.14336871,  1.35089938]])

    In [497]: arr > 0
    Out[497]:
    array([[ True, False,  True, False],
           [ True,  True, False, False],
           [ True,  True, False,  True],
           [ True,  True,  True,  True]])

    In [498]: np.where(arr > 0, 2, -2)
    Out[498]:
    array([[ 2, -2,  2, -2],
           [ 2,  2, -2, -2],
           [ 2,  2, -2,  2],
           [ 2,  2,  2,  2]])

    In [499]: np.where(arr > 0, 2, arr)
    Out[499]:
    array([[ 2.        , -0.15916412,  2.        , -0.28018826],
           [ 2.        ,  2.        , -0.19839624, -0.36279838],
           [ 2.        ,  2.        , -0.97592457,  2.        ],
           [ 2.        ,  2.        ,  2.        ,  2.        ]])
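To tie where back to the ternary expression it vectorizes, the sketch below (assuming NumPy is installed; the sample values and variable names are illustrative) computes the same replacement once with np.where and once with a pure-Python ternary, and then nests two where calls to choose among three outcomes.

```python
import numpy as np

arr = np.array([[0.5, -0.2],
                [-1.5, 2.0]])

# Vectorized ternary: keep positive values, replace the rest with 0.
kept = np.where(arr > 0, arr, 0.0)

# The pure-Python equivalent, written with the ternary expression itself.
kept_loop = [[v if v > 0 else 0.0 for v in row] for row in arr.tolist()]

# where calls can be nested to select among three outcomes.
signs = np.where(arr > 1, 1, np.where(arr < -1, -1, 0))

print(kept)
print(kept_loop)
print(signs)
```

Nesting stays readable for two or three conditions; the arguments can be scalars, arrays, or a mix of both, as in example 2 above.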