# Python data analysis - NumPy

Posted by jumpfroggy on Sun, 13 Feb 2022 10:11:47 +0100

Contents

Common libraries for data analysis:

1.Numpy: basic knowledge

1.1 Numpy built-in method for creating multidimensional array

1.2 index of multidimensional array

1.2.1 slicing elements from a list

1.2.2 array index

1.2.3 from 2D to 3D

1.3 basic operation of multidimensional array

1.3.1 addition, subtraction, multiplication and division are supported

1.3.2 operation between multidimensional arrays

1.3.3 multidimensional array logical operation

1.4 statistical method of multidimensional array

# Common libraries for data analysis:

Third-party libraries:
NumPy: scientific computing, with strong support for matrix operations
Pandas: built on NumPy, with richer functionality; also used for processing matrix-like data
scikit-learn: machine learning algorithms
Matplotlib: plotting and chart creation

# 1.Numpy: basic knowledge

NumPy provides efficient support for multidimensional arrays and has the following advantages:
Its core data structure, ndarray, is a multidimensional array that supports vectorized operations and is stored contiguously in memory
It supports a wide range of operations on multidimensional arrays
It can serve as a data interface to other programming languages
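As a minimal sketch of what vectorized operations and contiguous storage mean in practice (the variable names are illustrative and not from the original post), the snippet below doubles every element without an explicit loop and checks the array's memory-layout flag:

```python
import numpy as np

values = [1, 2, 3, 4]

# Plain Python list: an explicit loop (here a comprehension) is needed.
doubled_list = [v * 2 for v in values]

# ndarray: the multiplication is applied to every element at once.
arr = np.array(values)
doubled_arr = arr * 2

print(doubled_list)               # [2, 4, 6, 8]
print(doubled_arr)                # [2 4 6 8]

# A freshly created array is stored contiguously in row-major (C) order.
print(arr.flags['C_CONTIGUOUS'])  # True
```

Note that `values * 2` on the plain list would repeat the list rather than double its elements, which is one reason the array form is convenient for numeric work.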

Environment: Python Console

If pip list shows that numpy is installed but it still cannot be imported, install it again:
pip install numpy -i http://pypi.douban.com/simple

Create a multidimensional array (matrix). The ndarray object serves as the data container, and all subsequent operations work on this container.

```
# Import the package and create arrays: a1 is one-dimensional, a2 is two-dimensional
# (think of a two-dimensional array as an Excel sheet with rows and columns)
import numpy as np
a1 = np.array([1,2,3,4])
a1
Out[5]: array([1, 2, 3, 4])
type(a1)
Out[6]: numpy.ndarray
a1.shape
Out[7]: (4,)
a1.size
Out[8]: 4
a1.dtype
Out[9]: dtype('int32')    # the default integer type is platform-dependent; many systems show int64
a2 = np.array([[1.0,2.5,3],[0.5,4,9]])
a2.shape
Out[11]: (2, 3)    # 2 rows and 3 columns
a2.size
Out[12]: 6
a2.min()
Out[13]: 0.5
a2.dtype    # returns the element data type; the common types are floating point, integer, string, Python object, and Boolean
Out[14]: dtype('float64')
```

## 1.1 Numpy built-in method for creating multidimensional array

Built-in methods for creating multidimensional arrays:
np.array: creates an array from existing data
np.arange: creates a one-dimensional array of evenly spaced values
np.ones: creates an array whose elements are all 1
np.zeros: creates an array whose elements are all 0
np.empty: allocates memory for an array without initializing it, so the element values are arbitrary
np.random.random: creates an array whose elements are random floats in [0, 1)

Demo:

```
import numpy as np
a1 = np.arange(4)
a1
Out[4]: array([0, 1, 2, 3])
a1.ndim
Out[5]: 1
a2 = np.ones((4,4),dtype=np.int64)
a2
Out[7]:
array([[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]], dtype=int64)
a3 = np.zeros((2,2))
a3
Out[9]:
array([[0., 0.],
[0., 0.]])
a3.dtype
Out[10]: dtype('float64')
a3.ndim
Out[11]: 2
a4 = np.empty((3,3),dtype=np.int64)
a4
Out[13]:
array([[                0,                 0,                 0],
[                0,                 0,              1252],
[32088649856188416, 12948256950583296,           7929968]],
dtype=int64)
a4.dtype
Out[14]: dtype('int64')
a4.shape
Out[15]: (3, 3)
a4.ndim
Out[16]: 2
a5 = np.ones((4,3,4))
a5
Out[18]:
array([[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]],
[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]],
[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]],
[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]]])
a5.ndim
Out[19]: 3

import numpy as np

a1 = np.array([1,2,3,4])
print(a1)
```
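Two of the creation methods listed earlier can be sketched a little further: np.arange with explicit start, stop, and step arguments, and np.random.random. The shapes and values chosen here are illustrative only:

```python
import numpy as np

# arange accepts start, stop (exclusive) and step, much like range().
evens = np.arange(2, 11, 2)
print(evens)        # [ 2  4  6  8 10]

# random.random takes a shape and returns floats in the interval [0.0, 1.0).
r = np.random.random((2, 3))
print(r.shape)      # (2, 3)
print(r.dtype)      # float64

# The values differ on every run, but they always stay inside [0, 1).
in_range = bool(((r >= 0.0) & (r < 1.0)).all())
print(in_range)     # True
```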

## 1.2 index of multidimensional array

### 1.2.1 slicing elements from a list

```
l = [1,2,3,4,5]
l
Out[21]: [1, 2, 3, 4, 5]
l[:2]
Out[22]: [1, 2]
l[2:4]
Out[23]: [3, 4]
l[1:5:2]
Out[24]: [2, 4]
```

### 1.2.2 array index

```
import numpy as np
a = np.arange(12)
a
Out[4]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
a[1:4]
Out[5]: array([1, 2, 3])
a[1:10:2]
Out[6]: array([1, 3, 5, 7, 9])
```

Slice indexing works the same way as for lists, but arrays are more powerful: you can assign a single value to a slice and change several elements of the array at once.

```
a = np.arange(12)
a
Out[8]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
a[1:5] = -1
a
Out[10]: array([ 0, -1, -1, -1, -1,  5,  6,  7,  8,  9, 10, 11])
a[1:10:2] = 1
a
Out[12]: array([ 0,  1, -1,  1, -1,  1,  6,  1,  8,  1, 10, 11])
```
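One further difference from lists is worth a small sketch: slicing a list produces a copy, whereas basic slicing of an ndarray produces a view that shares memory with the original. The names below are illustrative:

```python
import numpy as np

lst = [0, 1, 2, 3, 4]
lst_part = lst[1:3]      # list slicing makes a copy
lst_part[0] = 99
print(lst)               # [0, 1, 2, 3, 4]  (original list untouched)

arr = np.arange(5)
arr_part = arr[1:3]      # array slicing returns a view
arr_part[0] = 99
print(arr)               # [ 0 99  2  3  4]  (original array changed)

# Call .copy() when an independent array is needed.
safe = arr[1:3].copy()
safe[0] = -1
print(arr)               # [ 0 99  2  3  4]  (unchanged this time)
```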

The shape attribute gives the size of the array along each dimension. A multidimensional array can be sliced along each of its dimensions.

```
import numpy as np
a = np.arange(12).reshape(3,4)
a
Out[4]:
array([[ 0,  1,  2,  3],
[ 4,  5,  6,  7],
[ 8,  9, 10, 11]])
a.shape
Out[5]: (3, 4)
a[0]
Out[6]: array([0, 1, 2, 3])
a[1]
Out[7]: array([4, 5, 6, 7])
a[:,0]
Out[8]: array([0, 4, 8])
a[:,1]
Out[9]: array([1, 5, 9])
a[:,2]
Out[10]: array([ 2,  6, 10])
a[0,0]
Out[11]: 0
a[0,1]
Out[12]: 1
a[1,1]
Out[13]: 5
a[1,2]
Out[14]: 6
a[0] = 1
a
Out[16]:
array([[ 1,  1,  1,  1],
[ 4,  5,  6,  7],
[ 8,  9, 10, 11]])
a[:,1] = -1
a
Out[18]:
array([[ 1, -1,  1,  1],
[ 4, -1,  6,  7],
       [ 8, -1, 10, 11]])
```
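The session above selects whole rows, whole columns, and single elements. As a short sketch, slices can also be combined on both axes to cut out a sub-block (the variable names are illustrative):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Rows 0-1 and columns 1-2 -> [[1, 2], [5, 6]]
block = a[0:2, 1:3]

# Negative indices count from the end: the last column -> [3, 7, 11]
last_col = a[:, -1]

# A step can be used on any axis: rows 1-2, every second column -> [[4, 6], [8, 10]]
stepped = a[1:, ::2]

print(block.shape, last_col.shape, stepped.shape)   # (2, 2) (3,) (2, 2)
```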

### 1.2.3 from 2D to 3D

A two-dimensional array is indexed as a[x,y]; a three-dimensional array takes one more index, a[x,y,z].

Create a three-dimensional array with 27 elements and shape (3, 3, 3):

```
a = np.arange(27).reshape(3,3,3)
a
Out[20]:
array([[[ 0,  1,  2],
[ 3,  4,  5],
[ 6,  7,  8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
a.ndim
Out[21]: 3
a.shape
Out[22]: (3, 3, 3)
a1 = a[1]
a1
Out[24]:
array([[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
a1.shape
Out[25]: (3, 3)
a2 = a[1,1]
a2
Out[27]: array([12, 13, 14])
a2.shape
Out[28]: (3,)
a2.ndim
Out[29]: 1
a[:,1]
Out[30]:
array([[ 3,  4,  5],
[12, 13, 14],
[21, 22, 23]])
a[:,1] = 1
a
Out[32]:
array([[[ 0,  1,  2],
[ 1,  1,  1],
[ 6,  7,  8]],
[[ 9, 10, 11],
[ 1,  1,  1],
[15, 16, 17]],
[[18, 19, 20],
[ 1,  1,  1],
        [24, 25, 26]]])
```
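The full three-index form a[x,y,z] is not exercised in the session above, so here is a minimal sketch of it on a fresh array of the same shape (names are illustrative):

```python
import numpy as np

a = np.arange(27).reshape(3, 3, 3)

# Three indices select a single element: block 1, row 1, column 1.
center = a[1, 1, 1]
print(center)            # 13

# Fix the last index to take column 0 of every block.
first_cols = a[:, :, 0]  # [[0, 3, 6], [9, 12, 15], [18, 21, 24]]
print(first_cols.shape)  # (3, 3)

# Mix an integer, a slice, and an integer: block 0, all rows, column 1.
col1_block0 = a[0, :, 1]
print(col1_block0)       # [1 4 7]
```

Each integer index removes one dimension from the result, while each slice keeps one.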

## 1.3 basic operation of multidimensional array

### 1.3.1 addition, subtraction, multiplication and division are supported

```
import numpy as np
a = np.arange(12).reshape(3,4)
a
Out[4]:
array([[ 0,  1,  2,  3],
[ 4,  5,  6,  7],
[ 8,  9, 10, 11]])
a += 1
a
Out[6]:
array([[ 1,  2,  3,  4],
[ 5,  6,  7,  8],
[ 9, 10, 11, 12]])
a *= 2
a
Out[8]:
array([[ 2,  4,  6,  8],
[10, 12, 14, 16],
       [18, 20, 22, 24]])
```
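The session shows addition and multiplication. Division deserves a small sketch of its own, because true division of an integer array yields floats and therefore cannot be done in place on that array (names are illustrative):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

true_div = a / 2      # true division always produces floats
floor_div = a // 2    # floor division keeps the integer dtype

print(true_div.dtype)   # float64
print(floor_div[0])     # [0 0 1 1]

# In-place true division cannot store floats back into an integer array,
# so NumPy raises a TypeError subclass instead of silently truncating.
try:
    a /= 2
    in_place_ok = True
except TypeError:
    in_place_ok = False
print(in_place_ok)      # False
```

Writing `a = a / 2` instead rebinds the name to a new float array and works as expected.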

### 1.3.2 operation between multidimensional arrays

```
import numpy as np
a = np.arange(4).reshape(2,2)
b = np.arange(4,8).reshape(2,2)
a
Out[5]:
array([[0, 1],
[2, 3]])
b
Out[6]:
array([[4, 5],
[6, 7]])
b - a
Out[7]:
array([[4, 4],
[4, 4]])
a + b
Out[8]:
array([[ 4,  6],
[ 8, 10]])
a * b
Out[9]:
array([[ 0,  5],
       [12, 21]])
# a * b above is element-wise; matrix multiplication uses the ndarray.dot method
a.dot(b)
Out[10]:
array([[ 6,  7],
       [26, 31]])
```
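As a brief sketch using the same two matrices, the `@` operator (available since Python 3.5) and the np.dot function give the same result as a.dot(b) for two-dimensional arrays, and the order of the operands matters:

```python
import numpy as np

a = np.arange(4).reshape(2, 2)       # [[0, 1], [2, 3]]
b = np.arange(4, 8).reshape(2, 2)    # [[4, 5], [6, 7]]

via_method = a.dot(b)
via_operator = a @ b
via_function = np.dot(a, b)

print(np.array_equal(via_method, via_operator))   # True
print(np.array_equal(via_method, via_function))   # True

# Matrix multiplication is not commutative: b @ a is [[10, 19], [14, 27]].
reversed_product = b @ a
print(np.array_equal(via_method, reversed_product))   # False
```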

### 1.3.3 multidimensional array logical operation

```
a = np.arange(12).reshape(4,3)
b = a > 5
b
Out[13]:
array([[False, False, False],
       [False, False, False],
       [ True,  True,  True],
       [ True,  True,  True]])
a[b]
Out[14]: array([ 6,  7,  8,  9, 10, 11])
```
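Boolean arrays like b can be combined and used for assignment as well as selection. The following is a minimal sketch with illustrative names; note that conditions are joined with `&` and `|` (each wrapped in parentheses) rather than with `and` and `or`:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

# Combine two conditions element-wise: greater than 2 AND even.
mask = (a > 2) & (a % 2 == 0)
selected = a[mask]
print(selected)          # [ 4  6  8 10]

# Counting True values: True counts as 1 when summed.
count = int((a > 5).sum())
print(count)             # 6

# A mask can also be the target of an assignment (done on a copy here).
c = a.copy()
c[c > 5] = 0
print(c[2])              # [0 0 0]
```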

## 1.4 statistical method of multidimensional array

Let a be a multidimensional array:

a.sum: sum of all elements of the array
a.max: maximum value
a.min: minimum value
a.mean: mean (average) value
a.std: standard deviation, the square root of the mean of the squared differences between each element and the mean
a.var: variance, the mean of the squared differences between each element and the mean

These methods accept an axis parameter that specifies the axis along which to compute the statistic. For a two-dimensional array the two axes correspond to rows and columns:
axis=0 aggregates down the rows and returns one result per column, while axis=1 aggregates across the columns and returns one result per row

```
a = np.arange(12).reshape(4,3)
a
Out[15]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
a.sum()
Out[16]: 66
a.sum(axis=0)
Out[17]: array([18, 22, 26])
a.sum(axis=1)
Out[18]: array([ 3, 12, 21, 30])
```
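The other statistics take the axis argument in the same way. As a sketch on the same array, the snippet below also recomputes the variance by hand from its definition to check it against a.var and a.std (the name manual_var is illustrative):

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

print(a.mean())          # 5.5
print(a.max(axis=0))     # [ 9 10 11]      (one value per column)
print(a.min(axis=1))     # [0 3 6 9]       (one value per row)
print(a.mean(axis=0))    # [4.5 5.5 6.5]

# Variance by definition: mean of squared differences from the mean.
manual_var = ((a - a.mean()) ** 2).mean()
var_matches = bool(np.isclose(a.var(), manual_var))
std_matches = bool(np.isclose(a.std(), np.sqrt(manual_var)))
print(var_matches, std_matches)   # True True
```

By default these methods divide by the number of elements (the population form); both accept a ddof argument when the sample form is wanted.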

General functions:
np.sqrt: square root
np.dot: matrix multiplication
np.sort: sorting
np.linalg: submodule containing basic linear algebra functions
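A minimal sketch of these functions in use follows. The input values are made up for illustration, and np.linalg.inv is just one example of what the np.linalg submodule offers:

```python
import numpy as np

x = np.array([9.0, 4.0, 16.0])

roots = np.sqrt(x)       # element-wise square root
ordered = np.sort(x)     # returns a sorted copy; x itself is unchanged
print(roots)             # [3. 2. 4.]
print(ordered)           # [ 4.  9. 16.]

# np.linalg: invert a small matrix and verify the result with np.dot.
m = np.array([[2.0, 0.0],
              [0.0, 4.0]])
m_inv = np.linalg.inv(m)
is_identity = bool(np.allclose(np.dot(m, m_inv), np.eye(2)))
print(is_identity)       # True
```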

Topics: Python Data Analysis