Common libraries for data analysis:
Third-party libraries:
NumPy: scientific computing, with strong support for matrix operations
Pandas: built on NumPy with richer functionality; also used for processing matrix-like (tabular) data
scikit-learn (sklearn): machine learning algorithms
Matplotlib: plotting and charts
1. NumPy basics
NumPy provides efficient support for multidimensional arrays and has the following advantages:
Its core data structure, ndarray, is a multidimensional array that supports vectorized operations and is stored contiguously in memory
It supports a wide range of operations on multidimensional arrays
It can serve as a data interface for code written in other programming languages
Environment: Python Console
Here, pip list showed numpy as installed, but it could not be used in the console, so it was installed with:
pip install numpy -i http://pypi.douban.com/simple
Creating a multidimensional array (matrix): an ndarray object serves as the data container, and all subsequent operations are performed on this container.
# Import the package and create arrays: a1 (one-dimensional) and a2 (two-dimensional).
# A two-dimensional array can be pictured as an Excel sheet with rows and columns.
import numpy as np
a1 = np.array([1,2,3,4])
a1
Out[5]: array([1, 2, 3, 4])
type(a1)
Out[6]: numpy.ndarray
a1.shape
Out[7]: (4,)
a1.size
Out[8]: 4
a1.dtype   # the default integer type depends on the platform and NumPy version (int32 here; int64 on many systems)
Out[9]: dtype('int32')
a2 = np.array([[1.0,2.5,3],[0.5,4,9]])
a2.shape   # 2 rows and 3 columns
Out[11]: (2, 3)
a2.size
Out[12]: 6
a2.min()
Out[13]: 0.5
a2.dtype   # element data type; common kinds are floating point, integer, string, Python object and Boolean
Out[14]: dtype('float64')
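The attributes used above can be gathered in one place. The sketch below uses a hypothetical helper, describe, which is not part of NumPy; it also passes dtype explicitly so that the integer type is the same on every platform instead of depending on the default.

```python
import numpy as np

def describe(arr):
    """Hypothetical helper: collect the basic ndarray attributes in a dict."""
    return {
        "ndim": arr.ndim,    # number of dimensions
        "shape": arr.shape,  # size along each dimension
        "size": arr.size,    # total number of elements
        "dtype": arr.dtype,  # element data type
    }

# Fix the integer type explicitly instead of relying on the platform default
a1 = np.array([1, 2, 3, 4], dtype=np.int64)
a2 = np.array([[1.0, 2.5, 3], [0.5, 4, 9]])

info1 = describe(a1)
info2 = describe(a2)
print(info1)
print(info2)
```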
1.1 NumPy built-in methods for creating multidimensional arrays
Built-in methods for creating multidimensional arrays:
np.array: create an array from a Python list or other sequence
np.arange: create a one-dimensional array of evenly spaced values
np.ones: create an array whose elements are all 1
np.zeros: create an array whose elements are all 0
np.empty: create an array, allocating memory without initializing the values (the elements are whatever was already in memory)
np.random.random: create an array of random floats in the interval [0.0, 1.0)
Demo:
import numpy as np
a1 = np.arange(4)
a1
Out[4]: array([0, 1, 2, 3])
a1.ndim
Out[5]: 1
a2 = np.ones((4,4),dtype=np.int64)
a2
Out[7]:
array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]], dtype=int64)
a3 = np.zeros((2,2))
a3
Out[9]:
array([[0., 0.],
       [0., 0.]])
a3.dtype
Out[10]: dtype('float64')
a3.ndim
Out[11]: 2
a4 = np.empty((3,3),dtype=np.int64)   # uninitialized: the values below are leftover memory and differ between runs
a4
Out[13]:
array([[ 0, 0, 0],
       [ 0, 0, 1252],
       [32088649856188416, 12948256950583296, 7929968]], dtype=int64)
a4.dtype
Out[14]: dtype('int64')
a4.shape
Out[15]: (3, 3)
a4.ndim
Out[16]: 2
a5 = np.ones((4,3,4))
a5
Out[18]:
array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],
       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],
       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],
       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]])
a5.ndim
Out[19]: 3
# print() shows an array without the array(...) wrapper
import numpy as np
a1 = np.array([1,2,3,4])
print(a1)   # prints [1 2 3 4]
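The demo does not use np.random.random or the start/stop/step form of np.arange. A short sketch of both; the values from np.random.random differ on every run, so only their shape and range are meaningful:

```python
import numpy as np

# np.arange(start, stop, step): stop is excluded, like Python's range()
evens = np.arange(0, 10, 2)

# np.random.random(shape): floats drawn uniformly from [0.0, 1.0)
r = np.random.random((2, 3))

# np.zeros / np.ones accept a dtype; the default is float64
z = np.zeros((2, 2), dtype=np.int64)

print(evens)             # [0 2 4 6 8]
print(r.shape, r.dtype)  # (2, 3) float64
print(z)
```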
1.2 Indexing multidimensional arrays
1.2.1 Slicing elements of a list
l = [1,2,3,4,5]
l
Out[21]: [1, 2, 3, 4, 5]
l[:2]
Out[22]: [1, 2]
l[2:4]
Out[23]: [3, 4]
l[1:5:2]
Out[24]: [2, 4]
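All of these slices follow one rule, seq[start:stop:step], where start is included and stop is excluded. The last two lines below (a negative index and a negative step) go beyond the session above but follow the same rule:

```python
l = [1, 2, 3, 4, 5]

first_two = l[:2]       # omitted start means "from the beginning" -> [1, 2]
middle = l[2:4]         # indices 2 and 3 -> [3, 4]
every_other = l[1:5:2]  # indices 1 and 3 -> [2, 4]
last_two = l[-2:]       # negative indices count from the end -> [4, 5]
reversed_l = l[::-1]    # a negative step walks backwards -> [5, 4, 3, 2, 1]

print(first_two, middle, every_other, last_two, reversed_l)
```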
1.2.2 Array indexing
import numpy as np
a = np.arange(12)
a
Out[4]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
a[1:4]
Out[5]: array([1, 2, 3])
a[1:10:2]
Out[6]: array([1, 3, 5, 7, 9])
Array slicing works the same way as list slicing, but arrays are more powerful: assigning a single value to a slice changes several elements at once (a list would require an iterable on the right-hand side).
a = np.arange(12)
a
Out[8]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
a[1:5] = -1   # indices 1 to 4 become -1
a
Out[10]: array([ 0, -1, -1, -1, -1, 5, 6, 7, 8, 9, 10, 11])
a[1:10:2] = 1   # every second element from index 1 to 9 becomes 1
a
Out[12]: array([ 0, 1, -1, 1, -1, 1, 6, 1, 8, 1, 10, 11])
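One consequence worth knowing before using slice assignment: a basic slice of an array is a view that shares memory with the original, so writing to the slice changes the original array. A sketch showing this, the .copy() method for an independent array, and the list behaviour mentioned above:

```python
import numpy as np

a = np.arange(12)

# Basic slicing returns a view that shares memory with the original array
view = a[1:5]
view[:] = -1            # writing through the view also changes a[1:5]
print(a)

# Call .copy() when the original array must stay unchanged
b = np.arange(12)
independent = b[1:5].copy()
independent[:] = -1     # b is not affected
print(b)

# A Python list does not accept a scalar in slice assignment
l = list(range(12))
try:
    l[1:5] = -1
except TypeError as exc:
    print("list slice assignment needs an iterable:", exc)
```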
The shape attribute gives the size of the array along each dimension. A multidimensional array can be sliced in each dimension separately, with the index for each dimension separated by a comma; reshape changes the shape without changing the data:
import numpy as np
a = np.arange(12).reshape(3,4)
a
Out[4]:
array([[ 0, 1, 2, 3],
       [ 4, 5, 6, 7],
       [ 8, 9, 10, 11]])
a.shape
Out[5]: (3, 4)
a[0]   # row 0
Out[6]: array([0, 1, 2, 3])
a[1]   # row 1
Out[7]: array([4, 5, 6, 7])
a[:,0]   # column 0 of every row
Out[8]: array([0, 4, 8])
a[:,1]
Out[9]: array([1, 5, 9])
a[:,2]
Out[10]: array([ 2, 6, 10])
a[0,0]   # row 0, column 0
Out[11]: 0
a[0,1]
Out[12]: 1
a[1,1]
Out[13]: 5
a[1,2]
Out[14]: 6
a[0] = 1   # set the whole of row 0
a
Out[16]:
array([[ 1, 1, 1, 1],
       [ 4, 5, 6, 7],
       [ 8, 9, 10, 11]])
a[:,1] = -1   # set the whole of column 1
a
Out[18]:
array([[ 1, -1, 1, 1],
       [ 4, -1, 6, 7],
       [ 8, -1, 10, 11]])
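Slices in both dimensions can also be combined to cut out a sub-block. A sketch on the same 3 x 4 array; a[1][2] also works but indexes twice, so a[1, 2] is the usual form:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Slice both dimensions at once: rows 0-1, columns 1-2
block = a[0:2, 1:3]
print(block)

# A negative index picks the last row / last column
last_row = a[-1]
last_col = a[:, -1]

# Two equivalent ways to read one element
single = a[1, 2]
chained = a[1][2]
```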
1.2.3 From 2D to 3D
A two-dimensional array is indexed as a[x, y]; a three-dimensional array as a[x, y, z].
Create a three-dimensional array of 27 elements with shape (3, 3, 3):
a = np.arange(27).reshape(3,3,3)
a
Out[20]:
array([[[ 0, 1, 2],
        [ 3, 4, 5],
        [ 6, 7, 8]],
       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],
       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])
a.ndim
Out[21]: 3
a.shape
Out[22]: (3, 3, 3)
a1 = a[1]   # the second 3 x 3 layer
a1
Out[24]:
array([[ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17]])
a1.shape
Out[25]: (3, 3)
a2 = a[1,1]   # row 1 of layer 1
a2
Out[27]: array([12, 13, 14])
a2.shape
Out[28]: (3,)
a2.ndim
Out[29]: 1
a[:,1]   # row 1 of every layer
Out[30]:
array([[ 3, 4, 5],
       [12, 13, 14],
       [21, 22, 23]])
a[:,1] = 1
a
Out[32]:
array([[[ 0, 1, 2],
        [ 1, 1, 1],
        [ 6, 7, 8]],
       [[ 9, 10, 11],
        [ 1, 1, 1],
        [15, 16, 17]],
       [[18, 19, 20],
        [ 1, 1, 1],
        [24, 25, 26]]])
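The session reaches single layers and rows but never a single element with all three indices. A sketch of a[x, y, z], of slicing the last dimension, and of how each integer index removes one dimension while a slice keeps it:

```python
import numpy as np

a = np.arange(27).reshape(3, 3, 3)

# a[x, y, z]: layer x, row y, column z
elem = a[1, 2, 0]          # layer 1, row 2, column 0

# Fix only the last index: column 0 from every row of every layer
col0 = a[:, :, 0]          # shape (3, 3)

# Each integer index removes one dimension; a slice keeps it
print(a[1].ndim, a[1, 1].ndim, a[1:2].ndim)  # 2 1 3
```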
1.3 Basic operations on multidimensional arrays
1.3.1 Addition, subtraction, multiplication and division with a scalar
import numpy as np
a = np.arange(12).reshape(3,4)
a
Out[4]:
array([[ 0, 1, 2, 3],
       [ 4, 5, 6, 7],
       [ 8, 9, 10, 11]])
a += 1   # the operation is applied to every element
a
Out[6]:
array([[ 1, 2, 3, 4],
       [ 5, 6, 7, 8],
       [ 9, 10, 11, 12]])
a *= 2
a
Out[8]:
array([[ 2, 4, 6, 8],
       [10, 12, 14, 16],
       [18, 20, 22, 24]])
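The session only shows + and *. Subtraction and division work the same way, with one catch for integer arrays: true division produces floats, which cannot be stored back into an integer array in place. A sketch:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Non-in-place operators return new arrays and leave a unchanged
minus = a - 1
halved = a / 2        # true division always yields float64
floor_half = a // 2   # floor division keeps an integer dtype

# In-place true division on an integer array fails, because the
# float result cannot be cast back into the int array
try:
    a /= 2
except TypeError as exc:  # NumPy raises UFuncTypeError, a subclass of TypeError
    print("in-place division failed:", type(exc).__name__)
```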
1.3.2 Operations between multidimensional arrays
import numpy as np
a = np.arange(4).reshape(2,2)
b = np.arange(4,8).reshape(2,2)
a
Out[5]:
array([[0, 1],
       [2, 3]])
b
Out[6]:
array([[4, 5],
       [6, 7]])
b - a
Out[7]:
array([[4, 4],
       [4, 4]])
a + b
Out[8]:
array([[ 4, 6],
       [ 8, 10]])
a * b   # element-wise product, not matrix multiplication
Out[9]:
array([[ 0, 5],
       [12, 21]])
Arithmetic operators between arrays of the same shape act element by element. For matrix multiplication, use the ndarray.dot method (the @ operator gives the same result):
a.dot(b)
Out[10]:
array([[ 6, 7],
       [26, 31]])
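The rule behind a.dot(b) is that each entry of the result is the dot product of a row of the left matrix with a column of the right one; for example, row 1 of a with column 0 of b gives 2*4 + 3*6 = 26. A sketch with a hypothetical loop-based helper, matmul_loops, compared against dot and @ (the loops are only for illustration and are far slower than dot):

```python
import numpy as np

def matmul_loops(x, y):
    """Hypothetical reference implementation of the matrix product rule:
    result[i, j] = sum over k of x[i, k] * y[k, j]."""
    rows, inner = x.shape
    inner2, cols = y.shape
    if inner != inner2:
        raise ValueError("columns of x must equal rows of y")
    result = np.zeros((rows, cols), dtype=np.result_type(x, y))
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                result[i, j] += x[i, k] * y[k, j]
    return result

a = np.arange(4).reshape(2, 2)
b = np.arange(4, 8).reshape(2, 2)

print(matmul_loops(a, b))
print(a.dot(b))
print(a @ b)  # the @ operator performs the same matrix product
```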
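1.3.3 Logical operations on multidimensional arrays
a = np.arange(12).reshape(4,3)
b = a > 5   # element-wise comparison gives a Boolean array of the same shape
b
Out[13]:
array([[False, False, False],
       [False, False, False],
       [ True, True, True],
       [ True, True, True]])
a[b]   # Boolean indexing returns the elements where b is True, as a 1-D array
Out[14]: array([ 6, 7, 8, 9, 10, 11])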
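Conditions can be combined, and a Boolean mask can also be used on the left of an assignment. A sketch; note that Python's and/or keywords do not work element-wise on arrays, so the operators &, | and ~ are used instead:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

# Combine conditions with & (and), | (or), ~ (not). Parentheses are required
# because & and | bind more tightly than comparison operators.
mask = (a > 2) & (a % 2 == 0)
selected = a[mask]          # 1-D array of the matching elements

# Assign through a mask: every element greater than 5 becomes 0
b = a.copy()
b[b > 5] = 0

# np.where chooses between two values element-wise according to a condition
c = np.where(a > 5, 1, 0)
```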
1.4 Statistical methods of multidimensional arrays
Let a be a multidimensional array:
a.sum(): the sum of all elements
a.max(): the maximum value
a.min(): the minimum value
a.mean(): the mean (average)
a.std(): the standard deviation, i.e. the square root of the mean of the squared differences between each element and the overall mean
a.var(): the variance, i.e. the mean of the squared differences between each element and the overall mean
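The definitions of variance and standard deviation above can be checked by computing them by hand. A sketch; by default NumPy divides by the number of elements N (population statistics), and passing ddof=1 divides by N - 1 for the sample versions:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

# Variance: mean of the squared deviations from the mean
manual_var = ((a - a.mean()) ** 2).mean()
# Standard deviation: square root of the variance
manual_std = np.sqrt(manual_var)

print(a.sum(), a.max(), a.min(), a.mean())  # 66 11 0 5.5
print(manual_var, a.var())
print(manual_std, a.std())

# Sample variance divides by N - 1 instead of N
sample_var = a.var(ddof=1)
```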
Each of these methods accepts an axis parameter that specifies the axis along which the statistic is computed. For a two-dimensional array, axis 0 runs down the rows and axis 1 runs across the columns:
axis=0 collapses the rows and gives one result per column
axis=1 collapses the columns and gives one result per row
a = np.arange(12).reshape(4,3)
a
Out[15]:
array([[ 0, 1, 2],
       [ 3, 4, 5],
       [ 6, 7, 8],
       [ 9, 10, 11]])
a.sum()
Out[16]: 66
a.sum(axis=0)   # column sums
Out[17]: array([18, 22, 26])
a.sum(axis=1)   # row sums
Out[18]: array([ 3, 12, 21, 30])
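The axis parameter works the same way for the other statistical methods. A sketch on the same 4 x 3 array:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

col_means = a.mean(axis=0)  # one value per column: collapses the 4 rows
row_max = a.max(axis=1)     # one value per row: collapses the 3 columns
col_std = a.std(axis=0)     # standard deviation of each column

print(col_means)
print(row_max)
print(col_std)
```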
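Commonly used functions:
np.sqrt: element-wise square root
np.dot: dot product; for two-dimensional arrays, matrix multiplication
np.sort: return a sorted copy of an array
np.linalg: module containing basic linear algebra functions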
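A short sketch of these functions on a small 2 x 2 matrix. From np.linalg it uses only det (determinant) and inv (inverse), two of the functions in that module:

```python
import numpy as np

a = np.array([[4.0, 7.0], [6.0, 2.0]])

roots = np.sqrt(a)               # element-wise square root
product = np.dot(a, a)           # same as a.dot(a)
sorted_rows = np.sort(a)         # sorts within each row (last axis); returns a copy
sorted_cols = np.sort(a, axis=0) # sorts within each column

det = np.linalg.det(a)           # determinant: 4*2 - 7*6 = -34
inv = np.linalg.inv(a)           # inverse matrix
identity = a @ inv               # a times its inverse: the identity, up to rounding
```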