Python data analysis - NumPy

Posted by jumpfroggy on Sun, 13 Feb 2022 10:11:47 +0100

Contents

Common libraries for data analysis:

1.Numpy: basic knowledge

1.1 Numpy built-in method for creating multidimensional array

1.2 index of multidimensional array

1.2.1 intercepting elements in the list

1.2.2 array index

1.2.3 from 2D to 3D

1.3 basic operation of multidimensional array

1.3.1 addition, subtraction, multiplication and division are supported

1.3.2 operation between multidimensional arrays

1.3.3 multidimensional array logical operation

1.4 statistical method of multidimensional array

Common libraries for data analysis:

Third-party libraries:
NumPy: scientific computing, with strong support for matrix operations
Pandas: built on NumPy, with more powerful features; also used for processing matrix-like data
scikit-learn: machine learning algorithms
Matplotlib: plotting and charting

1.Numpy: basic knowledge

NumPy provides efficient support for multidimensional arrays and has the following advantages:
Its core data structure, ndarray, is a multidimensional array that supports vectorized operations and is stored contiguously in memory
It supports a wide range of operations on multidimensional arrays
It can serve as a data interface to other programming languages
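
As a rough illustration of what "vectorized operation" means here, the sketch below doubles every value once with a plain Python loop and once with a single ndarray expression. The `prices` data is made up for the example.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
prices = [10.0, 20.0, 30.0, 40.0]

# Plain Python: the operation is spelled out element by element.
doubled_list = [p * 2 for p in prices]

# NumPy: one expression applies the operation to every element.
doubled_array = np.array(prices) * 2

print(doubled_list)   # [20.0, 40.0, 60.0, 80.0]
print(doubled_array)  # [20. 40. 60. 80.]
```

Both versions produce the same values; the array version simply leaves the looping to NumPy.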

Environment: Python Console

If pip list shows that numpy is present but it still cannot be imported, install it again:
pip install numpy -i http://pypi.douban.com/simple

Create a multidimensional array (matrix). An ndarray object serves as the data container, and all subsequent operations act on that container

            # Import the package and create arrays: a1 is one-dimensional, a2 is two-dimensional.
            # A two-dimensional array can be pictured as an Excel table with rows and columns.
            import numpy as np
            a1 = np.array([1,2,3,4])
            a1
            Out[5]: array([1, 2, 3, 4])
            type(a1)
            Out[6]: numpy.ndarray
            a1.shape
            Out[7]: (4,)
            a1.size
            Out[8]: 4
            a1.dtype
            Out[9]: dtype('int32')
            a2 = np.array([[1.0,2.5,3],[0.5,4,9]])
            a2.shape
            Out[11]: (2, 3)   # 2 rows and 3 columns
            a2.size
            Out[12]: 6
            a2.min()
            Out[13]: 0.5
            a2.dtype   # returns the element data type; the kinds worth knowing are floating point, integer, string, Python object, and Boolean
            Out[14]: dtype('float64')

1.1 Numpy built-in method for creating multidimensional array

Built-in methods for creating multidimensional arrays:
np.array: creates an array from existing data such as a list
np.arange: creates a one-dimensional array of evenly spaced values
np.ones: creates an array whose elements are all 1
np.zeros: creates an array whose elements are all 0
np.empty: allocates memory for an array without initializing it, so the elements hold arbitrary values
np.random.random: creates an array whose elements are random floats in [0, 1)

Demo:

            import numpy as np
            a1 = np.arange(4)
            a1
            Out[4]: array([0, 1, 2, 3])
            a1.ndim
            Out[5]: 1
            a2 = np.ones((4,4),dtype=np.int64)
            a2
            Out[7]: 
            array([[1, 1, 1, 1],
                   [1, 1, 1, 1],
                   [1, 1, 1, 1],
                   [1, 1, 1, 1]], dtype=int64)
            a3 = np.zeros((2,2))
            a3
            Out[9]: 
            array([[0., 0.],
                   [0., 0.]])
            a3.dtype
            Out[10]: dtype('float64')
            a3.ndim
            Out[11]: 2
            a4 = np.empty((3,3),dtype=np.int64)
            a4
            Out[13]: 
            array([[                0,                 0,                 0],
                   [                0,                 0,              1252],
                   [32088649856188416, 12948256950583296,           7929968]],
                  dtype=int64)
            a4.dtype
            Out[14]: dtype('int64')
            a4.shape
            Out[15]: (3, 3)
            a4.ndim
            Out[16]: 2
            a5 = np.ones((4,3,4))
            a5
            Out[18]: 
            array([[[1., 1., 1., 1.],
                    [1., 1., 1., 1.],
                    [1., 1., 1., 1.]],
                   [[1., 1., 1., 1.],
                    [1., 1., 1., 1.],
                    [1., 1., 1., 1.]],
                   [[1., 1., 1., 1.],
                    [1., 1., 1., 1.],
                    [1., 1., 1., 1.]],
                   [[1., 1., 1., 1.],
                    [1., 1., 1., 1.],
                    [1., 1., 1., 1.]]])
            a5.ndim
            Out[19]: 3

            import numpy as np
            
            a1 = np.array([1,2,3,4])
            print(a1)
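
The demo above does not exercise np.random.random, and it calls np.arange with a single argument only. The following minimal sketch covers both. The random values differ on every run, so only the shape and dtype are shown.

```python
import numpy as np

# 2x3 array of random floats drawn from the interval [0.0, 1.0)
r = np.random.random((2, 3))
print(r.shape)   # (2, 3)
print(r.dtype)   # float64

# np.arange also accepts start, stop (exclusive) and step
evens = np.arange(0, 10, 2)
print(evens)     # [0 2 4 6 8]
```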

1.2 index of multidimensional array

1.2.1 intercepting elements in the list

            l = [1,2,3,4,5]
            l
            Out[21]: [1, 2, 3, 4, 5]
            l[:2]
            Out[22]: [1, 2]
            l[2:4]
            Out[23]: [3, 4]
            l[1:5:2]
            Out[24]: [2, 4]

1.2.2 array index

                import numpy as np
                a = np.arange(12)
                a
                Out[4]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
                a[1:4]
                Out[5]: array([1, 2, 3])
                a[1:10:2]
                Out[6]: array([1, 3, 5, 7, 9])

Slice indexing works the same way as for lists, but arrays are more powerful: a slice can be assigned a single value, which changes multiple elements of the array at once

                a = np.arange(12)
                a
                Out[8]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
                a[1:5] = -1
                a
                Out[10]: array([ 0, -1, -1, -1, -1,  5,  6,  7,  8,  9, 10, 11])
                a[1:10:2] = 1
                a
                Out[12]: array([ 0,  1, -1,  1, -1,  1,  6,  1,  8,  1, 10, 11])
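
To make the difference from lists concrete, here is a small sketch that tries the same scalar slice assignment on a list and on an array. The variable names are illustrative only.

```python
import numpy as np

values = [0, 1, 2, 3, 4, 5]
list_error = None

# A list slice can only be replaced by another iterable, so a scalar fails.
try:
    values[1:4] = -1
except TypeError as exc:
    list_error = str(exc)
    print("list slice assignment failed:", list_error)

# An array slice accepts a scalar and writes it to every selected element.
arr = np.arange(6)
arr[1:4] = -1
print(arr)  # [ 0 -1 -1 -1  4  5]
```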

The shape attribute gives the size of the array in each dimension. A multidimensional array can be sliced along each of its dimensions

                import numpy as np
                a = np.arange(12).reshape(3,4)
                a
                Out[4]: 
                array([[ 0,  1,  2,  3],
                       [ 4,  5,  6,  7],
                       [ 8,  9, 10, 11]])
                a.shape
                Out[5]: (3, 4)
                a[0]
                Out[6]: array([0, 1, 2, 3])
                a[1]
                Out[7]: array([4, 5, 6, 7])
                a[:,0]
                Out[8]: array([0, 4, 8])
                a[:,1]
                Out[9]: array([1, 5, 9])
                a[:,2]
                Out[10]: array([ 2,  6, 10])
                a[0,0]
                Out[11]: 0
                a[0,1]
                Out[12]: 1
                a[1,1]
                Out[13]: 5
                a[1,2]
                Out[14]: 6
                a[0] = 1
                a
                Out[16]: 
                array([[ 1,  1,  1,  1],
                       [ 4,  5,  6,  7],
                       [ 8,  9, 10, 11]])
                a[:,1] = -1
                a
                Out[18]: 
                array([[ 1, -1,  1,  1],
                       [ 4, -1,  6,  7],
                       [ 8, -1, 10, 11]])

1.2.3 from 2D to 3D

2D index: a[x,y]
3D index: a[x,y,z]

Create a three-dimensional array with 27 elements and shape (3, 3, 3)

                    a = np.arange(27).reshape(3,3,3)
                    a
                    Out[20]: 
                    array([[[ 0,  1,  2],
                            [ 3,  4,  5],
                            [ 6,  7,  8]],
                           [[ 9, 10, 11],
                            [12, 13, 14],
                            [15, 16, 17]],
                           [[18, 19, 20],
                            [21, 22, 23],
                            [24, 25, 26]]])
                    a.ndim
                    Out[21]: 3
                    a.shape
                    Out[22]: (3, 3, 3)
                    a1 = a[1]
                    a1
                    Out[24]: 
                    array([[ 9, 10, 11],
                           [12, 13, 14],
                           [15, 16, 17]])
                    a1.shape
                    Out[25]: (3, 3)
                    a2 = a[1,1]
                    a2
                    Out[27]: array([12, 13, 14])
                    a2.shape
                    Out[28]: (3,)
                    a2.ndim
                    Out[29]: 1
                    a[:,1]
                    Out[30]: 
                    array([[ 3,  4,  5],
                           [12, 13, 14],
                           [21, 22, 23]])
                    a[:,1] = 1
                    a
                    Out[32]: 
                    array([[[ 0,  1,  2],
                            [ 1,  1,  1],
                            [ 6,  7,  8]],
                           [[ 9, 10, 11],
                            [ 1,  1,  1],
                            [15, 16, 17]],
                           [[18, 19, 20],
                            [ 1,  1,  1],
                            [24, 25, 26]]])
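
The session above indexes with one and two indices. As a short sketch of the full a[x,y,z] form, the example below starts from a fresh (3, 3, 3) array and picks out a single element, plus a slice along the last dimension.

```python
import numpy as np

a = np.arange(27).reshape(3, 3, 3)

# Three indices select one element: block 1, row 1, column 1
center = a[1, 1, 1]
print(center)        # 13

# Two indices select a row inside a block
print(a[2, 0])       # [18 19 20]

# Slicing the last dimension: column 0 of every block
first_cols = a[:, :, 0]
print(first_cols)
# [[ 0  3  6]
#  [ 9 12 15]
#  [18 21 24]]
```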

1.3 basic operation of multidimensional array

1.3.1 addition, subtraction, multiplication and division are supported

                import numpy as np
                a = np.arange(12).reshape(3,4)
                a
                Out[4]: 
                array([[ 0,  1,  2,  3],
                       [ 4,  5,  6,  7],
                       [ 8,  9, 10, 11]])
                a += 1
                a
                Out[6]: 
                array([[ 1,  2,  3,  4],
                       [ 5,  6,  7,  8],
                       [ 9, 10, 11, 12]])
                a *= 2
                a
                Out[8]: 
                array([[ 2,  4,  6,  8],
                       [10, 12, 14, 16],
                       [18, 20, 22, 24]])
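
The session above shows addition and multiplication. Subtraction and division work the same way, element by element. One detail to keep in mind, sketched below, is that true division returns floats while floor division keeps the integer dtype.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

minus_one = a - 1     # subtract 1 from every element
halves = a / 2        # true division: result is float64
floor_halves = a // 2 # floor division: integer dtype is kept

print(minus_one[0])   # [-1  0  1  2]
print(halves.dtype)   # float64
print(floor_halves[2])  # [4 4 5 5]
```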

1.3.2 operation between multidimensional arrays

                import numpy as np
                a = np.arange(4).reshape(2,2)
                b = np.arange(4,8).reshape(2,2)
                a
                Out[5]: 
                array([[0, 1],
                       [2, 3]])
                b
                Out[6]: 
                array([[4, 5],
                       [6, 7]])
                b - a
                Out[7]: 
                array([[4, 4],
                       [4, 4]])
                a + b
                Out[8]: 
                array([[ 4,  6],
                       [ 8, 10]])
                a * b
                Out[9]: 
                array([[ 0,  5],
                       [12, 21]])

The * operator multiplies element by element. Multiplication by the matrix product rule is done with the ndarray.dot method

                a.dot(b)
                Out[10]: 
                array([[ 6,  7],
                       [26, 31]])
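
As a cross-check of the matrix product, the sketch below recomputes one entry by hand. It also shows that the `@` operator and `np.dot` give the same result as the `dot` method for these 2D arrays.

```python
import numpy as np

a = np.arange(4).reshape(2, 2)     # [[0, 1], [2, 3]]
b = np.arange(4, 8).reshape(2, 2)  # [[4, 5], [6, 7]]

product = a @ b
print(product)
# [[ 6  7]
#  [26 31]]

# Entry [1, 0] is row 1 of a times column 0 of b: 2*4 + 3*6 = 26
by_hand = a[1, 0] * b[0, 0] + a[1, 1] * b[1, 0]
print(by_hand)  # 26

print(np.array_equal(product, a.dot(b)))     # True
print(np.array_equal(product, np.dot(a, b))) # True
```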

1.3.3 multidimensional array logical operation

                a = np.arange(12).reshape(4,3)
                b = a > 5
                b
                Out[13]: 
                array([[False, False, False],
                       [False, False, False],
                       [ True,  True,  True],
                       [ True,  True,  True]])
                a[b]
                Out[14]: array([ 6,  7,  8,  9, 10, 11])
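
Boolean arrays like b can also be combined and used on the left side of an assignment. A minimal sketch follows. Note that conditions are combined with `&` and `|`, each wrapped in parentheses, rather than with `and` / `or`.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

# Combine two conditions element by element
mask = (a > 2) & (a < 9)
selected = a[mask]
print(selected)   # [3 4 5 6 7 8]

# Assign through a boolean index: set every odd element to 0
a[a % 2 == 1] = 0
print(a)
# [[ 0  0  2]
#  [ 0  4  0]
#  [ 6  0  8]
#  [ 0 10  0]]
```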

1.4 statistical method of multidimensional array

Here a is a multidimensional array.

a.sum: the sum of all elements
a.max: the maximum value
a.min: the minimum value
a.mean: the mean (average) value
a.std: the standard deviation, i.e. the square root of the mean of the squared differences between each element and the mean
a.var: the variance, i.e. the mean of the squared differences between each element and the mean
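
The definitions of variance and standard deviation can be checked directly. The sketch below recomputes both by hand and compares them with a.var and a.std. By default these methods divide by the number of elements (the population form).

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

mean = a.mean()                          # 5.5
variance = ((a - mean) ** 2).mean()      # mean of squared differences
std = np.sqrt(variance)                  # square root of the variance

print(mean)                              # 5.5
print(np.isclose(variance, a.var()))     # True
print(np.isclose(std, a.std()))          # True
```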

Each of the methods above accepts an axis parameter that specifies the axis along which the statistic is computed. For a two-dimensional array, axis 0 runs down the rows and axis 1 runs across the columns
With axis=0 the statistic is computed for each column; with axis=1 it is computed for each row

                    a = np.arange(12).reshape(4,3)
                    a
                    Out[15]: 
                    array([[ 0,  1,  2],
                           [ 3,  4,  5],
                           [ 6,  7,  8],
                           [ 9, 10, 11]])
                    a.sum()
                    Out[16]: 66
                    a.sum(axis=0)
                    Out[17]: array([18, 22, 26])
                    a.sum(axis=1)
                    Out[18]: array([ 3, 12, 21, 30])
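
The axis parameter behaves the same way for the other statistical methods. Here is a short sketch using the same 4x3 array:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

col_max = a.max(axis=0)    # one value per column
row_min = a.min(axis=1)    # one value per row
row_mean = a.mean(axis=1)  # one value per row

print(col_max)   # [ 9 10 11]
print(row_min)   # [0 3 6 9]
print(row_mean)  # [ 1.  4.  7. 10.]
```

The result always loses the axis that was named: a (4, 3) array gives 3 values for axis=0 and 4 values for axis=1.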

General functions:
np.sqrt: element-wise square root
np.dot: matrix multiplication
np.sort: returns a sorted copy of an array
np.linalg: module containing basic linear algebra functions
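
A brief sketch of these functions in use. The input values are made up for the example, and np.linalg.inv is picked here as just one representative of the np.linalg module.

```python
import numpy as np

squares = np.array([1.0, 4.0, 9.0])
roots = np.sqrt(squares)
print(roots)                 # [1. 2. 3.]

unsorted = np.array([3, 1, 2])
ordered = np.sort(unsorted)
print(ordered)               # [1 2 3]
print(unsorted)              # [3 1 2], the original is left unchanged

# Matrix inverse from np.linalg, verified with np.dot
m = np.array([[2.0, 0.0], [0.0, 4.0]])
m_inv = np.linalg.inv(m)
print(np.allclose(np.dot(m, m_inv), np.eye(2)))  # True
```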

Topics: Python Data Analysis
