Python data analysis - NumPy

Posted by jumpfroggy on Sun, 13 Feb 2022 10:11:47 +0100

Contents

Common libraries for data analysis:

1. NumPy: basic knowledge

1.1 NumPy built-in methods for creating multidimensional arrays

1.2 Indexing multidimensional arrays

1.2.1 Slicing a list

1.2.2 Array indexing

1.2.3 From 2D to 3D

1.3 Basic operations on multidimensional arrays

1.3.1 Addition, subtraction, multiplication and division

1.3.2 Operations between multidimensional arrays

1.3.3 Logical operations on multidimensional arrays

1.4 Statistical methods of multidimensional arrays

Common libraries for data analysis:


Third-party libraries:
NumPy: scientific computing, with strong support for matrix operations
Pandas: built on NumPy with more powerful features; also used for processing matrix (tabular) data
scikit-learn: machine learning algorithms
Matplotlib: plotting and charting

 

1. NumPy: basic knowledge


NumPy provides efficient support for multidimensional arrays and has the following advantages:
Its core data structure, ndarray, is a multidimensional array that supports vectorized operations and is stored contiguously in memory
It supports a wide range of operations on multidimensional arrays
It can serve as a data interface to code written in other programming languages
        
Environment: Python Console

If pip list shows numpy but it cannot be imported, (re)install it, here from the Douban PyPI mirror:
pip install numpy -i http://pypi.douban.com/simple
        
Creating a multidimensional array (matrix): an ndarray object serves as the data container, and all subsequent operations are performed on this container.

            # Import the package and create arrays: a1 is one-dimensional, a2 is two-dimensional
            # (think of a 2-D array as an Excel sheet with rows and columns)
            import numpy as np
            a1 = np.array([1,2,3,4])
            a1
            Out[5]: array([1, 2, 3, 4])
            type(a1)
            Out[6]: numpy.ndarray
            a1.shape
            Out[7]: (4,)
            a1.size
            Out[8]: 4
            a1.dtype        # default integer type: int32 on Windows with NumPy < 2.0, int64 on most other setups
            Out[9]: dtype('int32')
            a2 = np.array([[1.0,2.5,3],[0.5,4,9]])
            a2.shape        # (rows, columns): 2 rows and 3 columns
            Out[11]: (2, 3)
            a2.size
            Out[12]: 6
            a2.min()
            Out[13]: 0.5
            a2.dtype        # element data type; the kinds to know are floating point, integer, string, Python object and Boolean
            Out[14]: dtype('float64')
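The dtype comment above lists the element types worth knowing. The sketch below builds one small array of each kind and prints its `dtype.kind` code, then converts an array with `astype()`. The dictionary `examples` and the sample values are illustrative only.

```python
import numpy as np

# One small array per element type (sample values are arbitrary)
examples = {
    "integer": np.array([1, 2, 3]),
    "floating point": np.array([1.0, 2.5]),
    "string": np.array(["a", "bc"]),
    "Python object": np.array([1, "a", None], dtype=object),
    "Boolean": np.array([True, False]),
}

for name, arr in examples.items():
    # dtype.kind is a one-character code: 'i' int, 'f' float,
    # 'U' unicode string, 'O' Python object, 'b' Boolean
    print(f"{name:15} dtype={arr.dtype!s:8} kind={arr.dtype.kind}")

# astype() returns a converted copy; the original keeps its dtype
a = np.array([1.7, 2.2, -0.5])
b = a.astype(np.int64)   # float -> int truncates toward zero
print(b)                 # [1 2 0]
print(a.dtype, b.dtype)  # float64 int64
```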

 

1.1 NumPy built-in methods for creating multidimensional arrays

Built-in methods for creating multidimensional arrays:
            np.array            creates an array from a Python list (or other sequence)
            np.arange           creates a one-dimensional array of evenly spaced values, like Python's range()
            np.ones             creates an array whose elements are all 1 (float64 by default)
            np.zeros            creates an array whose elements are all 0 (float64 by default)
            np.empty            allocates an array without initializing it, so its elements are whatever values were already in that memory
            np.random.random    creates an array of random floats in the interval [0.0, 1.0)

Demo:

            import numpy as np
            a1 = np.arange(4)
            a1
            Out[4]: array([0, 1, 2, 3])
            a1.ndim
            Out[5]: 1
            a2 = np.ones((4,4),dtype=np.int64)
            a2
            Out[7]: 
            array([[1, 1, 1, 1],
                   [1, 1, 1, 1],
                   [1, 1, 1, 1],
                   [1, 1, 1, 1]], dtype=int64)
            a3 = np.zeros((2,2))
            a3
            Out[9]: 
            array([[0., 0.],
                   [0., 0.]])
            a3.dtype
            Out[10]: dtype('float64')
            a3.ndim
            Out[11]: 2
            a4 = np.empty((3,3),dtype=np.int64)   # uninitialized: the values shown below are arbitrary leftovers in memory
            a4
            Out[13]: 
            array([[                0,                 0,                 0],
                   [                0,                 0,              1252],
                   [32088649856188416, 12948256950583296,           7929968]],
                  dtype=int64)
            a4.dtype
            Out[14]: dtype('int64')
            a4.shape
            Out[15]: (3, 3)
            a4.ndim
            Out[16]: 2
            a5 = np.ones((4,3,4))
            a5
            Out[18]: 
            array([[[1., 1., 1., 1.],
                    [1., 1., 1., 1.],
                    [1., 1., 1., 1.]],
                   [[1., 1., 1., 1.],
                    [1., 1., 1., 1.],
                    [1., 1., 1., 1.]],
                   [[1., 1., 1., 1.],
                    [1., 1., 1., 1.],
                    [1., 1., 1., 1.]],
                   [[1., 1., 1., 1.],
                    [1., 1., 1., 1.],
                    [1., 1., 1., 1.]]])
            a5.ndim
            Out[19]: 3

            # In a script (rather than the interactive console), use print() to display a value
            import numpy as np
            
            a1 = np.array([1,2,3,4])
            print(a1)
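The demo above does not exercise `np.random.random` or the start/stop/step form of `np.arange`. A short sketch of both, plus `reshape()` and an explicit `dtype=`; the variable names are illustrative and the random values differ on every run.

```python
import numpy as np

# arange(start, stop, step): stop is excluded, as with Python's range()
r = np.arange(2, 10, 3)
print(r)                       # [2 5 8]

# reshape() lays the same 12 values out as 3 rows x 4 columns
m = np.arange(12).reshape(3, 4)
print(m.shape)                 # (3, 4)

# np.random.random(shape): uniform floats in [0.0, 1.0)
rnd = np.random.random((2, 3))
print(rnd.shape, rnd.dtype)    # (2, 3) float64
print(rnd.min() >= 0.0, rnd.max() < 1.0)   # True True

# ones/zeros/empty accept dtype=; without it they default to float64
z = np.zeros((2, 2), dtype=np.int64)
print(z.dtype)                 # int64
```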

1.2 Indexing multidimensional arrays

1.2.1 Slicing a list

            l = [1,2,3,4,5]
            l
            Out[21]: [1, 2, 3, 4, 5]
            l[:2]
            Out[22]: [1, 2]
            l[2:4]
            Out[23]: [3, 4]
            l[1:5:2]
            Out[24]: [2, 4]

1.2.2 Array indexing

                import numpy as np
                a = np.arange(12)
                a
                Out[4]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
                a[1:4]
                Out[5]: array([1, 2, 3])
                a[1:10:2]
                Out[6]: array([1, 3, 5, 7, 9])

Slicing works the same way as for lists, but arrays are more powerful: you can assign a single value to a slice and change several elements of the array at once.

                a = np.arange(12)
                a
                Out[8]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
                a[1:5] = -1
                a
                Out[10]: array([ 0, -1, -1, -1, -1,  5,  6,  7,  8,  9, 10, 11])
                a[1:10:2] = 1
                a
                Out[12]: array([ 0,  1, -1,  1, -1,  1,  6,  1,  8,  1, 10, 11])
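One consequence of slice assignment worth knowing: a slice of an ndarray is a view that shares memory with the original, whereas a list slice is a copy. The sketch below shows this, the `.copy()` escape hatch, and what happens when you try the same scalar assignment on a list. Variable names are illustrative.

```python
import numpy as np

a = np.arange(6)
s = a[1:4]        # an ndarray slice is a view: it shares a's memory
s[:] = 99         # writing through the view...
print(a)          # [ 0 99 99 99  4  5]  ...changes the original array

b = np.arange(6)
c = b[1:4].copy() # .copy() makes an independent array
c[:] = 99
print(b)          # [0 1 2 3 4 5]  original unchanged

# A list slice cannot be assigned a bare scalar
lst = [0, 1, 2, 3, 4, 5]
try:
    lst[1:4] = -1
except TypeError as exc:
    print("list:", exc)   # lists need an iterable on the right-hand side
lst[1:4] = [-1] * 3       # the list equivalent of a[1:4] = -1
print(lst)                # [0, -1, -1, -1, 4, 5]
```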

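The shape attribute gives the size of the array along each dimension. A multidimensional array can be indexed and sliced in every dimension at once, with one index or slice per dimension separated by commas: a[row, column].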

                import numpy as np
                a = np.arange(12).reshape(3,4)
                a
                Out[4]: 
                array([[ 0,  1,  2,  3],
                       [ 4,  5,  6,  7],
                       [ 8,  9, 10, 11]])
                a.shape
                Out[5]: (3, 4)
                a[0]
                Out[6]: array([0, 1, 2, 3])
                a[1]
                Out[7]: array([4, 5, 6, 7])
                a[:,0]
                Out[8]: array([0, 4, 8])
                a[:,1]
                Out[9]: array([1, 5, 9])
                a[:,2]
                Out[10]: array([ 2,  6, 10])
                a[0,0]
                Out[11]: 0
                a[0,1]
                Out[12]: 1
                a[1,1]
                Out[13]: 5
                a[1,2]
                Out[14]: 6
                a[0] = 1
                a
                Out[16]: 
                array([[ 1,  1,  1,  1],
                       [ 4,  5,  6,  7],
                       [ 8,  9, 10, 11]])
                a[:,1] = -1
                a
                Out[18]: 
                array([[ 1, -1,  1,  1],
                       [ 4, -1,  6,  7],
                       [ 8, -1, 10, 11]])
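The examples above take a whole row or column. Slices can also be combined in both dimensions to pull out a sub-block, as in this sketch (variable names are illustrative):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Slice both dimensions at once: rows 0-1, columns 1-2
block = a[0:2, 1:3]
print(block)
# [[1 2]
#  [5 6]]

# a[i, j] and a[i][j] give the same element; a[i, j] indexes in one step
print(a[1, 2], a[1][2])   # 6 6

# Negative indices count from the end, as with lists
print(a[-1, -1])          # 11
print(a[:, -1])           # [ 3  7 11]

# Assigning to a sub-block changes only those positions
a[1:, 2:] = 0
print(a)
# [[0 1 2 3]
#  [4 5 0 0]
#  [8 9 0 0]]
```

Because `block` is a view, its lower-right element also becomes 0 after the last assignment.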

1.2.3 From 2D to 3D


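Indexing extends from two dimensions to three by adding one index per dimension:

                a[x,y]        # 2-D array: row x, column y
                a[x,y,z]      # 3-D array: block x, row y, column z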
                
Create a three-dimensional array of 27 elements with shape (3, 3, 3):

                    a = np.arange(27).reshape(3,3,3)
                    a
                    Out[20]: 
                    array([[[ 0,  1,  2],
                            [ 3,  4,  5],
                            [ 6,  7,  8]],
                           [[ 9, 10, 11],
                            [12, 13, 14],
                            [15, 16, 17]],
                           [[18, 19, 20],
                            [21, 22, 23],
                            [24, 25, 26]]])
                    a.ndim
                    Out[21]: 3
                    a.shape
                    Out[22]: (3, 3, 3)
                    a1 = a[1]
                    a1
                    Out[24]: 
                    array([[ 9, 10, 11],
                           [12, 13, 14],
                           [15, 16, 17]])
                    a1.shape
                    Out[25]: (3, 3)
                    a2 = a[1,1]
                    a2
                    Out[27]: array([12, 13, 14])
                    a2.shape
                    Out[28]: (3,)
                    a2.ndim
                    Out[29]: 1
                    a[:,1]
                    Out[30]: 
                    array([[ 3,  4,  5],
                           [12, 13, 14],
                           [21, 22, 23]])
                    a[:,1] = 1
                    a
                    Out[32]: 
                    array([[[ 0,  1,  2],
                            [ 1,  1,  1],
                            [ 6,  7,  8]],
                           [[ 9, 10, 11],
                            [ 1,  1,  1],
                            [15, 16, 17]],
                           [[18, 19, 20],
                            [ 1,  1,  1],
                            [24, 25, 26]]])
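The transcript indexes the first and second axes; the sketch below adds a full `a[x, y, z]` lookup and a slice along the last axis, and shows how each integer index removes one dimension.

```python
import numpy as np

a = np.arange(27).reshape(3, 3, 3)

# a[x, y, z]: block x, row y, column z
print(a[1, 1, 2])      # 14

# Slicing the last axis picks the same column from every block
col0 = a[:, :, 0]
print(col0)
# [[ 0  3  6]
#  [ 9 12 15]
#  [18 21 24]]
print(col0.shape)      # (3, 3)

# Each integer index drops one dimension
print(a[0].ndim, a[0, 0].ndim, a[0, 0, 0].ndim)   # 2 1 0
```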

 

1.3 Basic operations on multidimensional arrays

1.3.1 Addition, subtraction, multiplication and division

An arithmetic operation with a single number is applied to every element of the array:

                import numpy as np
                a = np.arange(12).reshape(3,4)
                a
                Out[4]: 
                array([[ 0,  1,  2,  3],
                       [ 4,  5,  6,  7],
                       [ 8,  9, 10, 11]])
                a += 1
                a
                Out[6]: 
                array([[ 1,  2,  3,  4],
                       [ 5,  6,  7,  8],
                       [ 9, 10, 11, 12]])
                a *= 2
                a
                Out[8]: 
                array([[ 2,  4,  6,  8],
                       [10, 12, 14, 16],
                       [18, 20, 22, 24]])
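Division behaves a little differently from `+=` and `*=` on an integer array: `/` always produces floats, so the in-place form `/=` cannot store its result back into an integer array. A sketch of the options (variable names are illustrative):

```python
import numpy as np

a = np.arange(1, 7).reshape(2, 3)    # [[1, 2, 3], [4, 5, 6]]

print((a - 1).tolist())    # [[0, 1, 2], [3, 4, 5]]
print((a / 2).tolist())    # true division returns floats: [[0.5, 1.0, 1.5], [2.0, 2.5, 3.0]]
print((a // 2).tolist())   # floor division keeps integers: [[0, 1, 1], [2, 2, 3]]

# In-place true division on an integer array fails: the float results
# cannot be cast back into the array's integer dtype
try:
    a /= 2
except TypeError as exc:   # NumPy raises a subclass of TypeError here
    print("cannot divide in place:", type(exc).__name__)

# Convert to float first; then in-place division works
b = a.astype(np.float64)
b /= 2
print(b.dtype, b.tolist())  # float64 [[0.5, 1.0, 1.5], [2.0, 2.5, 3.0]]
```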

 

1.3.2 Operations between multidimensional arrays

The operators +, -, * and / between two arrays of the same shape work element by element:

                import numpy as np
                a = np.arange(4).reshape(2,2)
                b = np.arange(4,8).reshape(2,2)
                a
                Out[5]: 
                array([[0, 1],
                       [2, 3]])
                b
                Out[6]: 
                array([[4, 5],
                       [6, 7]])
                b - a
                Out[7]: 
                array([[4, 4],
                       [4, 4]])
                a + b
                Out[8]: 
                array([[ 4,  6],
                       [ 8, 10]])
                a * b
                Out[9]: 
                array([[ 0,  5],
                       [12, 21]])
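To make "element by element" concrete, the sketch below recomputes `a * b` with an explicit loop and then shows that two arrays with incompatible shapes raise an error. NumPy's broadcasting rules do allow some different-shape combinations; only a mismatched case is shown here.

```python
import numpy as np

a = np.arange(4).reshape(2, 2)
b = np.arange(4, 8).reshape(2, 2)

# a * b multiplies position by position; this loop computes the same result
manual = np.empty_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        manual[i, j] = a[i, j] * b[i, j]
print(np.array_equal(a * b, manual))   # True

# Element-wise operations need compatible shapes: (2, 2) and (3,) are not
c = np.arange(3)
try:
    a + c
except ValueError:
    print("shapes", a.shape, "and", c.shape, "cannot be combined element-wise")
```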

For matrix multiplication (each row of a multiplied by each column of b and summed), use the ndarray.dot method:
                a.dot(b)
                Out[10]: 
                array([[ 6,  7],
                       [26, 31]])
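For 2-D arrays, `a.dot(b)`, `np.dot(a, b)` and the `@` operator (Python 3.5+, which calls `np.matmul`) all give the same result. This sketch checks that and rebuilds each entry by hand from the row-times-column rule.

```python
import numpy as np

a = np.arange(4).reshape(2, 2)
b = np.arange(4, 8).reshape(2, 2)

r1 = a.dot(b)
r2 = np.dot(a, b)
r3 = a @ b

# Entry [i, j] = sum over k of a[i, k] * b[k, j]
# e.g. [0, 0] = 0*4 + 1*6 = 6,  [1, 1] = 2*5 + 3*7 = 31
manual = np.array([[sum(a[i, k] * b[k, j] for k in range(2)) for j in range(2)]
                   for i in range(2)])

print(r1.tolist())   # [[6, 7], [26, 31]]
print(np.array_equal(r1, r2), np.array_equal(r1, r3), np.array_equal(r1, manual))  # True True True
```

For arrays with more than two dimensions, `np.dot` and `np.matmul` follow different rules, so they are only interchangeable in the 2-D case shown here.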

 

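1.3.3 Logical operations on multidimensional arrays

A comparison such as a > 5 returns a Boolean array of the same shape; indexing with that array (a[b]) selects the elements where it is True and returns them as a one-dimensional array: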

                a = np.arange(12).reshape(4,3)
                b = a > 5
                b
                Out[13]: 
                array([[False, False, False],
                       [False, False, False],
                       [ True,  True,  True],
                       [ True,  True,  True]])
                a[b]
                Out[14]: array([ 6,  7,  8,  9, 10, 11])
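Several conditions can be combined into one mask with `&` (and), `|` (or) and `~` (not); Python's `and`/`or` do not work element-wise on arrays. A mask can also appear on the left of an assignment. A sketch:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

# Each comparison needs parentheses: & and | bind more tightly than > and <
mask = (a > 2) & (a < 8)
print(a[mask])                 # [3 4 5 6 7]
print(a[(a < 2) | (a > 9)])    # [ 0  1 10 11]

# `and` asks for a single True/False, which is ambiguous for an array
try:
    a[(a > 2) and (a < 8)]
except ValueError:
    print("use & instead of `and`")

# Assign through a mask: set every odd value to 0
a[a % 2 == 1] = 0
print(a.tolist())              # [[0, 0, 2], [0, 4, 0], [6, 0, 8], [0, 10, 0]]
```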

 

1.4 Statistical methods of multidimensional arrays

Let a be a multidimensional array:

a.sum     sum of all elements
a.max     maximum value
a.min     minimum value
a.mean    mean (average) value
a.std     standard deviation: the square root of the variance
a.var     variance: the mean of the squared differences between each element and the mean of all elements
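The variance and standard deviation definitions above can be checked directly against NumPy's methods. This sketch uses an arbitrary 2x3 sample array; note that NumPy divides by N by default, and `ddof=1` switches to the N - 1 (sample) version.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

mean = a.mean()                          # 3.5
var_manual = ((a - mean) ** 2).mean()    # mean of squared deviations
std_manual = np.sqrt(var_manual)         # standard deviation = sqrt(variance)

print(np.isclose(a.var(), var_manual))   # True
print(np.isclose(a.std(), std_manual))   # True

# ddof=1 divides by N - 1 instead of N (sample variance)
print(a.var(ddof=1), var_manual * a.size / (a.size - 1))
```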
                
Each of these methods accepts an axis parameter that selects the axis to compute along. For a two-dimensional array, axis 0 runs down the rows and axis 1 runs across the columns, so axis=0 gives one result per column and axis=1 gives one result per row:

                    a = np.arange(12).reshape(4,3)
                    a
                    Out[15]: 
                    array([[ 0,  1,  2],
                           [ 3,  4,  5],
                           [ 6,  7,  8],
                           [ 9, 10, 11]])
                    a.sum()
                    Out[16]: 66
                    a.sum(axis=0)
                    Out[17]: array([18, 22, 26])
                    a.sum(axis=1)
                    Out[18]: array([ 3, 12, 21, 30])
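The same axis argument works for the other statistical methods. In this sketch, note how the reduced axis disappears from the result's shape unless `keepdims=True` is passed.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

print(a.max(axis=0))            # [ 9 10 11]  largest value in each column
print(a.min(axis=1))            # [0 3 6 9]   smallest value in each row
print(a.mean(axis=1).tolist())  # [1.0, 4.0, 7.0, 10.0]

# The axis you reduce over is removed from the result's shape
print(a.sum(axis=0).shape, a.sum(axis=1).shape)   # (3,) (4,)

# keepdims=True keeps that axis with length 1 instead
row_sums = a.sum(axis=1, keepdims=True)
print(row_sums.shape)           # (4, 1)
```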

Other commonly used functions:
                np.sqrt      element-wise square root
                np.dot       dot product / matrix multiplication
                np.sort      returns a sorted copy of an array
                np.linalg    module with basic linear algebra functions
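A quick tour of the functions listed above. From `np.linalg` the sketch uses `det`, `inv` and `solve`, which is only a small sample of that module; the matrix and right-hand side are arbitrary example values.

```python
import numpy as np

x = np.array([9.0, 1.0, 4.0])
print(np.sqrt(x))            # element-wise square root: [3. 1. 2.]
print(np.sort(x))            # sorted copy: [1. 4. 9.]; x itself is unchanged

m = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.dot(m, m).tolist()) # [[7.0, 10.0], [15.0, 22.0]]

print(np.linalg.det(m))      # determinant, approximately -2.0
m_inv = np.linalg.inv(m)     # matrix inverse
print(np.allclose(np.dot(m, m_inv), np.eye(2)))   # True (np.eye(2) is the 2x2 identity)

# Solve the linear system m . v = rhs for v
rhs = np.array([5.0, 11.0])
v = np.linalg.solve(m, rhs)
print(v)                     # approximately [1, 2]
```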
