I. Basic usage of numpy module:
The numpy module provides array support and processes data efficiently. Many modules depend on it, such as pandas, scipy, and matplotlib, so it is the foundation for the others.
(1) import:
import numpy
(2) create one-dimensional and two-dimensional arrays:
#Create a one-dimensional array
x=numpy.array(["1","3","r","u","a"])
#Create a 2D array
y=numpy.array([[1,2],[22,2],[11,8]])
Results:
>>> x
array(['1', '3', 'r', 'u', 'a'], dtype='<U1')
>>> y
array([[ 1,  2],
       [22,  2],
       [11,  8]])
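To see what was actually created, the arrays can be inspected through their attributes. The following is a minimal sketch using the same x and y as above; the names x_kind and y_kind are only illustrative.

```python
import numpy

x = numpy.array(["1", "3", "r", "u", "a"])
y = numpy.array([[1, 2], [22, 2], [11, 8]])

# ndim is the number of dimensions, shape is the size along each dimension
print(x.ndim, x.shape)  # 1 (5,)
print(y.ndim, y.shape)  # 2 (3, 2)

# dtype.kind is a one-letter code for the element type
x_kind = x.dtype.kind  # 'U': Unicode strings
y_kind = y.dtype.kind  # 'i': signed integers
print(x_kind, y_kind)
```

The '<U1' shown for x means that every element is a Unicode string of at most one character.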
(3) extract array specific values:
#Create a one-dimensional array
x=numpy.array(["1","3","r","u","a"])
#Create a 2D array
y=numpy.array([[1,2],[22,2],[11,8]])
#Output the first element of the one-dimensional array
print(x[0])
#Output the first element of the third row (index 2) of the 2D array
print(y[2][0])
Results:
1
11
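Indexing a 2D array can also be written with a single pair of brackets. This is a small sketch with the same y as above; the variable names are only illustrative.

```python
import numpy

y = numpy.array([[1, 2], [22, 2], [11, 8]])

# Two steps: take row 2, then element 0 of that row
two_step = y[2][0]

# One step: row index and column index separated by a comma
one_step = y[2, 0]

# A negative index counts from the end, so -1 is the last row
last_row = y[-1]

print(two_step, one_step, last_row)
```

Both forms return the same element; the comma form is the more common style for multi-dimensional arrays.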
(4) maximum and minimum value of array:
#Take maximum and minimum
y1=y.max()#Maximum of all elements
y2=y.min()#Minimum of all elements
Results:
>>> y1
22
>>> y2
1
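max() and min() also accept an axis argument when the maximum or minimum is wanted per column or per row instead of over all elements. A minimal sketch with the same y (variable names are illustrative):

```python
import numpy

y = numpy.array([[1, 2], [22, 2], [11, 8]])

# axis=0 works down the rows, giving one value per column
col_max = y.max(axis=0)  # [22, 8]
col_min = y.min(axis=0)  # [1, 2]

# axis=1 works across the columns, giving one value per row
row_max = y.max(axis=1)  # [2, 22, 11]

print(col_max, col_min, row_max)
```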
(5) array element sorting:
#Create a one-dimensional array
x=numpy.array(["1","3","r","u","a"])
#Create a 2D array
y=numpy.array([[1,2],[22,2],[11,8]])
#Sort in place
x.sort()
y.sort()#For a 2D array, each row is sorted separately
Results:
>>> x
array(['1', '3', 'a', 'r', 'u'], dtype='<U1')
>>> y
array([[ 1,  2],
       [ 2, 22],
       [ 8, 11]])
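Because the sort() method changes the array in place, the original order is lost. When the original should be kept, numpy.sort() returns a sorted copy instead. A short sketch, assuming the same y as above:

```python
import numpy

y = numpy.array([[1, 2], [22, 2], [11, 8]])

# numpy.sort returns a new array; by default each row is sorted (last axis)
sorted_rows = numpy.sort(y)

# axis=0 sorts each column instead
sorted_cols = numpy.sort(y, axis=0)

print(sorted_rows)
print(sorted_cols)
print(y)  # y itself is unchanged
```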
(6) slice: take a range of elements by subscript
#Create a one-dimensional array
x=numpy.array(["1","3","r","u","a"])
#Slice: take a range of elements by subscript
#Format: array[start subscript:end subscript], where the element at the end subscript is not included
x[1:3]#"3","r"
x[:3]#"1","3","r"
x[1:]#"3","r","u","a"
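The same slice notation also supports negative subscripts and a step, and it extends to 2D arrays. The following sketch reuses x and y from the earlier steps; the variable names are only illustrative.

```python
import numpy

x = numpy.array(["1", "3", "r", "u", "a"])
y = numpy.array([[1, 2], [22, 2], [11, 8]])

# Negative start: the last two elements
last_two = x[-2:]      # "u","a"

# A third value is the step: every second element
every_other = x[::2]   # "1","r","a"

# For a 2D array, slice rows and columns separated by a comma:
# rows from index 1 onward, column 0 only
first_col_tail = y[1:, 0]  # 22, 11

print(last_two, every_other, first_col_tail)
```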
II. Basic usage of pandas module:
The pandas module is mainly used for data exploration and data analysis.
(1) import
import pandas as pda
#After that, pda can be used in place of pandas in the code, which is convenient
(2) create data:
Series: a one-dimensional sequence of values, each paired with an index label.
DataFrame: a data frame, similar to a table, which holds data arranged in rows and columns. The columns argument sets its header.
1) create as an array:
#Create data by array
a=pda.Series([8,9,2,1])
b=pda.Series([8,9,2,1],index=["a","b","c","d"])
c=pda.DataFrame([[5,8,9,6],[3,5,7,9],[33,54,58,10],[2,12,55,78]])
d=pda.DataFrame([[5,8,9,6],[3,5,7,9],[33,54,58,10],[2,12,55,78]],columns=["one","two","three","four"])
Results:
>>> a
0    8
1    9
2    2
3    1
dtype: int64
>>> b
a    8
b    9
c    2
d    1
dtype: int64
>>> c
    0   1   2   3
0   5   8   9   6
1   3   5   7   9
2  33  54  58  10
3   2  12  55  78
>>> d
   one  two  three  four
0    5    8      9     6
1    3    5      7     9
2   33   54     58    10
3    2   12     55    78
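The index labels and column headers are what make the data addressable by name. As a small illustration, assuming the same b and d as above (the other variable names are only for this sketch):

```python
import pandas as pda

b = pda.Series([8, 9, 2, 1], index=["a", "b", "c", "d"])
d = pda.DataFrame(
    [[5, 8, 9, 6], [3, 5, 7, 9], [33, 54, 58, 10], [2, 12, 55, 78]],
    columns=["one", "two", "three", "four"],
)

# Look up a Series value by its index label
value_c = b["c"]

# Select a DataFrame column by its header; the result is a Series
column_two = d["two"]

# The labels themselves are available as .index and .columns
labels = list(b.index)
headers = list(d.columns)

print(value_c, column_two.tolist(), labels, headers)
```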
2) create from a dictionary:
#Create data frame by dictionary
e=pda.DataFrame({
    "one":4,
    "two":[3,2,1],
    "three":list(str(982)),
})
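A single value such as 4 is repeated automatically to fill every row, while the list-valued columns must all have the same length. The result is as follows (recent pandas versions keep the key order of the dictionary; older versions sorted the column names alphabetically):
>>> e
   one  two three
0    4    3     9
1    4    2     8
2    4    1     2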
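The automatic filling applies to single (scalar) values only. As a hedged sketch of the limits, the following builds the same e and then tries a dictionary whose lists differ in length, which pandas rejects with a ValueError; the name mismatch_error is only illustrative.

```python
import pandas as pda

e = pda.DataFrame({
    "one": 4,                   # scalar: repeated for every row
    "two": [3, 2, 1],
    "three": list(str(982)),    # ['9', '8', '2'], a list of strings
})
print(e)

# Lists of different lengths are not padded; pandas raises ValueError
try:
    pda.DataFrame({"two": [3, 2, 1], "three": [9, 8]})
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)

print(mismatch_error)
```

Note that the values in column "three" are strings, not numbers, because list(str(982)) splits the text "982" into characters.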
(3) data acquisition:
f=d.head()#Head data, the first five rows by default
g=d.head(1)#The given number of rows from the head
h=d.tail()#Tail data, the last five rows by default
i=d.tail(2)#The given number of rows from the tail
Results:
>>> f
   one  two  three  four
0    5    8      9     6
1    3    5      7     9
2   33   54     58    10
3    2   12     55    78
>>> g
   one  two  three  four
0    5    8      9     6
>>> h
   one  two  three  four
0    5    8      9     6
1    3    5      7     9
2   33   54     58    10
3    2   12     55    78
>>> i
   one  two  three  four
2   33   54     58    10
3    2   12     55    78
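Since d has only four rows, head() and tail() return the whole frame here. A brief sketch of two edge cases, assuming the same d (variable names are illustrative): asking for more rows than exist, and passing a negative number.

```python
import pandas as pda

d = pda.DataFrame(
    [[5, 8, 9, 6], [3, 5, 7, 9], [33, 54, 58, 10], [2, 12, 55, 78]],
    columns=["one", "two", "three", "four"],
)

# Asking for more rows than exist simply returns every row
all_rows = d.head(10)

# A negative n means "all rows except the last |n|"
without_last = d.head(-1)

# tail keeps the original index labels
last_two = d.tail(2)

print(len(all_rows), without_last.index.tolist(), last_two.index.tolist())
```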
(4) data statistics:
d.describe()
Results:
>>> d.describe()
             one        two     three       four
count   4.000000   4.000000   4.00000   4.000000
mean   10.750000  19.750000  32.25000  25.750000
std    14.885675  23.012678  28.04015  34.874776
min     2.000000   5.000000   7.00000   6.000000
25%     2.750000   7.250000   8.50000   8.250000
50%     4.000000  10.000000  32.00000   9.500000
75%    12.000000  22.500000  55.75000  27.000000
max    33.000000  54.000000  58.00000  78.000000
From top to bottom, the rows are: count (number of non-null values), mean, standard deviation, minimum, 25% quantile, 50% quantile (the median), 75% quantile, and maximum.
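Each row of the summary can also be computed individually, which helps when checking what a figure means. A minimal sketch for column "one" of the same d; the variable names are only illustrative.

```python
import pandas as pda

d = pda.DataFrame(
    [[5, 8, 9, 6], [3, 5, 7, 9], [33, 54, 58, 10], [2, 12, 55, 78]],
    columns=["one", "two", "three", "four"],
)
col = d["one"]

count = col.count()        # number of non-null values
mean = col.mean()          # 10.75
std = col.std()            # sample standard deviation (divides by n - 1)
q25 = col.quantile(0.25)   # 25% quantile, linearly interpolated
median = col.quantile(0.5) # 50% quantile

# Individual cells of the summary table can be read with .loc
summary = d.describe()
median_three = summary.loc["50%", "three"]

print(count, mean, std, q25, median, median_three)
```

The standard deviation here is the sample standard deviation, which is why it differs from numpy's default population formula.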
(5) transpose (row and column position reversed)
d=pda.DataFrame([[5,8,9,6],[3,5,7,9],[33,54,58,10],[2,12,55,78]],columns=["one","two","three","four"])
j=d.T
Results:
>>> d.T
       0  1   2   3
one    5  3  33   2
two    8  5  54  12
three  9  7  58  55
four   6  9  10  78
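After transposing, the old column headers become the row index and the old row index becomes the columns. A short sketch checking this with the same d and j; round_trip_ok is a name used only for this example.

```python
import pandas as pda

d = pda.DataFrame(
    [[5, 8, 9, 6], [3, 5, 7, 9], [33, 54, 58, 10], [2, 12, 55, 78]],
    columns=["one", "two", "three", "four"],
)
j = d.T

# Headers and index swap places
new_index = list(j.index)      # former column headers
new_columns = list(j.columns)  # former row index

# The value at row 2, column "two" of d is now at row "two", column 2 of j
moved_value = j.loc["two", 2]

# Transposing twice gives back the original values
round_trip_ok = bool((j.T == d).all().all())

print(new_index, new_columns, moved_value, round_trip_ok)
```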