Task 1: Pandas data reading, saving and data type
Key tasks: file reading, saving and data type analysis
- Step 1: read the file https://cdn.coggle.club/Pokemon.csv
- Step 2: save the read and the header
- Step 3: analyze the type of each column and the number of values
- Step 4: analyze whether each column contains missing values
The code is as follows
import pandas as pd doc=pd.read_csv( 'E:\Pokemon.csv') print(doc) file=pd.DataFrame(doc) file2=file.to_csv('csv file.csv',header=True) print(file.info())#Analyze the type and number of values of each column null_all=file.isnull().sum()#Analyze the number of missing values in each column print(null_all)#Statistics of missing values print(file.isnull())#-Analyze whether each column contains missing values
Task 2: Pandas data location index
Key points of task: data selection and data index
- Step 1: read the file https://cdn.coggle.club/Pokemon.csv
- Step 2: select the Total column
- Step 3: select the Total column and HP column
- Step 4: select row 10-40 data
- Step 5: select the Total column and HP column in rows 10-40
-
import pandas as pd df=pd.read_csv('E:\Pokemon (1).csv') file=pd.DataFrame(df) file.to_csv('CSV Document 2.csv') print(df[['Total']])#Select the total column print(df[['Total','HP']])#Select the Total column and the HP column print(file.iloc[10:41,:])#Select row 10-40 data print(file.iloc[10:41,:],[['Total','HP']])#Select the Total column and HP column in rows 10-40
-
Task 3: Pandas data logical index
Key points: logical index
- Step 1: read the file https://cdn.coggle.club/Pokemon.csv
- Step 2: filter out the data with Type 1 as Grass
- Step 3: filter out the data with Type 1 as Grass and Type 2 as Poison
- Step 4: filter out data with HP greater than 50 or Speed less than 90
- Step 5: filter out the data whose Type 1 value is Grass or Fire, HP is between 70 and 90, and Speed starts with the number 8
-
import pandas as pd file=pd.read_csv('E:\Pokemon (2).csv')#Read CSV file df=pd.DataFrame(file) df2=df.set_index('Type 1')#Set column index 'Type 1' print(df2.loc['Grass'])#Then use the loc indexer to find 'Grass' df3=df2.set_index('Type 2',append=True)#Multilevel index df2_sorted=df3.sort_index() print(df2_sorted.loc[('Grass','Poison')])#Search under multi-level index def condition(x):#Filter out data with HP greater than 50 or Speed less than 90 a=x.HP>50 b=x.Speed<90 result=a&b return result print(df3.loc[condition]) df4=df.set_index(['Type 1']) df5=df4.loc[['Grass','Fire']] df4_sorted=df5.sort_index() def B(x):#Filter out the data whose Type 1 value is Grass or Fire, HP is between 70 and 90, and Speed starts with the number 8 a=x.HP>70 b=x.HP<90 c=[str(d).startswith('8')for d in df4_sorted.Speed.values] result=a&b&c return result print(df4_sorted.loc[B])
-
Task 4: Pandas data packet aggregation
Key tasks: groupby, agg, transform
- Step 1: read the file https://cdn.coggle.club/Pokemon.csv
- Step 2: learn how to use group by aggregation
- Step 3: learn how to use agg group aggregation
- Step 4: learn how to use transform
- Step 5: use groupby, agg and transform to count the average value of HP under Type 1 grouping
-
import pandas as pd file=pd.read_csv('E:\Pokemon (3).csv') print(file) file1=file.groupby('Type 1')['HP'] print(file1.agg({'HP':'mean'})) print(file1.transform('mean'))
-
Task 5: Pandas date data processing
Key points of task: date processing
- Step 1: create a column of dt whose value is the unix time from 1638263656 to 1638283656
- Step 2: convert the dt column to datatime format
- Step 3: filter out rows with 10 hours in dt column
- Step 4: add 8 hours to the dt column as a whole
-
import pandas as pd import datetime import time timestamp_star=1638263656 timestamp_ends=1638283656 dt=pd.to_datetime(range(timestamp_star,timestamp_ends),unit='s')#Timestamp converted to datetime type print(dt) print(dt[dt.hour==10])#Filter out rows with 10 hours in dt column time.sleep(1.05) dt=dt+datetime.timedelta(hours=8)#Increase the dt column by 8 hours as a whole print(dt)
-
Task 6: Pandas missing value processing
- Step 1: read the file https://cdn.coggle.club/Pokemon.csv
- Step 2: analyze the missing values of each column
- Step 3: fill in the blanks for the missing values of each column
-
import pandas as pd file=pd.read_csv("E:\Pokemon (4).csv") df=pd.DataFrame(file) print(df.info())#Number of analyzable missing values df=df.fillna(value=0)#Key assignment print(df.isna())#see
-
Task 7: Pandas data visualization
Key points of the task: plot
- Step 1: read the file https://cdn.coggle.club/Pokemon.csv
- Step 2: count the average values of HP, Attack and Defense under Type 1 group, and draw a histogram
- Step 3: plot the HP and Attack of all samples into a scatter diagram
-
import pandas as pd import matplotlib as mpl #import matplotlib.pyplot as plt from matplotlib import pyplot as plt plt.rcParams['font.family'] = ['Source Han Sans CN'] file=pd.read_csv("E:\Pokemon (5).csv") df=pd.DataFrame(file) mean=df.groupby('Type 1')[['HP','Attack','Defense']].mean() #Bar chart mean.plot(kind='bar',title='mean value') plt.show() #Scatter diagram data=df[['HP','Attack']] data.plot(kind='scatter',x='HP',y='Attack') plt.show()
Task 8: Pandas multi table merging and aggregation
-
Key points: merge and join
- Step 1: create the following data
- Step 2: Merge data1 and data2 Using Inner Join
- Step 3: Merge data1 and data2 Using Outer Join
- Step 4: Merge data1 and data2 Using Left Join
- Step 5: Merge data1 and data2 Using Right Join
- Step 6: merge data1, data2 and data3 using outer join
- Step 7: Merge data1 and data2 based on Index
data1 = pd.DataFrame({"ID":range(101, 106),
"x1":range(1, 6),
"x2":["a", "b", "c", "d", "e"],
"x3":range(16, 11, - 1)})
data2 = pd.DataFrame({"ID":range(104, 108),
"y1":["x", "y", "x", "y"],
"y2":range(8, 1, - 2)})
data3 = pd.DataFrame({"ID":range(102, 110),
"z1":range(10, 18),
"z2":["z", "b", "z", "z", "d", "z", "d", "a"],
"z3":range(18, 10, - 1)})
import pandas as pd data1 = pd.DataFrame({"ID":range(101, 106), "x1":range(1, 6), "x2":["a", "b", "c", "d", "e"], "x3":range(16, 11, - 1)}) data2 = pd.DataFrame({"ID":range(104, 108), "y1":["x", "y", "x", "y"], "y2":range(8, 1, - 2)}) data3 = pd.DataFrame({"ID":range(102, 110), "z1":range(10, 18), "z2":["z", "b", "z", "z", "d", "z", "d", "a"], "z3":range(18, 10, - 1)}) print(data1) print(data2) print(data3) #inner connection data_inner=data1.merge(data2,how='inner') #Outer connection data_outter=data1.merge(data2,how='outer') #left connection data_left=data1.merge(data2,how='left') #Right right connection data_right=data1.merge(data2,how='right') data_mult=data1.merge([data2,data3],how='outer') data_merge=data1.merge(data2,on='ID')
Task 9: Pandas PivotTable and crosstab operations
- Step 1: read the file https://cdn.coggle.club/Pokemon.csv
- Step 2: count the average value of HP, attack and defense under Type 1 and Type 2 groups
- Step 3: count the average value of Attack when Type 1 is index and Type 2 is different columns