# LML learning punch in (total)

Posted by [uk]stuff on Tue, 22 Feb 2022 10:08:43 +0100

• Step 3: analyze the type of each column and the number of values
• Step 4: analyze whether each column contains missing values

The code is as follows

```import pandas as pd
print(doc)
file=pd.DataFrame(doc)
print(file.info())#Analyze the type and number of values of each column
null_all=file.isnull().sum()#Analyze the number of missing values in each column
print(null_all)#Statistics of missing values
print(file.isnull())#-Analyze whether each column contains missing values
```

Task 2: Pandas data location index

Key points of task: data selection and data index

• Step 2: select the Total column
• Step 3: select the Total column and HP column
• Step 4: select row 10-40 data
• Step 5: select the Total column and HP column in rows 10-40
• ```import pandas as pd
file=pd.DataFrame(df)
file.to_csv('CSV Document 2.csv')
print(df[['Total']])#Select the total column
print(df[['Total','HP']])#Select the Total column and the HP column
print(file.iloc[10:41,:])#Select row 10-40 data
print(file.iloc[10:41,:],[['Total','HP']])#Select the Total column and HP column in rows 10-40```

• Task 3: Pandas data logical index

Key points: logical index

• Step 1: read the file https://cdn.coggle.club/Pokemon.csv
• Step 2: filter out the data with Type 1 as Grass
• Step 3: filter out the data with Type 1 as Grass and Type 2 as Poison
• Step 4: filter out data with HP greater than 50 or Speed less than 90
• Step 5: filter out the data whose Type 1 value is Grass or Fire, HP is between 70 and 90, and Speed starts with the number 8
• ```import pandas as pd
df=pd.DataFrame(file)
df2=df.set_index('Type 1')#Set column index 'Type 1'
print(df2.loc['Grass'])#Then use the loc indexer to find 'Grass'
df3=df2.set_index('Type 2',append=True)#Multilevel index
df2_sorted=df3.sort_index()
print(df2_sorted.loc[('Grass','Poison')])#Search under multi-level index
def condition(x):#Filter out data with HP greater than 50 or Speed less than 90
a=x.HP>50
b=x.Speed<90
result=a&b
return result
print(df3.loc[condition])
df4=df.set_index(['Type 1'])
df5=df4.loc[['Grass','Fire']]
df4_sorted=df5.sort_index()

def B(x):#Filter out the data whose Type 1 value is Grass or Fire, HP is between 70 and 90, and Speed starts with the number 8
a=x.HP>70
b=x.HP<90
c=[str(d).startswith('8')for d in df4_sorted.Speed.values]
result=a&b&c
return result
print(df4_sorted.loc[B])```

• Task 4: Pandas data packet aggregation

• Step 1: read the file https://cdn.coggle.club/Pokemon.csv
• Step 2: learn how to use group by aggregation
• Step 3: learn how to use agg group aggregation
• Step 4: learn how to use transform
• Step 5: use groupby, agg and transform to count the average value of HP under Type 1 grouping
• ```import pandas as pd
print(file)
file1=file.groupby('Type 1')['HP']
print(file1.agg({'HP':'mean'}))
print(file1.transform('mean'))```

• Task 5: Pandas date data processing

Key points of task: date processing

• Step 1: create a column of dt whose value is the unix time from 1638263656 to 1638283656
• Step 2: convert the dt column to datatime format
• Step 3: filter out rows with 10 hours in dt column
• Step 4: add 8 hours to the dt column as a whole
• ```import pandas as pd
import datetime
import time
timestamp_star=1638263656
timestamp_ends=1638283656
dt=pd.to_datetime(range(timestamp_star,timestamp_ends),unit='s')#Timestamp converted to datetime type
print(dt)
print(dt[dt.hour==10])#Filter out rows with 10 hours in dt column
time.sleep(1.05)
dt=dt+datetime.timedelta(hours=8)#Increase the dt column by 8 hours as a whole
print(dt)```

• Task 6: Pandas missing value processing

• Step 1: read the file https://cdn.coggle.club/Pokemon.csv
• Step 2: analyze the missing values of each column
• Step 3: fill in the blanks for the missing values of each column
• ```import pandas as pd
df=pd.DataFrame(file)
print(df.info())#Number of analyzable missing values
df=df.fillna(value=0)#Key assignment
print(df.isna())#see

```

• Task 7: Pandas data visualization

Key points of the task: plot

• Step 1: read the file https://cdn.coggle.club/Pokemon.csv
• Step 2: count the average values of HP, Attack and Defense under Type 1 group, and draw a histogram
• Step 3: plot the HP and Attack of all samples into a scatter diagram
• ```import pandas as pd
import matplotlib as mpl
#import matplotlib.pyplot as plt
from matplotlib import pyplot as plt
plt.rcParams['font.family'] = ['Source Han Sans CN']
df=pd.DataFrame(file)
mean=df.groupby('Type 1')[['HP','Attack','Defense']].mean()
#Bar chart
mean.plot(kind='bar',title='mean value')
plt.show()
#Scatter diagram
data=df[['HP','Attack']]
data.plot(kind='scatter',x='HP',y='Attack')
plt.show()```

Task 8: Pandas multi table merging and aggregation

• Key points: merge and join

• Step 1: create the following data
• Step 2: Merge data1 and data2 Using Inner Join
• Step 3: Merge data1 and data2 Using Outer Join
• Step 4: Merge data1 and data2 Using Left Join
• Step 5: Merge data1 and data2 Using Right Join
• Step 6: merge data1, data2 and data3 using outer join
• Step 7: Merge data1 and data2 based on Index

data1 = pd.DataFrame({"ID":range(101, 106),

"x1":range(1, 6),

"x2":["a", "b", "c", "d", "e"],

"x3":range(16, 11, - 1)})

data2 = pd.DataFrame({"ID":range(104, 108),

"y1":["x", "y", "x", "y"],

"y2":range(8, 1, - 2)})

data3 = pd.DataFrame({"ID":range(102, 110),

"z1":range(10, 18),

"z2":["z", "b", "z", "z", "d", "z", "d", "a"],

"z3":range(18, 10, - 1)})

```import pandas as pd
data1 = pd.DataFrame({"ID":range(101, 106),
"x1":range(1, 6),
"x2":["a", "b", "c", "d", "e"],
"x3":range(16, 11, - 1)})

data2 = pd.DataFrame({"ID":range(104, 108),
"y1":["x", "y", "x", "y"],
"y2":range(8, 1, - 2)})

data3 = pd.DataFrame({"ID":range(102, 110),
"z1":range(10, 18),
"z2":["z", "b", "z", "z", "d", "z", "d", "a"],
"z3":range(18, 10, - 1)})
print(data1)
print(data2)
print(data3)

#inner connection
data_inner=data1.merge(data2,how='inner')
#Outer connection
data_outter=data1.merge(data2,how='outer')
#left connection
data_left=data1.merge(data2,how='left')
#Right right connection
data_right=data1.merge(data2,how='right')

data_mult=data1.merge([data2,data3],how='outer')

data_merge=data1.merge(data2,on='ID')

```

Task 9: Pandas PivotTable and crosstab operations

• Step 2: count the average value of HP, attack and defense under Type 1 and Type 2 groups
• Step 3: count the average value of Attack when Type 1 is index and Type 2 is different columns