LML learning check-in (summary)

Posted by [uk]stuff on Tue, 22 Feb 2022 10:08:43 +0100

Task 1: Pandas data reading, saving and data type

Key tasks: file reading, saving and data type analysis

  • Step 1: read the file https://cdn.coggle.club/Pokemon.csv
  • Step 2: save the data that was read, together with the header
  • Step 3: analyze the type of each column and the number of values
  • Step 4: analyze whether each column contains missing values

The code is as follows

import pandas as pd

doc = pd.read_csv(r'E:\Pokemon.csv')       # raw string so the backslash in the path is not treated as an escape
print(doc)
file = pd.DataFrame(doc)
file.to_csv('csv file.csv', header=True)   # save the data together with the header
print(file.info())                         # type and non-null count of each column
null_all = file.isnull().sum()             # number of missing values in each column
print(null_all)                            # missing-value statistics
print(file.isnull())                       # whether each cell contains a missing value
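
As a supplementary sketch (not part of the original solution), Step 3 can also be answered directly with dtypes and nunique; the URL below is the course file referenced in the later tasks and is assumed to hold the same data as the local copy:

import pandas as pd

df = pd.read_csv('https://cdn.coggle.club/Pokemon.csv')
print(df.dtypes)             # data type of each column
print(df.nunique())          # number of distinct values in each column
print(df.isnull().any())     # True for every column that contains missing values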

Task 2: Pandas data location index

Key points of the task: data selection and data indexing

  • Step 1: read the file https://cdn.coggle.club/Pokemon.csv
  • Step 2: select the Total column
  • Step 3: select the Total and HP columns
  • Step 4: select rows 10-40
  • Step 5: select the Total and HP columns in rows 10-40
  • import pandas as pd
    df = pd.read_csv(r'E:\Pokemon (1).csv')
    file = pd.DataFrame(df)
    file.to_csv('CSV Document 2.csv')
    print(df[['Total']])                        # select the Total column
    print(df[['Total', 'HP']])                  # select the Total and HP columns
    print(file.iloc[10:41, :])                  # select rows 10-40
    print(file.iloc[10:41][['Total', 'HP']])    # select the Total and HP columns in rows 10-40
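
A sketch of an alternative for Step 5 (assuming the default integer index that read_csv produces): a single loc call selects both the row range and the columns, and the end label of a loc slice is inclusive:

import pandas as pd

df = pd.read_csv('https://cdn.coggle.club/Pokemon.csv')
print(df.loc[10:40, ['Total', 'HP']])    # rows 10-40, Total and HP columns only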

Task 3: Pandas data logical indexing

Key points of the task: logical indexing

  • Step 1: read the file https://cdn.coggle.club/Pokemon.csv
  • Step 2: select the rows where Type 1 is Grass
  • Step 3: select the rows where Type 1 is Grass and Type 2 is Poison
  • Step 4: select the rows where HP is greater than 50 or Speed is less than 90
  • Step 5: select the rows where Type 1 is Grass or Fire, HP is between 70 and 90, and Speed starts with the digit 8
  • import pandas as pd
    file = pd.read_csv(r'E:\Pokemon (2).csv')    # read the CSV file
    df = pd.DataFrame(file)
    df2 = df.set_index('Type 1')                 # set 'Type 1' as the index
    print(df2.loc['Grass'])                      # then use the loc indexer to look up 'Grass'
    df3 = df2.set_index('Type 2', append=True)   # multi-level index
    df2_sorted = df3.sort_index()
    print(df2_sorted.loc[('Grass', 'Poison')])   # look up under the multi-level index
    def condition(x):                            # rows with HP greater than 50 or Speed less than 90
        a = x.HP > 50
        b = x.Speed < 90
        result = a | b                           # the step asks for "or", so combine the masks with |
        return result
    print(df3.loc[condition])
    df4 = df.set_index(['Type 1'])
    df5 = df4.loc[['Grass', 'Fire']]
    df4_sorted = df5.sort_index()

    def B(x):                                    # Type 1 is Grass or Fire, HP between 70 and 90, Speed starts with 8
        a = x.HP > 70
        b = x.HP < 90
        c = x.Speed.astype(str).str.startswith('8')
        result = a & b & c
        return result
    print(df4_sorted.loc[B])
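
A minimal sketch of the same filters written as plain boolean masks on the original columns (Type 1, Type 2, HP, Speed), which expresses the logical indexing without changing the index; the variable names are illustrative:

import pandas as pd

df = pd.read_csv('https://cdn.coggle.club/Pokemon.csv')
grass = df[df['Type 1'] == 'Grass']                                          # Step 2
grass_poison = df[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison')]    # Step 3
hp_or_speed = df[(df['HP'] > 50) | (df['Speed'] < 90)]                       # Step 4
combined = df[df['Type 1'].isin(['Grass', 'Fire'])                           # Step 5
              & df['HP'].between(70, 90)
              & df['Speed'].astype(str).str.startswith('8')]
print(combined)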

Task 4: Pandas data grouping and aggregation

Key tasks: groupby, agg, transform

  • Step 1: read the file https://cdn.coggle.club/Pokemon.csv
  • Step 2: learn how to use groupby aggregation
  • Step 3: learn how to use agg for grouped aggregation
  • Step 4: learn how to use transform
  • Step 5: use groupby, agg and transform to compute the average HP within each Type 1 group
  • import pandas as pd
    file = pd.read_csv(r'E:\Pokemon (3).csv')
    print(file)
    file1 = file.groupby('Type 1')['HP']
    print(file1.agg('mean'))          # agg: one mean HP per Type 1 group
    print(file1.transform('mean'))    # transform: the group mean broadcast back to every row
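
A short sketch (reading from the course URL rather than the local path) that makes the difference in output shape explicit: agg returns one row per group, while transform returns a value for every original row, which makes it easy to attach the group mean as a new column; the column name HP_group_mean is illustrative:

import pandas as pd

df = pd.read_csv('https://cdn.coggle.club/Pokemon.csv')
per_group = df.groupby('Type 1')['HP'].agg('mean')                    # one value per Type 1 group
df['HP_group_mean'] = df.groupby('Type 1')['HP'].transform('mean')    # aligned with every row of df
print(per_group.head())
print(df[['Type 1', 'HP', 'HP_group_mean']].head())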

Task 5: Pandas date data processing

Key points of the task: date processing

  • Step 1: create a dt column whose values are the Unix timestamps from 1638263656 to 1638283656
  • Step 2: convert the dt column to datetime format
  • Step 3: select the rows of the dt column whose hour is 10
  • Step 4: add 8 hours to the dt column as a whole
  • import pandas as pd
    import datetime
    timestamp_start = 1638263656
    timestamp_end = 1638283656
    dt = pd.to_datetime(range(timestamp_start, timestamp_end), unit='s')    # Unix timestamps converted to datetime
    print(dt)
    print(dt[dt.hour == 10])                        # keep only the timestamps whose hour is 10
    dt = dt + datetime.timedelta(hours=8)           # shift the whole dt column by 8 hours
    print(dt)
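
Since Step 1 asks for a dt column, here is a minimal sketch of the same steps performed on a DataFrame column through the .dt accessor (the frame and column layout are illustrative):

import pandas as pd

df = pd.DataFrame({'dt': range(1638263656, 1638283656)})
df['dt'] = pd.to_datetime(df['dt'], unit='s')      # Step 2: convert Unix seconds to datetime
print(df[df['dt'].dt.hour == 10])                  # Step 3: rows whose hour is 10
df['dt'] = df['dt'] + pd.Timedelta(hours=8)        # Step 4: shift the whole column by 8 hours
print(df.head())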

Task 6: Pandas missing value processing

  • Step 1: read the file https://cdn.coggle.club/Pokemon.csv
  • Step 2: analyze the missing values of each column
  • Step 3: fill in the missing values of each column
  • import pandas as pd
    file = pd.read_csv(r"E:\Pokemon (4).csv")
    df = pd.DataFrame(file)
    print(df.info())             # the non-null counts show how many values are missing per column
    df = df.fillna(value=0)      # fill the missing values with 0
    print(df.isna())             # check that nothing is missing any more
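
A sketch of a per-column filling strategy instead of a single constant (the column names come from the Pokemon file; the choice of fill values is illustrative): a placeholder label for the text column and the column mean for the numeric columns:

import pandas as pd

df = pd.read_csv('https://cdn.coggle.club/Pokemon.csv')
print(df.isnull().sum())                           # missing values per column
df['Type 2'] = df['Type 2'].fillna('None')         # text column: fill with a placeholder label
df = df.fillna(df.mean(numeric_only=True))         # numeric columns: fill with the column mean
print(df.isnull().sum())                           # verify that nothing is left missing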
    
    

Task 7: Pandas data visualization

Key points of the task: plot

  • Step 1: read the file https://cdn.coggle.club/Pokemon.csv
  • Step 2: compute the average HP, Attack and Defense within each Type 1 group and draw a bar chart
  • Step 3: plot HP against Attack for all samples as a scatter plot
  • import pandas as pd
    from matplotlib import pyplot as plt
    plt.rcParams['font.family'] = ['Source Han Sans CN']    # font that can also render CJK labels
    file = pd.read_csv(r"E:\Pokemon (5).csv")
    df = pd.DataFrame(file)
    mean = df.groupby('Type 1')[['HP', 'Attack', 'Defense']].mean()
    # bar chart of the group means
    mean.plot(kind='bar', title='mean value')
    plt.show()
    # scatter plot of HP against Attack
    data = df[['HP', 'Attack']]
    data.plot(kind='scatter', x='HP', y='Attack')
    plt.show()

Task 8: Pandas multi-table merging and aggregation

Key points of the task: merge and join

  • Step 1: create the following data
  • Step 2: Merge data1 and data2 Using Inner Join
  • Step 3: Merge data1 and data2 Using Outer Join
  • Step 4: Merge data1 and data2 Using Left Join
  • Step 5: Merge data1 and data2 Using Right Join
  • Step 6: merge data1, data2 and data3 using outer join
  • Step 7: Merge data1 and data2 based on Index

data1 = pd.DataFrame({"ID":range(101, 106),

"x1":range(1, 6),

"x2":["a", "b", "c", "d", "e"],

"x3":range(16, 11, - 1)})

data2 = pd.DataFrame({"ID":range(104, 108),

"y1":["x", "y", "x", "y"],

"y2":range(8, 1, - 2)})

data3 = pd.DataFrame({"ID":range(102, 110),

"z1":range(10, 18),

"z2":["z", "b", "z", "z", "d", "z", "d", "a"],

"z3":range(18, 10, - 1)})

import pandas as pd

data1 = pd.DataFrame({"ID": range(101, 106),
                      "x1": range(1, 6),
                      "x2": ["a", "b", "c", "d", "e"],
                      "x3": range(16, 11, -1)})

data2 = pd.DataFrame({"ID": range(104, 108),
                      "y1": ["x", "y", "x", "y"],
                      "y2": range(8, 1, -2)})

data3 = pd.DataFrame({"ID": range(102, 110),
                      "z1": range(10, 18),
                      "z2": ["z", "b", "z", "z", "d", "z", "d", "a"],
                      "z3": range(18, 10, -1)})

print(data1)
print(data2)
print(data3)

# inner join
data_inner = data1.merge(data2, how='inner')
# outer join
data_outer = data1.merge(data2, how='outer')
# left join
data_left = data1.merge(data2, how='left')
# right join
data_right = data1.merge(data2, how='right')

# outer join of all three tables: chain the merge calls, since merge does not accept a list
data_mult = data1.merge(data2, how='outer').merge(data3, how='outer')

# merge data1 and data2 based on the index, after moving ID into the index
data_merge = data1.set_index('ID').merge(data2.set_index('ID'),
                                         left_index=True, right_index=True)
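
For Step 7, an equivalent sketch using DataFrame.join, which aligns on the index by default; it reuses the data1 and data2 created above, re-indexed on ID, and how='inner' keeps only the IDs present in both tables:

# data1 and data2 are the frames created above
data_joined = data1.set_index('ID').join(data2.set_index('ID'), how='inner')
print(data_joined)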


Task 9: Pandas pivot table and crosstab operations

  • Step 1: read the file https://cdn.coggle.club/Pokemon.csv
  • Step 2: compute the average HP, Attack and Defense grouped by Type 1 and Type 2
  • Step 3: compute the average Attack with Type 1 as the index and Type 2 as the columns
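
The original post gives no solution code for this task; the following is a minimal sketch under the assumption that the same Pokemon file and column names are used, with a multi-level groupby for Step 2 and pivot_table for Step 3:

import pandas as pd

df = pd.read_csv('https://cdn.coggle.club/Pokemon.csv')
# Step 2: average HP, Attack and Defense per (Type 1, Type 2) group
print(df.groupby(['Type 1', 'Type 2'])[['HP', 'Attack', 'Defense']].mean())
# Step 3: average Attack with Type 1 as the index and Type 2 as the columns
print(pd.pivot_table(df, values='Attack', index='Type 1', columns='Type 2', aggfunc='mean'))
# A crosstab with the same layout would instead count the Pokemon in each combination
print(pd.crosstab(df['Type 1'], df['Type 2']))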