LML learning check-in (summary)

Posted by [uk]stuff on Tue, 22 Feb 2022 10:08:43 +0100

Task 1: Pandas data reading, saving and data type

Key tasks: file reading, saving and data type analysis

  • Step 1: read the file https://cdn.coggle.club/Pokemon.csv
  • Step 2: save the data that was read, together with the header
  • Step 3: analyze the type of each column and the number of values
  • Step 4: analyze whether each column contains missing values

The code is as follows

import pandas as pd

doc = pd.read_csv(r'E:\Pokemon.csv')       # raw string so the backslash in the path is not treated as an escape
print(doc)
file = pd.DataFrame(doc)
file.to_csv('csv file.csv', header=True)   # save the data together with the header
print(file.info())                         # type and non-null count of each column
null_all = file.isnull().sum()             # number of missing values in each column
print(null_all)                            # missing-value statistics
print(file.isnull())                       # whether each cell contains a missing value
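
As a supplementary sketch (not part of the original solution), Step 3 can also be answered directly with dtypes and nunique; the URL below is the course file referenced in the later tasks and is assumed to hold the same data as the local copy:

import pandas as pd

df = pd.read_csv('https://cdn.coggle.club/Pokemon.csv')
print(df.dtypes)             # data type of each column
print(df.nunique())          # number of distinct values in each column
print(df.isnull().any())     # True for every column that contains missing values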

Task 2: Pandas data location index

Key points of the task: data selection and data indexing

  • Step 1: read the file https://cdn.coggle.club/Pokemon.csv
  • Step 2: select the Total column
  • Step 3: select the Total and HP columns
  • Step 4: select rows 10-40
  • Step 5: select the Total and HP columns in rows 10-40
  • import pandas as pd
    df = pd.read_csv(r'E:\Pokemon (1).csv')
    file = pd.DataFrame(df)
    file.to_csv('CSV Document 2.csv')
    print(df[['Total']])                        # select the Total column
    print(df[['Total', 'HP']])                  # select the Total and HP columns
    print(file.iloc[10:41, :])                  # select rows 10-40
    print(file.iloc[10:41][['Total', 'HP']])    # select the Total and HP columns in rows 10-40
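
A sketch of an alternative for Step 5 (assuming the default integer index that read_csv produces): a single loc call selects both the row range and the columns, and the end label of a loc slice is inclusive:

import pandas as pd

df = pd.read_csv('https://cdn.coggle.club/Pokemon.csv')
print(df.loc[10:40, ['Total', 'HP']])    # rows 10-40, Total and HP columns only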

Task 3: Pandas data logical indexing

Key points of the task: logical indexing

  • Step 1: read the file https://cdn.coggle.club/Pokemon.csv
  • Step 2: select the rows where Type 1 is Grass
  • Step 3: select the rows where Type 1 is Grass and Type 2 is Poison
  • Step 4: select the rows where HP is greater than 50 or Speed is less than 90
  • Step 5: select the rows where Type 1 is Grass or Fire, HP is between 70 and 90, and Speed starts with the digit 8
  • import pandas as pd
    file = pd.read_csv(r'E:\Pokemon (2).csv')    # read the CSV file
    df = pd.DataFrame(file)
    df2 = df.set_index('Type 1')                 # set 'Type 1' as the index
    print(df2.loc['Grass'])                      # then use the loc indexer to look up 'Grass'
    df3 = df2.set_index('Type 2', append=True)   # multi-level index
    df2_sorted = df3.sort_index()
    print(df2_sorted.loc[('Grass', 'Poison')])   # look up under the multi-level index
    def condition(x):                            # rows with HP greater than 50 or Speed less than 90
        a = x.HP > 50
        b = x.Speed < 90
        result = a | b                           # the step asks for "or", so combine the masks with |
        return result
    print(df3.loc[condition])
    df4 = df.set_index(['Type 1'])
    df5 = df4.loc[['Grass', 'Fire']]
    df4_sorted = df5.sort_index()

    def B(x):                                    # Type 1 is Grass or Fire, HP between 70 and 90, Speed starts with 8
        a = x.HP > 70
        b = x.HP < 90
        c = x.Speed.astype(str).str.startswith('8')
        result = a & b & c
        return result
    print(df4_sorted.loc[B])
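
A minimal sketch of the same filters written as plain boolean masks on the original columns (Type 1, Type 2, HP, Speed), which expresses the logical indexing without changing the index; the variable names are illustrative:

import pandas as pd

df = pd.read_csv('https://cdn.coggle.club/Pokemon.csv')
grass = df[df['Type 1'] == 'Grass']                                          # Step 2
grass_poison = df[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison')]    # Step 3
hp_or_speed = df[(df['HP'] > 50) | (df['Speed'] < 90)]                       # Step 4
combined = df[df['Type 1'].isin(['Grass', 'Fire'])                           # Step 5
              & df['HP'].between(70, 90)
              & df['Speed'].astype(str).str.startswith('8')]
print(combined)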

Task 4: Pandas data grouping and aggregation

Key tasks: groupby, agg, transform

  • Step 1: read the file https://cdn.coggle.club/Pokemon.csv
  • Step 2: learn how to use groupby aggregation
  • Step 3: learn how to use agg for grouped aggregation
  • Step 4: learn how to use transform
  • Step 5: use groupby, agg and transform to compute the average HP within each Type 1 group
  • import pandas as pd
    file = pd.read_csv(r'E:\Pokemon (3).csv')
    print(file)
    file1 = file.groupby('Type 1')['HP']
    print(file1.agg('mean'))          # agg: one mean HP per Type 1 group
    print(file1.transform('mean'))    # transform: the group mean broadcast back to every row
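
A short sketch (reading from the course URL rather than the local path) that makes the difference in output shape explicit: agg returns one row per group, while transform returns a value for every original row, which makes it easy to attach the group mean as a new column; the column name HP_group_mean is illustrative:

import pandas as pd

df = pd.read_csv('https://cdn.coggle.club/Pokemon.csv')
per_group = df.groupby('Type 1')['HP'].agg('mean')                    # one value per Type 1 group
df['HP_group_mean'] = df.groupby('Type 1')['HP'].transform('mean')    # aligned with every row of df
print(per_group.head())
print(df[['Type 1', 'HP', 'HP_group_mean']].head())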

Task 5: Pandas date data processing

Key points of the task: date processing

  • Step 1: create a dt column whose values are the Unix timestamps from 1638263656 to 1638283656
  • Step 2: convert the dt column to datetime format
  • Step 3: select the rows of the dt column whose hour is 10
  • Step 4: add 8 hours to the dt column as a whole
  • import pandas as pd
    import datetime
    timestamp_start = 1638263656
    timestamp_end = 1638283656
    dt = pd.to_datetime(range(timestamp_start, timestamp_end), unit='s')    # Unix timestamps converted to datetime
    print(dt)
    print(dt[dt.hour == 10])                        # keep only the timestamps whose hour is 10
    dt = dt + datetime.timedelta(hours=8)           # shift the whole dt column by 8 hours
    print(dt)
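
Since Step 1 asks for a dt column, here is a minimal sketch of the same steps performed on a DataFrame column through the .dt accessor (the frame and column layout are illustrative):

import pandas as pd

df = pd.DataFrame({'dt': range(1638263656, 1638283656)})
df['dt'] = pd.to_datetime(df['dt'], unit='s')      # Step 2: convert Unix seconds to datetime
print(df[df['dt'].dt.hour == 10])                  # Step 3: rows whose hour is 10
df['dt'] = df['dt'] + pd.Timedelta(hours=8)        # Step 4: shift the whole column by 8 hours
print(df.head())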

Task 6: Pandas missing value processing

  • Step 1: read the file https://cdn.coggle.club/Pokemon.csv
  • Step 2: analyze the missing values of each column
  • Step 3: fill in the missing values of each column
  • import pandas as pd
    file = pd.read_csv(r"E:\Pokemon (4).csv")
    df = pd.DataFrame(file)
    print(df.info())             # the non-null counts show how many values are missing per column
    df = df.fillna(value=0)      # fill the missing values with 0
    print(df.isna())             # check that nothing is missing any more
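
A sketch of a per-column filling strategy instead of a single constant (the column names come from the Pokemon file; the choice of fill values is illustrative): a placeholder label for the text column and the column mean for the numeric columns:

import pandas as pd

df = pd.read_csv('https://cdn.coggle.club/Pokemon.csv')
print(df.isnull().sum())                           # missing values per column
df['Type 2'] = df['Type 2'].fillna('None')         # text column: fill with a placeholder label
df = df.fillna(df.mean(numeric_only=True))         # numeric columns: fill with the column mean
print(df.isnull().sum())                           # verify that nothing is left missing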
    
    

Task 7: Pandas data visualization

Key points of the task: plot

  • Step 1: read the file https://cdn.coggle.club/Pokemon.csv
  • Step 2: compute the average HP, Attack and Defense within each Type 1 group and draw a bar chart
  • Step 3: plot HP against Attack for all samples as a scatter plot
  • import pandas as pd
    from matplotlib import pyplot as plt
    plt.rcParams['font.family'] = ['Source Han Sans CN']    # font that can also render CJK labels
    file = pd.read_csv(r"E:\Pokemon (5).csv")
    df = pd.DataFrame(file)
    mean = df.groupby('Type 1')[['HP', 'Attack', 'Defense']].mean()
    # bar chart of the group means
    mean.plot(kind='bar', title='mean value')
    plt.show()
    # scatter plot of HP against Attack
    data = df[['HP', 'Attack']]
    data.plot(kind='scatter', x='HP', y='Attack')
    plt.show()

Task 8: Pandas multi-table merging and aggregation

Key points of the task: merge and join

  • Step 1: create the following data
  • Step 2: Merge data1 and data2 Using Inner Join
  • Step 3: Merge data1 and data2 Using Outer Join
  • Step 4: Merge data1 and data2 Using Left Join
  • Step 5: Merge data1 and data2 Using Right Join
  • Step 6: merge data1, data2 and data3 using outer join
  • Step 7: Merge data1 and data2 based on Index

data1 = pd.DataFrame({"ID":range(101, 106),

"x1":range(1, 6),

"x2":["a", "b", "c", "d", "e"],

"x3":range(16, 11, - 1)})

data2 = pd.DataFrame({"ID":range(104, 108),

"y1":["x", "y", "x", "y"],

"y2":range(8, 1, - 2)})

data3 = pd.DataFrame({"ID":range(102, 110),

"z1":range(10, 18),

"z2":["z", "b", "z", "z", "d", "z", "d", "a"],

"z3":range(18, 10, - 1)})

import pandas as pd

data1 = pd.DataFrame({"ID": range(101, 106),
                      "x1": range(1, 6),
                      "x2": ["a", "b", "c", "d", "e"],
                      "x3": range(16, 11, -1)})

data2 = pd.DataFrame({"ID": range(104, 108),
                      "y1": ["x", "y", "x", "y"],
                      "y2": range(8, 1, -2)})

data3 = pd.DataFrame({"ID": range(102, 110),
                      "z1": range(10, 18),
                      "z2": ["z", "b", "z", "z", "d", "z", "d", "a"],
                      "z3": range(18, 10, -1)})

print(data1)
print(data2)
print(data3)

# inner join
data_inner = data1.merge(data2, how='inner')
# outer join
data_outer = data1.merge(data2, how='outer')
# left join
data_left = data1.merge(data2, how='left')
# right join
data_right = data1.merge(data2, how='right')

# outer join of all three tables: chain the merge calls, since merge does not accept a list
data_mult = data1.merge(data2, how='outer').merge(data3, how='outer')

# merge data1 and data2 based on the index, after moving ID into the index
data_merge = data1.set_index('ID').merge(data2.set_index('ID'),
                                         left_index=True, right_index=True)
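
For Step 7, an equivalent sketch using DataFrame.join, which aligns on the index by default; it reuses the data1 and data2 created above, re-indexed on ID, and how='inner' keeps only the IDs present in both tables:

# data1 and data2 are the frames created above
data_joined = data1.set_index('ID').join(data2.set_index('ID'), how='inner')
print(data_joined)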


Task 9: Pandas pivot table and crosstab operations

  • Step 1: read the file https://cdn.coggle.club/Pokemon.csv
  • Step 2: compute the average HP, Attack and Defense grouped by Type 1 and Type 2
  • Step 3: compute the average Attack with Type 1 as the index and Type 2 as the columns
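
The original post gives no solution code for this task; the following is a minimal sketch under the assumption that the same Pokemon file and column names are used, with a multi-level groupby for Step 2 and pivot_table for Step 3:

import pandas as pd

df = pd.read_csv('https://cdn.coggle.club/Pokemon.csv')
# Step 2: average HP, Attack and Defense per (Type 1, Type 2) group
print(df.groupby(['Type 1', 'Type 2'])[['HP', 'Attack', 'Defense']].mean())
# Step 3: average Attack with Type 1 as the index and Type 2 as the columns
print(pd.pivot_table(df, values='Attack', index='Type 1', columns='Type 2', aggfunc='mean'))
# A crosstab with the same layout would instead count the Pokemon in each combination
print(pd.crosstab(df['Type 1'], df['Type 2']))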