5 - Pandas function application on grouped data (df.apply(), df.agg(), df.transform(), df.applymap())

Posted by thoand on Mon, 07 Mar 2022 04:48:43 +0100

 

 

There are three ways to apply user-defined functions, or functions from other libraries, to pandas objects:

  1. apply(): apply the function row by row or column by column
  2. agg() and transform(): aggregation and transformation
  3. applymap(): apply functions element by element

1, apply()

By default axis=0, so the function is applied column by column; set axis=1 to apply it row by row.

For common descriptive statistics you can pass the method name as a string instead of a function; for example, df.apply('mean') is equivalent to df.apply(np.mean).

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.read_excel('./input/class.xlsx')
>>> df = df[['score_math','score_music']]
>>> df
   score_math  score_music
0          95           79
1          96           90
2          85           85
3          93           92
4          84           90
5          88           70
6          59           89
7          88           86
8          89           74

# Average each course, column by column
>>> df.apply(np.mean)
score_math     86.333333
score_music    83.888889
dtype: float64
>>> type(df.apply(np.mean))
<class 'pandas.core.series.Series'>

# On a single column, the string 'mean' calls Series.mean() and returns a scalar
>>> df['score_math'].apply('mean')
86.33333333333333
# A callable passed to Series.apply() is applied to each element, so the result is still a Series
>>> type(df['score_math'].apply(np.mean))
<class 'pandas.core.series.Series'>

# Average each student's scores, row by row
>>> df.apply(np.mean,axis=1)
0    87.0
1    93.0
2    85.0
3    92.5
4    87.0
5    79.0
6    74.0
7    87.0
8    81.5
dtype: float64
>>> type(df.apply(np.mean,axis=1))
<class 'pandas.core.series.Series'>
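The session above reads ./input/class.xlsx, which is not included with this post. As a minimal sketch, the same nine rows can be typed in by hand so the column-wise and row-wise calls can be tried without the file. The variable names col_means and row_means are made up for this illustration.

```python
import pandas as pd

# Same nine rows as the printed table, built inline so no Excel file is needed.
df = pd.DataFrame({
    "score_math":  [95, 96, 85, 93, 84, 88, 59, 88, 89],
    "score_music": [79, 90, 85, 92, 90, 70, 89, 86, 74],
})

# axis=0 (the default): the function receives each column as a Series.
col_means = df.apply("mean")

# axis=1: the function receives each row as a Series.
row_means = df.apply(lambda row: row.mean(), axis=1)

print(col_means)
print(row_means)
```

Both results are Series: col_means is indexed by column name, row_means by the original row labels.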

The type returned by apply() depends on the function applied:

  • A Series: when the function returns one value per row or column, as with the mean in the example above;
  • A DataFrame of the same size: when the function returns a Series of the same length as its input, such as the lambda function defined below.
# x is the Series holding one column (one course)
>>> df.apply(lambda x: x - 5)
   score_math  score_music
0          90           74
1          91           85
2          80           80
3          88           87
4          79           85
5          83           65
6          54           84
7          83           81
8          84           69
>>> type(df.apply(lambda x: x - 5))
<class 'pandas.core.frame.DataFrame'>

2, Data aggregation: agg()

  • Data aggregation refers to any process that produces a scalar value from an array;
  • agg() is effectively a special case of apply(): it processes a pandas object row by row or column by column;
  • Wherever agg() can be used, apply() can generally be used instead.

Example:

1) Average each of the two courses

>>> df.agg('mean')
score_math     86.333333
score_music    83.888889
dtype: float64
>>> df.apply('mean')
score_math     86.333333
score_music    83.888889
dtype: float64

2) To apply several functions at once, put them in a list

Example: get the highest and lowest scores for each of the two courses

>>> df.agg(['max','min'])
     score_math  score_music
max          96           92
min          59           70
>>> df.apply([np.max,'min'])
      score_math  score_music
amax          96           92
min           59           70

3) Use a dictionary to apply specific functions, one or several, to specific columns

Example: get the mean and minimum of the mathematics scores and the maximum of the music scores

>>> df.agg({'score_math':['mean','min'],'score_music':'max'})
      score_math  score_music
max          NaN         92.0
mean   86.333333          NaN
min    59.000000          NaN

3, Data transformation: transform()

Feature: the function's result is a pandas object of the same size as the input.

Difference from data aggregation, agg():

  1. agg() reduces the data in each group to summary values;
  2. transform() returns a new object with as many rows as the original data.

Note: df.transform(np.mean) raises an error, because a transformation cannot produce an aggregated result.

# Subtracting each course's average from its scores works with apply, agg and transform alike;
# all three calls return the same result
>>> df.transform(lambda x:x-x.mean())
>>> df.apply(lambda x:x-x.mean())
>>> df.agg(lambda x:x-x.mean())
   score_math  score_music
0    8.666667    -4.888889
1    9.666667     6.111111
2   -1.333333     1.111111
3    6.666667     8.111111
4   -2.333333     6.111111
5    1.666667   -13.888889
6  -27.333333     5.111111
7    1.666667     2.111111
8    2.666667    -9.888889
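The note about df.transform(np.mean) failing can be checked directly. The sketch below uses a three-row stand-in for the scores table and catches ValueError, on the assumption that this is the exception type raised. The exact message differs between pandas versions, so it is printed rather than compared. The flag names transform_failed and same_shape are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in for the scores table.
df = pd.DataFrame({"score_math": [95, 96, 85], "score_music": [79, 90, 85]})

# np.mean collapses each column to one number, so the result no longer
# lines up with the original rows and transform() refuses it.
try:
    df.transform(np.mean)
    transform_failed = False
except ValueError as exc:
    transform_failed = True
    print("transform rejected the aggregation:", exc)

# A function that keeps the length of its input is accepted.
centered = df.transform(lambda col: col - col.mean())
same_shape = centered.shape == df.shape
print(centered)
```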

When several functions are applied, the returned DataFrame has more columns than the original, and its column index has two levels:

  • The first level is the original column name
  • The second level is the name of the transformation function
>>> df.transform([lambda x:x-x.mean(),lambda x:x/10])
  score_math          score_music
    <lambda> <lambda>    <lambda> <lambda>
0   8.666667      9.5   -4.888889      7.9
1   9.666667      9.6    6.111111      9.0
2  -1.333333      8.5    1.111111      8.5
3   6.666667      9.3    8.111111      9.2
4  -2.333333      8.4    6.111111      9.0
5   1.666667      8.8  -13.888889      7.0
6 -27.333333      5.9    5.111111      8.9
7   1.666667      8.8    2.111111      8.6
8   2.666667      8.9   -9.888889      7.4
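With two lambdas, both second-level labels come out as <lambda>, which makes the columns hard to tell apart. One possible workaround, sketched here on a three-row stand-in table, is to pass named functions instead. The names centered and tenths are made up for this illustration.

```python
import pandas as pd

# Small stand-in for the scores table.
df = pd.DataFrame({"score_math": [95, 96, 85], "score_music": [79, 90, 85]})

def centered(col):
    # Distance of each score from the column mean.
    return col - col.mean()

def tenths(col):
    # Rescale the scores from 0-100 to 0-10.
    return col / 10

result = df.transform([centered, tenths])

# The second level of the column index now carries the function names.
print(result.columns.tolist())
print(result[("score_math", "centered")])
```

A single transformed column can then be picked out with a (column name, function name) pair, as in the last line.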

4, applymap()

applymap() applies a function to every element of a pandas object; this is known as element-wise function application.

Differences from map():

  • applymap() is an instance method of DataFrame
  • map() is an instance method of Series

Example: keep the score to two decimal places

>>> df.applymap(lambda x:'%.2f'%x)
  score_math score_music
0      95.00       79.00
1      96.00       90.00
2      85.00       85.00
3      93.00       92.00
4      84.00       90.00
5      88.00       70.00
6      59.00       89.00
7      88.00       86.00
8      89.00       74.00

>>> df['score_math'].map(lambda x:'%.2f'%x)
0    95.00
1    96.00
2    85.00
3    93.00
4    84.00
5    88.00
6    59.00
7    88.00
8    89.00
Name: score_math, dtype: object

As the example above shows, applymap() gives the same result as calling map() on the Series of each column.
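That equivalence can be checked with a short sketch. One caveat not covered above: DataFrame.applymap() was renamed to DataFrame.map() in pandas 2.1, so the code picks whichever method the installed version provides. The helper two_decimals and the three-row table are made up for this illustration.

```python
import pandas as pd

# Small stand-in for the scores table.
df = pd.DataFrame({"score_math": [95, 96, 85], "score_music": [79, 90, 85]})

def two_decimals(x):
    # Format one score with two decimal places.
    return "%.2f" % x

# Element-wise application on the whole DataFrame
# (DataFrame.map in pandas >= 2.1, DataFrame.applymap before that).
elementwise = df.map if hasattr(df, "map") else df.applymap
via_frame = elementwise(two_decimals)

# The same thing spelled out: Series.map() on each column in turn.
via_columns = df.apply(lambda col: col.map(two_decimals))

same_values = via_frame.values.tolist() == via_columns.values.tolist()
print(same_values)
```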

 

 

The analysis above shows that apply(), agg() and transform() can all apply functions to grouped data, each with its own characteristics:

  • apply(): the user-defined function processes each group separately and the results are then merged; the function's output can be a scalar, a Series or a DataFrame; each apply() call on grouped data accepts a single function;
  • agg(): a dictionary can assign different functions to different columns, and each function must return a scalar for its column;
  • transform(): on grouped data it does not accept a dictionary of per-column functions, but it still operates on one column at a time. The function's output for each column can be a scalar or a Series; a scalar is broadcast to every row of the group.
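The examples in this post call the three methods on an ungrouped DataFrame, so here is a minimal sketch of how the summary plays out after groupby(). The "class" column and its A/B labels are hypothetical, added only to have something to group by. The score columns are selected explicitly so the grouping column is not passed to the functions.

```python
import pandas as pd

# "class" is a hypothetical grouping column added for illustration.
df = pd.DataFrame({
    "class":       ["A", "A", "A", "B", "B", "B"],
    "score_math":  [95, 96, 85, 93, 84, 88],
    "score_music": [79, 90, 85, 92, 90, 70],
})
scores = df.groupby("class")[["score_math", "score_music"]]

# agg: one scalar per function per group; a dict picks functions per column.
summary = scores.agg({"score_math": ["mean", "min"], "score_music": "max"})

# transform: same number of rows as the input; each group's mean is
# broadcast to every row of that group.
group_mean = scores.transform("mean")

# apply: the function receives each group's sub-DataFrame; here it returns
# a Series (max minus min per column), and the results are stacked by group.
spread = scores.apply(lambda g: g.max() - g.min())

print(summary)
print(group_mean)
print(spread)
```

summary and spread have one row per class, while group_mean keeps all six rows of the original table.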