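Data alignment is an important part of data cleaning. Arithmetic between pandas objects is performed on matching index labels. Where a label exists in only one of the operands, the result at that position is NaN. Alternatively, a fill value can be supplied so that the operation still produces a number at those positions.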
Alignment of Series
1. Series are aligned by row index
Example code:
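import numpy as np
import pandas as pd

# Two Series of different lengths that share the index labels 0 to 4
s1 = pd.Series(range(10, 20), index=range(10))
s2 = pd.Series(range(20, 25), index=range(5))

print('s1: ')
print(s1)
print('')
print('s2: ')
print(s2)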
Operation result:
s1:
0    10
1    11
2    12
3    13
4    14
5    15
6    16
7    17
8    18
9    19
dtype: int64

s2:
0    20
1    21
2    22
3    23
4    24
dtype: int64
2. Arithmetic on aligned Series
Example code:
# Series alignment
s1 + s2
Operation result:
0    30.0
1    32.0
2    34.0
3    36.0
4    38.0
5     NaN
6     NaN
7     NaN
8     NaN
9     NaN
dtype: float64
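The result above pairs values by index label rather than by position. As a minimal sketch of that point, the following uses two small Series whose labels are the same but appear in a different order. The names `left`, `right`, and `total` are illustrative and do not come from the example above.

```python
import pandas as pd

# Same labels in a different order: addition pairs values by label, not by position.
left = pd.Series([1, 2, 3], index=["a", "b", "c"])
right = pd.Series([10, 20, 30], index=["c", "b", "a"])

total = left + right
print(total)

# Label "a" pairs 1 with 30, "b" pairs 2 with 20, and "c" pairs 3 with 10.
print(total["a"], total["b"], total["c"])
```

Because every label is present in both operands here, no NaN appears. In the `s1 + s2` result above, labels 5 to 9 exist only in `s1`, which is why those positions are NaN and the result is upcast to float64.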
Alignment of DataFrame
1. DataFrames are aligned by both row and column index
Example code:
df1 = pd.DataFrame(np.ones((2, 2)), columns=['a', 'b'])
df2 = pd.DataFrame(np.ones((3, 3)), columns=['a', 'b', 'c'])

print('df1: ')
print(df1)
print('')
print('df2: ')
print(df2)
Operation result:
df1:
     a    b
0  1.0  1.0
1  1.0  1.0

df2:
     a    b    c
0  1.0  1.0  1.0
1  1.0  1.0  1.0
2  1.0  1.0  1.0
2. Arithmetic on aligned DataFrames
Example code:
# DataFrame alignment operation
df1 + df2
Operation result:
     a    b   c
0  2.0  2.0 NaN
1  2.0  2.0 NaN
2  NaN  NaN NaN
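To see where the NaN values in this result come from, it can help to look at the two frames after they have been reindexed to a common shape. The sketch below uses `DataFrame.align` with `join="outer"` for that purpose. It is meant as an illustration of the alignment step, not a description of how pandas implements `+` internally, and the names `left`, `right`, and `manual` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((2, 2)), columns=["a", "b"])
df2 = pd.DataFrame(np.ones((3, 3)), columns=["a", "b", "c"])

# Reindex both frames to the union of their row labels and column labels.
left, right = df1.align(df2, join="outer")

print(left)   # df1 gains row 2 and column "c", both filled with NaN
print(right)  # df2 already covers every label, so it is unchanged

# Adding the aligned frames gives the same result as adding the originals.
manual = left + right
print(manual.equals(df1 + df2))
```

Any position that is NaN in either aligned frame stays NaN after the addition, which matches the result shown above.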
Filling unaligned data during an operation
1. fill_value
When using add, sub, div, or mul, the fill_value parameter specifies the value to substitute for unaligned data. Positions that are missing from one operand are replaced by fill_value before the calculation is carried out.
Example code:
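print(s1)
print(s2)
# Labels 5 to 9 are missing from s2, so -1 is used in their place
print(s1.add(s2, fill_value=-1))

print(df1)
print(df2)
# Positions missing from df1 are treated as 2.0 before subtracting
print(df1.sub(df2, fill_value=2.))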
Operation result:
# print(s1)
0    10
1    11
2    12
3    13
4    14
5    15
6    16
7    17
8    18
9    19
dtype: int64

# print(s2)
0    20
1    21
2    22
3    23
4    24
dtype: int64

# s1.add(s2, fill_value = -1)
0    30.0
1    32.0
2    34.0
3    36.0
4    38.0
5    14.0
6    15.0
7    16.0
8    17.0
9    18.0
dtype: float64

# print(df1)
     a    b
0  1.0  1.0
1  1.0  1.0

# print(df2)
     a    b    c
0  1.0  1.0  1.0
1  1.0  1.0  1.0
2  1.0  1.0  1.0

# df1.sub(df2, fill_value = 2.)
     a    b    c
0  0.0  0.0  1.0
1  0.0  0.0  1.0
2  1.0  1.0  1.0
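Two details of fill_value are easy to miss, and the sketch below tries to illustrate both. First, filling before the operation (fill_value) is not the same as filling the NaN results afterwards with `fillna`. Second, fill_value only helps where exactly one operand is missing: if both are missing at a position, the result is still NaN. The Series `a` and `b` and the names `before`, `after`, and `both` are illustrative additions.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(range(10, 20), index=range(10))
s2 = pd.Series(range(20, 25), index=range(5))

# Fill before the operation: the missing s2 entries are treated as 0, so s1 survives.
before = s1.add(s2, fill_value=0)
# Fill after the operation: the NaN results themselves are replaced by 0.
after = (s1 + s2).fillna(0)

print(before[5])  # 15 + 0
print(after[5])   # NaN replaced by 0

# When a position is missing in both operands, fill_value does not apply.
a = pd.Series([1.0, np.nan, np.nan])
b = pd.Series([1.0, 2.0, np.nan])
both = a.add(b, fill_value=0)
print(both)
```

Which of the two approaches is appropriate depends on what a missing value means in the data: treating an absent operand as a neutral value, or treating the whole result as unknown and replacing it.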