Pandas Data Analysis - Chapter 3 (Index)
The pandas Index object holds the axis labels and other metadata (such as the axis name). Any array or other sequence of labels used to build a Series or DataFrame is converted to an Index internally.
For example (assuming pandas has been imported as pd):
In [84]: obj = pd.Series(range(3),index=list('abc'))

In [85]: obj
Out[85]:
a    0
b    1
c    2
dtype: int64

In [86]: index = obj.index

In [87]: index
Out[87]: Index(['a', 'b', 'c'], dtype='object')

In [88]: index[1:]
Out[88]: Index(['b', 'c'], dtype='object')

In [89]: index[0] = 10
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-89-549e9c5948ff> in <module>
----> 1 index[0] = 10

~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in __setitem__(self, key, value)
   3936
   3937     def __setitem__(self, key, value):
-> 3938         raise TypeError("Index does not support mutable operations")
   3939
   3940     def __getitem__(self, key):

TypeError: Index does not support mutable operations
We created a Series named obj whose index is ['a','b','c']. The index can be sliced to read values, but its elements cannot be modified in place because an Index is immutable.
We can, however, replace the whole index by assigning a new sequence of labels.
In [90]: obj.index=['d','e','f',]

In [91]: obj
Out[91]:
d    0
e    1
f    2
dtype: int64
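To make the difference between modifying and replacing concrete, here is a minimal sketch. The length check and the use of rename to change a single label are not part of the original session; they are additions, shown because they are the usual next questions.

```python
import pandas as pd

obj = pd.Series(range(3), index=list("abc"))

# Assigning to a single position fails because an Index is immutable.
try:
    obj.index[0] = "z"
    mutated = True
except TypeError:
    mutated = False

# Replacing the whole index works when the new labels match the data length.
obj.index = ["d", "e", "f"]

# A replacement of the wrong length is rejected with a ValueError.
try:
    obj.index = ["x", "y"]
    length_error = False
except ValueError:
    length_error = True

# rename (an addition to the original example) returns a new Series with
# one label changed and leaves obj itself untouched.
renamed = obj.rename({"d": "z"})

print(mutated, length_error)
print(list(obj.index))
print(list(renamed.index))
```

Because the index is immutable, it is safe for several Series or DataFrame objects to share the same Index.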
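In fact, besides behaving like an array, an Index also works like a fixed-size set.
You can even use the in operator for membership tests. Here df is the DataFrame with the columns location, year and num that is built in the reindex section below.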
In [99]: 'year' in df.columns
Out[99]: True

In [100]: 3 in df.index
Out[100]: True

In [101]: 5 in df.index
Out[101]: False
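The session above relies on a df that is only defined later, so here is a self-contained sketch of the same membership tests. It reuses the data from the reindex section. The last two checks are an addition: they show that in on a Series looks at the labels, not the values.

```python
import pandas as pd

data = {
    "location": ["beijing", "hebei", "tianjin", "shandong"],
    "year": [2011, 2013, 2019, 1998],
    "num": [1.4, 1.8, -2.1, 3.6],
}
df = pd.DataFrame(data)

has_year = "year" in df.columns   # membership among the column labels
has_row_3 = 3 in df.index         # the default row labels are 0, 1, 2, 3
has_row_5 = 5 in df.index

# On a Series, `in` also tests the index labels rather than the stored values.
label_hit = 2011 in df["year"]
value_hit = 2011 in df["year"].values

print(has_year, has_row_3, has_row_5)
print(label_hit, value_hit)
```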
Here are some commonly used Index methods.
append
Concatenates another Index object (the argument must be an Index, not a plain list) and returns a new Index object.
In [103]: index1 = pd.Index([1,2,3,4])

In [104]: index2 = pd.Index([4,5,6,7])

In [105]: index1.append(index2)
Out[105]: Int64Index([1, 2, 3, 4, 4, 5, 6, 7], dtype='int64')
Let's see what the error looks like when we pass in a plain list instead.
In [106]: index1.append([4,5,6,7])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-106-00a73366f3f1> in <module>
----> 1 index1.append([4,5,6,7])

~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in append(self, other)
   4008         for obj in to_concat:
   4009             if not isinstance(obj, Index):
-> 4010                 raise TypeError('all inputs must be Index')
   4011
   4012         names = {obj.name for obj in to_concat}

TypeError: all inputs must be Index
The message says that all inputs must be Index, so a plain list of values really is not accepted.
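The error above comes from how append reads a list argument: as a list of Index objects, not as a list of values. A minimal sketch of the two working alternatives follows. It prints plain lists because the repr of an integer index differs between pandas versions.

```python
import pandas as pd

index1 = pd.Index([1, 2, 3, 4])
extra = [4, 5, 6, 7]

# Wrap the plain list in an Index before appending.
combined = index1.append(pd.Index(extra))

# A list is accepted when every element is itself an Index.
pieces = index1.append([pd.Index([5]), pd.Index([6, 7])])

print(list(combined))
print(list(pieces))
print(list(index1))   # the original index is unchanged
```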
intersection
Computes the intersection of two indexes and returns an Index.
In [112]: index1.intersection(index2)
Out[112]: Int64Index([4], dtype='int64')
union
Computes the union of two indexes and returns an Index.
In [113]: index1.union(index2)
Out[113]: Int64Index([1, 2, 3, 4, 5, 6, 7], dtype='int64')
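As a rough sanity check, the sketch below compares intersection and union with Python's built-in set operations and contrasts union with append. The comparison with set is an illustration added here, not something the original text does.

```python
import pandas as pd

index1 = pd.Index([1, 2, 3, 4])
index2 = pd.Index([4, 5, 6, 7])

common = index1.intersection(index2)
merged = index1.union(index2)
appended = index1.append(index2)

# The results agree with the ordinary set operations on the same values.
same_as_set_and = set(common) == set(index1) & set(index2)
same_as_set_or = set(merged) == set(index1) | set(index2)

print(list(common))
print(list(merged))     # the shared value 4 appears once
print(list(appended))   # append keeps both copies of 4
```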
isin
Returns a Boolean array that indicates, for each value in the index, whether it is contained in the collection passed as the argument.
In [114]: index1.isin(index2)
Out[114]: array([False, False, False,  True])
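The Boolean array returned by isin is most useful as a mask. A small sketch, assuming a Series named obj that is built here only for illustration:

```python
import pandas as pd

index1 = pd.Index([1, 2, 3, 4])

# isin accepts any list-like; values absent from the index (9 here) are ignored.
mask = index1.isin([2, 4, 9])
selected = index1[mask]

# The same mask can filter a Series by its labels.
obj = pd.Series([10, 20, 30, 40], index=index1)
subset = obj[obj.index.isin([2, 4, 9])]

print(mask.tolist())
print(list(selected))
print(subset.tolist())
```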
delete
Removes the element at position i and returns a new Index.
In [116]: index1.delete(1)
Out[116]: Int64Index([1, 3, 4], dtype='int64')
drop
Removes the given value (label) and returns a new Index.
In [118]: index1.drop(3)
Out[118]: Int64Index([1, 2, 4], dtype='int64')
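delete and drop are easy to confuse when the labels are integers, so here is a minimal sketch that passes the same argument to both. The KeyError check at the end is an addition to the original examples.

```python
import pandas as pd

index1 = pd.Index([1, 2, 3, 4])

# delete works by position: position 1 holds the value 2.
by_position = index1.delete(1)

# drop works by label: the label 1 is the first element.
by_label = index1.drop(1)

# drop raises a KeyError when the label is not present.
try:
    index1.drop(10)
    missing_label_error = False
except KeyError:
    missing_label_error = True

print(list(by_position))
print(list(by_label))
print(missing_label_error)
```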
insert
Inserts a value at position i and returns a new Index.
In [124]: index1.insert(3,10)
Out[124]: Int64Index([1, 2, 3, 10, 4], dtype='int64')
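The first argument of insert is the position and the second is the value. A short sketch showing the front, the middle and the end of the index:

```python
import pandas as pd

index1 = pd.Index([1, 2, 3, 4])

at_front = index1.insert(0, 0)              # the new value becomes the first element
in_middle = index1.insert(3, 10)            # same call as in the session above
at_end = index1.insert(len(index1), 5)      # position len(index1) places it last

print(list(at_front))
print(list(in_middle))
print(list(at_end))
print(list(index1))   # insert returns a new Index; index1 is unchanged
```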
reindex
reindex is a very important method. It creates a new object whose data is rearranged to match the new index passed as the argument.
In [126]: obj = pd.Series([4.5,7.2,-5.3,3.6],index=['d','b','a','c'])

In [127]: obj
Out[127]:
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64

In [128]: obj2 = obj.reindex(['a','b','c','d','e'])

In [129]: obj2
Out[129]:
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
If a label in the new index does not exist in the original one, a missing value (NaN) is introduced for it.
The value used for such missing labels can also be specified directly with fill_value:
In [130]: obj.reindex(['a','b','c','d','e'],fill_value=0)
Out[130]:
a   -5.3
b    7.2
c    3.6
d    4.5
e    0.0
dtype: float64
For ordered data such as time series, some interpolation or filling may be needed when reindexing. The method option does this; for example, method='ffill' fills forward:
In [131]: obj3 = pd.Series(['tom','jack','ted'],index=[0,2,4])

In [132]: obj3
Out[132]:
0     tom
2    jack
4     ted
dtype: object

In [133]: obj3.reindex(range(6),method='ffill')
Out[133]:
0     tom
1     tom
2    jack
3    jack
4     ted
5     ted
dtype: object
ffill fills forward (each gap takes the last preceding value) and bfill fills backward (each gap takes the next following value).
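Only ffill is demonstrated above, so here is a matching sketch for bfill on the same obj3. Note the last label: nothing follows it in the original data, so backward filling leaves it missing.

```python
import pandas as pd

obj3 = pd.Series(["tom", "jack", "ted"], index=[0, 2, 4])

# Each new label takes the value of the next existing label.
backfilled = obj3.reindex(range(6), method="bfill")

# Label 5 has no later value to copy from, so it stays missing.
last_is_missing = bool(pd.isna(backfilled[5]))

print(backfilled.tolist()[:5])
print(last_is_missing)
```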
For a DataFrame, reindex can change the rows, the columns, or both.
In [137]: data = {'location':['beijing','hebei','tianjin','shandong'],'year':[2011,2013,2019,1998],'num':[1.4,1.8,-2.1,3.6]}

In [138]: df = pd.DataFrame(data)

In [139]: df
Out[139]:
   location  year  num
0   beijing  2011  1.4
1     hebei  2013  1.8
2   tianjin  2019 -2.1
3  shandong  1998  3.6

In [150]: df.reindex(index=[3,2,1,0])
Out[150]:
   location  year  num
3  shandong  1998  3.6
2   tianjin  2019 -2.1
1     hebei  2013  1.8
0   beijing  2011  1.4
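The session above only reindexes the rows. Here is a minimal sketch of the other two cases, columns only and rows plus columns. The column name population is hypothetical and is used only to show what happens with a label that does not exist.

```python
import pandas as pd

data = {
    "location": ["beijing", "hebei", "tianjin", "shandong"],
    "year": [2011, 2013, 2019, 1998],
    "num": [1.4, 1.8, -2.1, 3.6],
}
df = pd.DataFrame(data)

# Columns only: reorder two existing columns and ask for one that does not
# exist ("population" is a made-up name); the new column is filled with NaN.
by_columns = df.reindex(columns=["year", "location", "population"])

# Rows and columns in a single call.
both = df.reindex(index=[3, 0], columns=["location", "num"])

print(by_columns)
print(both)
```

As with a Series, reindex returns a new DataFrame and leaves df itself unchanged.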