Posted by kmutz22 on Mon, 05 Aug 2019 10:24:29 +0200

Pandas Data Analysis - Chapter 3 (Index)

The pandas Index object holds the axis labels and other metadata (such as the axis name). Any array or other sequence of labels used to build a Series or DataFrame is converted to an Index.
For example:

In [84]: obj = pd.Series(range(3),index=list('abc'))                                                                                                                                                         

In [85]: obj                                                                                                                                                                                                 
a    0
b    1
c    2
dtype: int64

In [86]: index = obj.index                                                                                                                                                                                   

In [87]: index                                                                                                                                                                                               
Out[87]: Index(['a', 'b', 'c'], dtype='object')

In [88]: index[1:]                                                                                                                                                                                           
Out[88]: Index(['b', 'c'], dtype='object')

In [89]: index[0] = 10                                                                                                                                                                                       
TypeError                                 Traceback (most recent call last)
<ipython-input-89-549e9c5948ff> in <module>
----> 1 index[0] = 10

~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/ in __setitem__(self, key, value)
   3937     def __setitem__(self, key, value):
-> 3938         raise TypeError("Index does not support mutable operations")
   3940     def __getitem__(self, key):

TypeError: Index does not support mutable operations

We created a Series named obj whose index is ['a','b','c']. The index can be sliced to read values, but its elements cannot be modified in place.
We can, however, replace the whole index by assigning a new one.

In [90]: obj.index=['d','e','f',]                                                                                                                                                                            

In [91]: obj                                                                                                                                                                                                 
d    0
e    1
f    2
dtype: int64
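
The two behaviours above can be sketched in a small script. This is only an illustration that reuses the same Series; the flag name assignment_failed is made up for the example.

```python
import pandas as pd

obj = pd.Series(range(3), index=list("abc"))

# Item assignment on an Index raises TypeError because an Index is immutable.
try:
    obj.index[0] = "z"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Assigning to obj.index is allowed: it binds a brand-new Index to the Series
# and leaves the data values untouched.
obj.index = ["d", "e", "f"]
labels = obj.index.tolist()
```

Immutability is what makes it safe for several Series or DataFrame objects to share one Index.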

Besides looking like an array, an Index also behaves like a fixed-size set, so you can use the in operator on it.
(The df used here is the DataFrame built in the reindex example at the end of this post.)

In [99]: 'year' in df.columns                                                                                                                                                                                
Out[99]: True

In [100]: 3 in df.index                                                                                                                                                                                      
Out[100]: True

In [101]: 5 in df.index                                                                                                                                                                                      
Out[101]: False
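
A self-contained version of the membership checks, assuming a small DataFrame with the default integer row index. The name dup is only for illustration; it shows one way an Index differs from a Python set.

```python
import pandas as pd

df = pd.DataFrame({"year": [2011, 2013, 2019, 1998],
                   "num": [1.4, 1.8, -2.1, 3.6]})

has_year = "year" in df.columns   # membership test on the column labels
has_row_3 = 3 in df.index         # the default row index holds 0, 1, 2, 3
has_row_5 = 5 in df.index

# Unlike a Python set, an Index is allowed to contain duplicate labels.
dup = pd.Index(["a", "a", "b"])
dup_is_unique = dup.is_unique
```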

Here are some of the methods of Index.


append: concatenate with another Index object (the argument must be an Index, or a list of Index objects, not a plain list of values) to produce a new Index

In [103]: index1 = pd.Index([1,2,3,4])                                                                                                                                                                       

In [104]: index2 = pd.Index([4,5,6,7])                                                                                                                                                                       

In [105]: index1.append(index2)                                                                                                                                                                              
Out[105]: Int64Index([1, 2, 3, 4, 4, 5, 6, 7], dtype='int64')

Let's see what error we get when we pass in a plain list of values instead.

In [106]: index1.append([4,5,6,7])                                                                                                                                                                           
TypeError                                 Traceback (most recent call last)
<ipython-input-106-00a73366f3f1> in <module>
----> 1 index1.append([4,5,6,7])

~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/ in append(self, other)
   4008         for obj in to_concat:
   4009             if not isinstance(obj, Index):
-> 4010                 raise TypeError('all inputs must be Index')
   4012         names = { for obj in to_concat}

TypeError: all inputs must be Index

The message says all inputs must be Index objects, so a plain list of values really does not work.
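
As a rough sketch of the ways to call append that do work: pass a single Index, pass a list whose elements are all Index objects, or wrap the plain values in pd.Index first. The variable names are only for illustration.

```python
import pandas as pd

index1 = pd.Index([1, 2, 3, 4])
index2 = pd.Index([4, 5, 6, 7])

# A single Index is appended as-is; duplicates are kept.
combined = index1.append(index2)

# A list is accepted when every element is itself an Index.
combined_many = index1.append([index2, pd.Index([8])])

# Plain values can be wrapped in pd.Index before appending.
wrapped = index1.append(pd.Index([4, 5, 6, 7]))
```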


intersection: compute the intersection of two indexes and return an Index

In [112]: index1.intersection(index2)                                                                                                                                                                        
Out[112]: Int64Index([4], dtype='int64')


union: compute the union of two indexes and return an Index

In [113]: index1.union(index2)
Out[113]: Int64Index([1, 2, 3, 4, 5, 6, 7], dtype='int64')
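
To make the difference from append concrete, here is a minimal sketch with the same two indexes. The results are converted to plain lists so that the comparison does not depend on how a given pandas version prints the Index type.

```python
import pandas as pd

index1 = pd.Index([1, 2, 3, 4])
index2 = pd.Index([4, 5, 6, 7])

common = index1.intersection(index2).tolist()   # labels present in both
merged = index1.union(index2).tolist()          # each label appears once
appended = index1.append(index2).tolist()       # plain concatenation, 4 appears twice
```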


isin: compute a Boolean array indicating whether each value is contained in the passed collection

In [114]: index1.isin(index2)                                                                                                                                                                                
Out[114]: array([False, False, False,  True])
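
The Boolean array is mostly useful as a mask. A small sketch, assuming the same two indexes; the name mask is only for illustration.

```python
import pandas as pd

index1 = pd.Index([1, 2, 3, 4])
index2 = pd.Index([4, 5, 6, 7])

# One Boolean per element of index1: is that element found in index2?
mask = index1.isin(index2)

# Indexing with the mask keeps only the matching labels.
matching = index1[mask]

# isin also accepts a plain list of values.
mask_from_list = index1.isin([1, 2])
```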


delete: remove the element at position i and return a new Index

In [116]: index1.delete(1)                                                                                                                                                                                   
Out[116]: Int64Index([1, 3, 4], dtype='int64')


drop: remove the passed value (label) and return a new Index

In [118]: index1.drop(3)                                                                                                                                                                                     
Out[118]: Int64Index([1, 2, 4], dtype='int64')


insert: insert a value at position i and return a new Index

In [124]: index1.insert(3,10)
Out[124]: Int64Index([1, 2, 3, 10, 4], dtype='int64')
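
With integer labels it is easy to confuse positions and values, so here is a sketch of the same three methods on string labels. The index idx and the label "x" are made up for the example.

```python
import pandas as pd

idx = pd.Index(["a", "b", "c", "d"])

by_position = idx.delete(1)        # removes the element at position 1 ("b")
by_label = idx.drop("c")           # removes the element whose value is "c"
with_new = idx.insert(3, "x")      # puts "x" at position 3, before "d"

# drop raises KeyError when the label is not present.
try:
    idx.drop("z")
    missing_label_raised = False
except KeyError:
    missing_label_raised = True
```

All three return new Index objects; idx itself is unchanged.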


reindex is a very important method: it returns a new object whose data is rearranged to match the labels passed in, in the order they are given. The original object is not modified.

In [126]: obj = pd.Series([4.5,7.2,-5.3,3.6],index=['d','b','a','c'])                                                                                                                                        

In [127]: obj                                                                                                                                                                                                
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64

In [128]: obj2 = obj.reindex(['a','b','c','d','e'])                                                                                                                                                          

In [129]: obj2                                                                                                                                                                                               
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64

If a label does not exist in the original index, a missing value (NaN) is introduced.
The value used for missing labels can also be specified directly with fill_value.

In [130]: obj.reindex(['a','b','c','d','e'],fill_value=0)                                                                                                                                                    
a   -5.3
b    7.2
c    3.6
d    4.5
e    0.0
dtype: float64
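
The same comparison as a script, as a minimal sketch using the Series from above. It also checks that reindex leaves the original object untouched.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])
new_labels = ["a", "b", "c", "d", "e"]

with_nan = obj.reindex(new_labels)                  # "e" is missing, so it becomes NaN
with_zero = obj.reindex(new_labels, fill_value=0)   # "e" is filled with 0 instead

original_labels = obj.index.tolist()                # still d, b, a, c
```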

For ordered data such as time series, you may want to interpolate or fill values when reindexing. This is done with the method option; for example, method='ffill' fills forward.

In [131]: obj3 = pd.Series(['tom','jack','ted'],index=[0,2,4])                                                                                                                                               

In [132]: obj3                                                                                                                                                                                               
0     tom
2    jack
4     ted
dtype: object

In [133]: obj3.reindex(range(6),method='ffill')                                                                                                                                                              
0     tom
1     tom
2    jack
3    jack
4     ted
5     ted
dtype: object

ffill fills forward (each new label takes the last earlier value) and bfill fills backward (each new label takes the next later value).
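
A side-by-side sketch of the two options on the same Series. Note that with bfill the last label has no later value to borrow from, so it stays missing.

```python
import pandas as pd

obj3 = pd.Series(["tom", "jack", "ted"], index=[0, 2, 4])

forward = obj3.reindex(range(6), method="ffill")    # gaps take the previous value
backward = obj3.reindex(range(6), method="bfill")   # gaps take the next value

forward_values = forward.tolist()
backward_first_five = backward.tolist()[:5]
last_is_missing = pd.isna(backward.loc[5])          # nothing after label 4 to fill from
```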

For a DataFrame, reindex can change the row index, the columns, or both.

In [137]: data = {'location':['beijing','hebei','tianjin','shandong'],'year':[2011,2013,2019,1998],'num':[1.4,1.8,-2.1,3.6]}                                                                                 

In [138]: df = pd.DataFrame(data)                                                                                                                                                                            

In [139]: df                                                                                                                                                                                                 
   location  year  num
0   beijing  2011  1.4
1     hebei  2013  1.8
2   tianjin  2019 -2.1
3  shandong  1998  3.6

In [150]: df.reindex(index=[3,2,1,0])                                                                                                                                                                        
   location  year  num
3  shandong  1998  3.6
2   tianjin  2019 -2.1
1     hebei  2013  1.8
0   beijing  2011  1.4
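
The session above only reindexes the rows. As a sketch of the other two cases, the columns keyword reorders or selects columns, and both keywords can be passed together. The column label population is hypothetical and is used only to show that an unknown column is filled with missing values.

```python
import pandas as pd

data = {"location": ["beijing", "hebei", "tianjin", "shandong"],
        "year": [2011, 2013, 2019, 1998],
        "num": [1.4, 1.8, -2.1, 3.6]}
df = pd.DataFrame(data)

# Reindex the columns only: reorder them and request one that does not exist.
by_cols = df.reindex(columns=["year", "location", "population"])

# Reindex rows and columns in one call.
both = df.reindex(index=[3, 2, 1, 0], columns=["location", "num"])
```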

