Pandas Data Analysis - Chapter 3 (Index)

Posted by kmutz22 on Mon, 05 Aug 2019 10:24:29 +0200

The pandas Index object holds the axis labels and other metadata (such as the axis name). Any array or other sequence of labels used to build a Series or DataFrame is converted to an Index internally.
For example:

In [84]: obj = pd.Series(range(3),index=list('abc'))                                                                                                                                                         

In [85]: obj                                                                                                                                                                                                 
Out[85]: 
a    0
b    1
c    2
dtype: int64

In [86]: index = obj.index                                                                                                                                                                                   

In [87]: index                                                                                                                                                                                               
Out[87]: Index(['a', 'b', 'c'], dtype='object')

In [88]: index[1:]                                                                                                                                                                                           
Out[88]: Index(['b', 'c'], dtype='object')

In [89]: index[0] = 10                                                                                                                                                                                       
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-89-549e9c5948ff> in <module>
----> 1 index[0] = 10

~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in __setitem__(self, key, value)
   3936 
   3937     def __setitem__(self, key, value):
-> 3938         raise TypeError("Index does not support mutable operations")
   3939 
   3940     def __getitem__(self, key):

TypeError: Index does not support mutable operations

We created a Series named obj whose index is ['a','b','c']. The index can be sliced to take values, but its elements cannot be modified in place.
However, we can replace the whole index by assigning to obj.index.

In [90]: obj.index=['d','e','f',]                                                                                                                                                                            

In [91]: obj                                                                                                                                                                                                 
Out[91]: 
d    0
e    1
f    2
dtype: int64
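
To make the difference between modifying and replacing concrete, here is a minimal sketch. The flag names are my own, and the length check is something I am adding for illustration; the original session does not show it.

```python
import pandas as pd

obj = pd.Series(range(3), index=list("abc"))

# Assigning to a single element of an Index raises TypeError.
try:
    obj.index[0] = "z"
    element_assignment_failed = False
except TypeError:
    element_assignment_failed = True

# Replacing the whole index works when the new labels have the same length.
obj.index = ["d", "e", "f"]

# A replacement with the wrong number of labels is rejected with ValueError.
try:
    obj.index = ["x", "y"]
    length_mismatch_failed = False
except ValueError:
    length_mismatch_failed = True

print(element_assignment_failed, length_mismatch_failed)
print(list(obj.index))
```

So the labels are fixed once an Index exists, but the Series is free to point at a different Index of matching length.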

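In addition to being array-like, an Index also behaves like a fixed-size set.
You can even use the in operator on it. (Here df is a DataFrame with a 'year' column and the default integer index 0 to 3, like the one built in the reindex section below.)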

In [99]: 'year' in df.columns                                                                                                                                                                                
Out[99]: True

In [100]: 3 in df.index                                                                                                                                                                                      
Out[100]: True

In [101]: 5 in df.index                                                                                                                                                                                      
Out[101]: False
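
Since df is not defined at this point in the session, here is a self-contained sketch of the same membership checks. The small DataFrame and the variable names are assumptions made for the example. It also shows one way an Index differs from a Python set: it may hold duplicate labels.

```python
import pandas as pd

# A small DataFrame with a default integer index 0..3 (illustrative data).
df = pd.DataFrame({
    "location": ["beijing", "hebei", "tianjin", "shandong"],
    "year": [2011, 2013, 2019, 1998],
})

has_year = "year" in df.columns   # membership test on column labels
has_row_3 = 3 in df.index         # 3 is one of the row labels
has_row_5 = 5 in df.index         # 5 is not a row label

# Unlike a Python set, an Index can contain duplicate labels.
dup = pd.Index(["a", "a", "b"])

print(has_year, has_row_3, has_row_5)
print(dup.is_unique)
```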

Here are some Index methods:

append

Concatenates another Index object (the argument must be an Index, not a plain list of values) and returns a new Index

In [103]: index1 = pd.Index([1,2,3,4])                                                                                                                                                                       

In [104]: index2 = pd.Index([4,5,6,7])                                                                                                                                                                       

In [105]: index1.append(index2)                                                                                                                                                                              
Out[105]: Int64Index([1, 2, 3, 4, 4, 5, 6, 7], dtype='int64')

Let's see what the error looks like when we pass in a plain list.

In [106]: index1.append([4,5,6,7])                                                                                                                                                                           
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-106-00a73366f3f1> in <module>
----> 1 index1.append([4,5,6,7])

~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in append(self, other)
   4008         for obj in to_concat:
   4009             if not isinstance(obj, Index):
-> 4010                 raise TypeError('all inputs must be Index')
   4011 
   4012         names = {obj.name for obj in to_concat}

TypeError: all inputs must be Index

The argument must be an Index.
So a plain list really does not work.
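
If the values do start out as a list, one way around the error is to wrap them in pd.Index first. This is a minimal sketch; the names combined and chained are my own. Newer pandas releases print the result as Index([...], dtype='int64') rather than Int64Index, so the code below checks values only.

```python
import pandas as pd

index1 = pd.Index([1, 2, 3, 4])
values = [4, 5, 6, 7]

# Wrap the plain list in an Index before appending.
combined = index1.append(pd.Index(values))

# append also accepts a list whose items are Index objects.
chained = index1.append([pd.Index([5]), pd.Index([6])])

print(list(combined))
print(list(chained))
print(list(index1))  # the original Index is unchanged
```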

intersection

Computes the intersection and returns an Index

In [112]: index1.intersection(index2)                                                                                                                                                                        
Out[112]: Int64Index([4], dtype='int64')

union

Computes the union and returns an Index

In [113]: index1.union(index2)                                                                                                                                                                               
Out[113]: Int64Index([1, 2, 3, 4, 5, 6, 7], dtype='int64')

isin

Returns a Boolean array indicating whether each value of the index is contained in the argument

In [114]: index1.isin(index2)                                                                                                                                                                                
Out[114]: array([False, False, False,  True])
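
The Boolean array from isin is mostly useful as a mask. The sketch below filters index1 with it; the names mask and shared are assumptions for the example.

```python
import pandas as pd

index1 = pd.Index([1, 2, 3, 4])
index2 = pd.Index([4, 5, 6, 7])

# One Boolean per element of index1: is it present in index2?
mask = index1.isin(index2)

# Use the mask to keep only the labels found in index2.
shared = index1[mask]

print(list(mask))
print(list(shared))
```

For these two indexes the filtered result holds the same values as index1.intersection(index2).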

delete

Deletes the element at position i and returns a new Index

In [116]: index1.delete(1)                                                                                                                                                                                   
Out[116]: Int64Index([1, 3, 4], dtype='int64')

drop

Deletes the given value (label) and returns a new Index

In [118]: index1.drop(3)                                                                                                                                                                                     
Out[118]: Int64Index([1, 2, 4], dtype='int64')
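
With the labels 1 to 4 it is easy to confuse delete and drop, because positions and values are so close. Here is a small sketch with labels chosen (by me, for illustration) so that the two cannot be mixed up.

```python
import pandas as pd

idx = pd.Index([10, 20, 30, 40])

# delete works by position: remove the element at position 1 (the value 20).
by_position = idx.delete(1)

# drop works by value: remove the label 30 wherever it is.
by_label = idx.drop(30)

# Dropping a label that is not present raises KeyError.
try:
    idx.drop(99)
    missing_label_failed = False
except KeyError:
    missing_label_failed = True

print(list(by_position))
print(list(by_label))
print(missing_label_failed)
```

Both methods return a new Index and leave idx untouched.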

insert

Inserts a value at position i and returns a new Index

In [124]: index1.insert(3,10)                                                                                                                                                                                
Out[124]: Int64Index([1, 2, 3, 10, 4], dtype='int64')

reindex

reindex is a very important method. It creates a new object whose data are rearranged to match the order of the labels passed in.

In [126]: obj = pd.Series([4.5,7.2,-5.3,3.6],index=['d','b','a','c'])                                                                                                                                        

In [127]: obj                                                                                                                                                                                                
Out[127]: 
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64

In [128]: obj2 = obj.reindex(['a','b','c','d','e'])                                                                                                                                                          

In [129]: obj2                                                                                                                                                                                               
Out[129]: 
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64

If a label does not exist in the original index, a missing value (NaN) is introduced.
The value used to fill those missing entries can also be specified directly with fill_value.

In [130]: obj.reindex(['a','b','c','d','e'],fill_value=0)                                                                                                                                                    
Out[130]: 
a   -5.3
b    7.2
c    3.6
d    4.5
e    0.0
dtype: float64

For ordered data such as time series, some interpolation or filling may be needed when reindexing, which can be done with the method option. Forward filling is achieved with ffill.

In [131]: obj3 = pd.Series(['tom','jack','ted'],index=[0,2,4])                                                                                                                                               

In [132]: obj3                                                                                                                                                                                               
Out[132]: 
0     tom
2    jack
4     ted
dtype: object

In [133]: obj3.reindex(range(6),method='ffill')                                                                                                                                                              
Out[133]: 
0     tom
1     tom
2    jack
3    jack
4     ted
5     ted
dtype: object

ffill fills forward (each gap takes the previous value) and bfill fills backward (each gap takes the next value).
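
Only ffill is shown above, so here is a minimal sketch that puts both options side by side on the same Series. The names forward and backward are my own.

```python
import pandas as pd

obj3 = pd.Series(["tom", "jack", "ted"], index=[0, 2, 4])

# Forward fill: each new label takes the value of the previous existing label.
forward = obj3.reindex(range(6), method="ffill")

# Backward fill: each new label takes the value of the next existing label.
# Label 5 has no later label to borrow from, so it stays missing.
backward = obj3.reindex(range(6), method="bfill")

print(forward)
print(backward)
```

Note that the method option expects the original index to be sorted (monotonic), which is the case here.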

For a DataFrame, reindex can change the rows (index), the columns, or both.

In [137]: data = {'location':['beijing','hebei','tianjin','shandong'],'year':[2011,2013,2019,1998],'num':[1.4,1.8,-2.1,3.6]}                                                                                 

In [138]: df = pd.DataFrame(data)                                                                                                                                                                            

In [139]: df                                                                                                                                                                                                 
Out[139]: 
   location  year  num
0   beijing  2011  1.4
1     hebei  2013  1.8
2   tianjin  2019 -2.1
3  shandong  1998  3.6

In [150]: df.reindex(index=[3,2,1,0])                                                                                                                                                                        
Out[150]: 
   location  year  num
3  shandong  1998  3.6
2   tianjin  2019 -2.1
1     hebei  2013  1.8
0   beijing  2011  1.4
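
The session above only reindexes the rows. As a sketch of the other two cases, the code below reindexes the columns and then rows and columns together. The column name 'population' is a made-up label that does not exist in the data, used to show that a new column is filled with NaN.

```python
import pandas as pd

data = {
    "location": ["beijing", "hebei", "tianjin", "shandong"],
    "year": [2011, 2013, 2019, 1998],
    "num": [1.4, 1.8, -2.1, 3.6],
}
df = pd.DataFrame(data)

# Reindex columns: reorder existing ones and add a label that is not in df.
# 'population' is hypothetical; its column is filled with NaN.
by_columns = df.reindex(columns=["year", "location", "population"])

# Reindex rows and columns in one call.
both = df.reindex(index=[3, 2, 1, 0], columns=["year", "num"])

print(by_columns)
print(both)
```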
