The Three Musketeers of Python Data Analysis, Pandas: Index Objects and Various Indexing Operations

Posted by vdubdriver on Sun, 14 Jun 2020 02:41:52 +0200

This is a crawl-proof text that your readers will ignore.
This article was originally created in CSDN by TRHX.
Blog Home Page:https://itrhx.blog.csdn.net/
Links to this article:https://itrhx.blog.csdn.net/article/details/106698307
 Unauthorized, no reload!Malicious reload at your own risk!Respect the original, away from plagiarism!

[1] Index Objects

The indexes of Series and DataFrame are Index objects. An Index is immutable, which keeps the data safe to share between objects; trying to modify one of its elements raises a TypeError. Common index types include Index, Int64Index (pandas 1.x only; merged into Index in pandas 2.0), MultiIndex, and DatetimeIndex.

The following code shows an Index object and its immutability:

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj.index
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> type(obj.index)
<class 'pandas.core.indexes.base.Index'>
>>> obj.index[0] = 'e'
Traceback (most recent call last):
  File "<pyshell#28>", line 1, in <module>
    obj.index[0] = 'e'
  File "C:\Users\...\base.py", line 3909, in __setitem__
    raise TypeError("Index does not support mutable operations")
TypeError: Index does not support mutable operations
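
Immutability applies to the elements of an existing Index, not to the attribute that holds it. The following is a minimal sketch (the variable names are illustrative, not from the original article) showing that item assignment fails, while replacing the whole index or renaming labels works:

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])

# Item assignment on an Index raises TypeError because an Index is immutable.
try:
    obj.index[0] = 'e'
    mutated = True
except TypeError:
    mutated = False

# Replacing the whole index with a new one of the same length is allowed.
obj.index = ['e', 'b', 'c', 'd']
new_labels = list(obj.index)

# rename() maps individual labels and returns a new Series; obj is unchanged.
renamed = obj.rename({'e': 'a'})
```

Assigning to obj.index changes the Series in place, whereas rename() leaves the original object untouched.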

[1.1] Common Attributes of Index Objects

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.Index.html

Attribute                 Description
T                         Transpose of the index (for an Index, the index itself)
array                     The index data as a pandas array
dtype                     The dtype object of the underlying data
hasnans                   Whether the index contains NaN (missing values)
inferred_type             A string describing the inferred type of the index values
is_monotonic              Whether the index is monotonically increasing (alias of is_monotonic_increasing; removed in pandas 2.0)
is_monotonic_decreasing   Whether the index is monotonically decreasing
is_monotonic_increasing   Whether the index is monotonically increasing
is_unique                 Whether the index has no duplicate values
nbytes                    Number of bytes in the underlying data
ndim                      Number of dimensions of the index
nlevels                   Number of levels
shape                     A tuple giving the shape of the index
size                      Number of elements in the index
values                    The index values as a NumPy array

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj.index
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> 
>>> obj.index.array
<PandasArray>
['a', 'b', 'c', 'd']
Length: 4, dtype: object
>>> 
>>> obj.index.dtype
dtype('O')
>>> 
>>> obj.index.hasnans
False
>>>
>>> obj.index.inferred_type
'string'
>>> 
>>> obj.index.is_monotonic
True
>>>
>>> obj.index.is_monotonic_decreasing
False
>>> 
>>> obj.index.is_monotonic_increasing
True
>>> 
>>> obj.index.is_unique
True
>>> 
>>> obj.index.nbytes
16
>>>
>>> obj.index.ndim
1
>>>
>>> obj.index.nlevels
1
>>>
>>> obj.index.shape
(4,)
>>> 
>>> obj.index.size
4
>>> 
>>> obj.index.values
array(['a', 'b', 'c', 'd'], dtype=object)
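
The example above uses a sorted, unique index with no missing values, so most of the Boolean attributes return the "clean" answer. As a contrasting sketch, the index below (made up for illustration) contains a duplicate, a missing value, and an out-of-order label:

```python
import numpy as np
import pandas as pd

idx = pd.Index([3, 1, 1, np.nan])

has_missing = idx.hasnans                  # True: the index contains NaN
all_unique = idx.is_unique                 # False: the label 1 appears twice
increasing = idx.is_monotonic_increasing   # False: 3 is followed by 1

# Monotonicity is not strict: equal neighbours are allowed.
non_strict = pd.Index([1, 2, 2, 3]).is_monotonic_increasing   # True
```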

[1.2] Common Methods of Index Objects

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.Index.html

Method                             Description
all(self, *args, **kwargs)         Whether all elements are truthy; 0 counts as False
any(self, *args, **kwargs)         Whether at least one element is truthy; an index of all zeros gives False
append(self, other)                Concatenates another index and returns a new index
argmax(self[, axis, skipna])       Returns the integer position of the maximum value
argmin(self[, axis, skipna])       Returns the integer position of the minimum value
argsort(self, *args, **kwargs)     Returns the integer positions that would sort the index in ascending order
delete(self, loc)                  Deletes the element at the given position and returns a new index
difference(self, other[, sort])    Returns the elements of this index that are not in other (set difference)
drop(self, labels[, errors])       Returns a new index with the given labels removed
drop_duplicates(self[, keep])      Removes duplicate values; the keep parameter accepts:
                                   'first': keep the first occurrence of each duplicate;
                                   'last': keep the last occurrence of each duplicate;
                                   False: drop all duplicated values
duplicated(self[, keep])           Marks duplicate values; the keep parameter accepts:
                                   'first': every occurrence except the first is True;
                                   'last': every occurrence except the last is True;
                                   False: all duplicated values are True
dropna(self[, how])                Removes missing values (NaN)
fillna(self[, value, downcast])    Fills missing values (NaN) with the specified value
equals(self, other)                Whether two indexes contain the same elements in the same order
insert(self, loc, item)            Inserts an element at the given position and returns a new index
intersection(self, other[, sort])  Returns the intersection of two indexes
isna(self)                         Detects missing values (NaN) in the index
isnull(self)                       Alias of isna
max(self[, axis, skipna])          Returns the maximum value of the index
min(self[, axis, skipna])          Returns the minimum value of the index
union(self, other[, sort])         Returns the union of two indexes
unique(self[, level])              Returns the unique values of the index, in order of appearance

>>> import pandas as pd
>>> pd.Index([1, 2, 3]).all()
True
>>>
>>> pd.Index([0, 1, 2]).all()
False
>>> import pandas as pd
>>> pd.Index([0, 0, 1]).any()
True
>>>
>>> pd.Index([0, 0, 0]).any()
False
>>> import pandas as pd
>>> pd.Index(['a', 'b', 'c']).append(pd.Index([1, 2, 3]))
Index(['a', 'b', 'c', 1, 2, 3], dtype='object')
>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).argmax()
3
>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).argmin()
4
>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).argsort()
array([4, 1, 2, 0, 3], dtype=int32)
>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).delete(0)
Int64Index([2, 3, 9, 1], dtype='int64')
>>> import pandas as pd
>>> idx1 = pd.Index([2, 1, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).drop([2, 1])
Int64Index([5, 3, 9], dtype='int64')
>>> import pandas as pd
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx.drop_duplicates(keep='first')
Index(['lama', 'cow', 'beetle', 'hippo'], dtype='object')
>>> idx.drop_duplicates(keep='last')
Index(['cow', 'beetle', 'lama', 'hippo'], dtype='object')
>>> idx.drop_duplicates(keep=False)
Index(['cow', 'beetle', 'hippo'], dtype='object')
>>> import pandas as pd
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> idx.duplicated()
array([False, False,  True, False,  True])
>>> idx.duplicated(keep='first')
array([False, False,  True, False,  True])
>>> idx.duplicated(keep='last')
array([ True, False,  True, False, False])
>>> idx.duplicated(keep=False)
array([ True, False,  True, False,  True])
>>> import numpy as np
>>> import pandas as pd
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).dropna()
Float64Index([2.0, 5.0, 6.0], dtype='float64')
>>> import numpy as np
>>> import pandas as pd
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).fillna(5)
Float64Index([2.0, 5.0, 5.0, 6.0, 5.0, 5.0], dtype='float64')
>>> import pandas as pd
>>> idx1 = pd.Index([5, 2, 3, 9, 1])
>>> idx2 = pd.Index([5, 2, 3, 9, 1])
>>> idx1.equals(idx2)
True
>>> 
>>> idx1 = pd.Index([5, 2, 3, 9, 1])
>>> idx2 = pd.Index([5, 2, 4, 9, 1])
>>> idx1.equals(idx2)
False
>>> import pandas as pd
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')
>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).insert(2, 'A')
Index([5, 2, 'A', 3, 9, 1], dtype='object')
>>> import numpy as np
>>> import pandas as pd
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).isna()
array([False, False,  True, False,  True,  True])
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).isnull()
array([False, False,  True, False,  True,  True])
>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).max()
9
>>> pd.Index([5, 2, 3, 9, 1]).min()
1
>>> import pandas as pd
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.union(idx2)
Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')
>>> import pandas as pd
>>> pd.Index([5, 1, 3, 5, 1]).unique()
Int64Index([5, 1, 3], dtype='int64')
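
Because an Index is immutable, methods such as drop(), insert(), and delete() never modify the object they are called on; each returns a new Index. A short sketch of that behaviour, with illustrative variable names:

```python
import pandas as pd

idx = pd.Index([5, 2, 3, 9, 1])

dropped = idx.drop([2, 1])      # new Index without the labels 2 and 1
inserted = idx.insert(0, 7)     # new Index with 7 placed at position 0
ordered = idx[idx.argsort()]    # positions from argsort() used to reorder

# The original index is left untouched by all of the calls above.
original = list(idx)
```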

[2] General Indexing in Pandas

Pandas also offers more advanced indexing operations, such as re-indexing and hierarchical indexing. The ordinary slice index, fancy index, Boolean index, and so on are grouped together here as general indexing.

[2.1] Series Index

[2.1.1] head() / tail()

The Series.head() and Series.tail() methods return the first five and the last five rows of data, respectively. If an integer n is passed to head() / tail(), the first or last n rows are returned instead:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.Series(np.random.randn(8))
>>> obj
0   -0.643437
1   -0.365652
2   -0.966554
3   -0.036127
4    1.046095
5   -2.048362
6   -1.865551
7    1.344728
dtype: float64
>>> 
>>> obj.head()
0   -0.643437
1   -0.365652
2   -0.966554
3   -0.036127
4    1.046095
dtype: float64
>>> 
>>> obj.head(3)
0   -0.643437
1   -0.365652
2   -0.966554
dtype: float64
>>>
>>> obj.tail()
3   -0.036127
4    1.046095
5   -2.048362
6   -1.865551
7    1.344728
dtype: float64
>>>
>>> obj.tail(3)
5   -2.048362
6   -1.865551
7    1.344728
dtype: float64

[2.1.2] Row Index

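A Series can be indexed by position or by index label, and it also supports dictionary-style expressions and methods such as in, keys(), and items(). (In recent pandas releases, using an integer key such as obj[2] on a non-integer index is deprecated in favor of obj.iloc[2].)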

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>> obj['c']
-8
>>> obj[2]
-8
>>> 'b' in obj
True
>>> obj.keys()
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> list(obj.items())
[('a', 1), ('b', 5), ('c', -8), ('d', 2)]
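
The dictionary analogy extends to lookups with a fallback. As a small sketch, Series.get() returns a default instead of raising KeyError when a label is missing, and to_dict() converts the Series to a real dictionary (the label 'z' is made up for the example):

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])

present = obj.get('c')               # -8
missing = obj.get('z', default=0)    # 0: 'z' is not a label of obj
as_dict = obj.to_dict()              # labels become keys, data become values
```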

[2.1.3] Slice Index

There are two ways to slice: by position and by index label. Note that when slicing by position the end position is excluded, whereas when slicing by label the end label is included.

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>>
>>> obj[1:3]
b    5
c   -8
dtype: int64
>>>
>>> obj[0:3:2]
a    1
c   -8
dtype: int64
>>>
>>> obj['b':'d']
b    5
c   -8
d    2
dtype: int64
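
The difference in end-point handling is easy to overlook, so here is a short check. This sketch uses the explicit iloc and loc accessors (described in section [3]) so that the two modes are unambiguous:

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])

by_position = obj.iloc[1:3]     # positions 1 and 2; position 3 is excluded
by_label = obj.loc['b':'d']     # labels 'b', 'c' and 'd'; 'd' is included

position_labels = list(by_position.index)
label_labels = list(by_label.index)
```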

[2.1.4] Fancy Index

A fancy index selects several, possibly non-contiguous, elements at once by passing a list of index labels or positions:

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>> 
>>> obj[[0, 2]]
a    1
c   -8
dtype: int64
>>> 
>>> obj[['a', 'c', 'd']]
a    1
c   -8
d    2
dtype: int64

[2.1.5] Boolean Index

A Series can be indexed with a Boolean array: a comparison operator produces a Boolean Series, and indexing with it returns only the elements that satisfy the condition.

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2, -3], index=['a', 'b', 'c', 'd', 'e'])
>>> obj
a    1
b    5
c   -8
d    2
e   -3
dtype: int64
>>> 
>>> obj[obj > 0]
a    1
b    5
d    2
dtype: int64
>>> 
>>> obj > 0
a     True
b     True
c    False
d     True
e    False
dtype: bool
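
Several conditions can be combined into one Boolean mask. The sketch below assumes the same Series as above and uses the element-wise operators & (and), | (or), and ~ (not); each condition needs its own parentheses because these operators bind more tightly than comparisons:

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2, -3], index=['a', 'b', 'c', 'd', 'e'])

small_positive = obj[(obj > 0) & (obj < 5)]   # labels a and d
outside = obj[(obj < -5) | (obj > 4)]         # labels b and c
not_positive = obj[~(obj > 0)]                # labels c and e
```

Python's own and / or keywords do not work here, since they try to reduce the whole Series to a single truth value.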

[2.2] DataFrame Index

[2.2.1] head() / tail()

As with Series, the DataFrame.head() and DataFrame.tail() methods return the first five and the last five rows of the DataFrame. If an integer n is passed to head() / tail(), the first or last n rows are returned instead:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.random.randn(8,4), columns = ['a', 'b', 'c', 'd'])
>>> obj
          a         b         c         d
0 -1.399390  0.521596 -0.869613  0.506621
1 -0.748562 -0.364952  0.188399 -1.402566
2  1.378776 -1.476480  0.361635  0.451134
3 -0.206405 -1.188609  3.002599  0.563650
4  0.993289  1.133748  1.177549 -2.562286
5 -0.482157  1.069293  1.143983 -1.303079
6 -1.199154  0.220360  0.801838 -0.104533
7 -1.359816 -2.092035  2.003530 -0.151812
>>> 
>>> obj.head()
          a         b         c         d
0 -1.399390  0.521596 -0.869613  0.506621
1 -0.748562 -0.364952  0.188399 -1.402566
2  1.378776 -1.476480  0.361635  0.451134
3 -0.206405 -1.188609  3.002599  0.563650
4  0.993289  1.133748  1.177549 -2.562286
>>> 
>>> obj.head(3)
          a         b         c         d
0 -1.399390  0.521596 -0.869613  0.506621
1 -0.748562 -0.364952  0.188399 -1.402566
2  1.378776 -1.476480  0.361635  0.451134
>>>
>>> obj.tail()
          a         b         c         d
3 -0.206405 -1.188609  3.002599  0.563650
4  0.993289  1.133748  1.177549 -2.562286
5 -0.482157  1.069293  1.143983 -1.303079
6 -1.199154  0.220360  0.801838 -0.104533
7 -1.359816 -2.092035  2.003530 -0.151812
>>> 
>>> obj.tail(3)
          a         b         c         d
5 -0.482157  1.069293  1.143983 -1.303079
6 -1.199154  0.220360  0.801838 -0.104533
7 -1.359816 -2.092035  2.003530 -0.151812

[2.2.2] Column Index

A DataFrame column can be selected by its label. A single label returns a Series, while a list of labels returns a DataFrame:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.random.randn(7,2), columns = ['a', 'b'])
>>> obj
          a         b
0 -1.198795  0.928378
1 -2.878230  0.014650
2  2.267475  0.370952
3  0.639340 -1.301041
4 -1.953444  0.148934
5 -0.445225  0.459632
6  0.097109 -2.592833
>>>
>>> obj['a']
0   -1.198795
1   -2.878230
2    2.267475
3    0.639340
4   -1.953444
5   -0.445225
6    0.097109
Name: a, dtype: float64
>>> 
>>> obj[['a']]
          a
0 -1.198795
1 -2.878230
2  2.267475
3  0.639340
4 -1.953444
5 -0.445225
6  0.097109
>>> 
>>> type(obj['a'])
<class 'pandas.core.series.Series'>
>>> type(obj[['a']])
<class 'pandas.core.frame.DataFrame'>

[2.2.3] Slice Index

Slicing a DataFrame with [] operates on rows. There are two ways to slice: by position and by index label. Note that when slicing by position the end position is excluded, whereas when slicing by label the end label is included.

>>> import pandas as pd
>>> import numpy as np
>>> data = np.random.randn(5,4)
>>> index = ['I1', 'I2', 'I3', 'I4', 'I5']
>>> columns = ['a', 'b', 'c', 'd']
>>> obj = pd.DataFrame(data, index, columns)
>>> obj
           a         b         c         d
I1  0.828676 -1.663337  1.753632  1.432487
I2  0.368138  0.222166  0.902764 -1.436186
I3  2.285615 -2.415175 -1.344456 -0.502214
I4  3.224288 -0.500268  1.293596 -1.235549
I5 -0.938833 -0.804433 -0.170047 -0.566766
>>> 
>>> obj[0:3]
           a         b         c         d
I1  0.828676 -1.663337  1.753632  1.432487
I2  0.368138  0.222166  0.902764 -1.436186
I3  2.285615 -2.415175 -1.344456 -0.502214
>>>
>>> obj[0:4:2]
           a         b         c         d
I1  0.828676 -1.663337  1.753632  1.432487
I3  2.285615 -2.415175 -1.344456 -0.502214
>>>
>>> obj['I2':'I4']
           a         b         c         d
I2  0.368138  0.222166  0.902764 -1.436186
I3  2.285615 -2.415175 -1.344456 -0.502214
I4  3.224288 -0.500268  1.293596 -1.235549

[2.2.4] Fancy Index

As with Series, a fancy index selects several, possibly non-contiguous, items at once. For a DataFrame, passing a list of column names returns those columns:

>>> import pandas as pd
>>> import numpy as np
>>> data = np.random.randn(5,4)
>>> index = ['I1', 'I2', 'I3', 'I4', 'I5']
>>> columns = ['a', 'b', 'c', 'd']
>>> obj = pd.DataFrame(data, index, columns)
>>> obj
           a         b         c         d
I1 -1.083223 -0.182874 -0.348460 -1.572120
I2 -0.205206 -0.251931  1.180131  0.847720
I3 -0.980379  0.325553 -0.847566 -0.882343
I4 -0.638228 -0.282882 -0.624997 -0.245980
I5 -0.229769  1.002930 -0.226715 -0.916591
>>> 
>>> obj[['a', 'd']]
           a         d
I1 -1.083223 -1.572120
I2 -0.205206  0.847720
I3 -0.980379 -0.882343
I4 -0.638228 -0.245980
I5 -0.229769 -0.916591

[2.2.5] Boolean Index

A DataFrame can also be indexed with a Boolean DataFrame, typically produced by a comparison operator. The result keeps the original shape, and elements that do not satisfy the condition become NaN:

>>> import pandas as pd
>>> import numpy as np
>>> data = np.random.randn(5,4)
>>> index = ['I1', 'I2', 'I3', 'I4', 'I5']
>>> columns = ['a', 'b', 'c', 'd']
>>> obj = pd.DataFrame(data, index, columns)
>>> obj
           a         b         c         d
I1 -0.602984 -0.135716  0.999689 -0.339786
I2  0.911130 -0.092485 -0.914074 -0.279588
I3  0.849606 -0.420055 -1.240389 -0.179297
I4  0.249986 -1.250668  0.329416 -1.105774
I5 -0.743816  0.430647 -0.058126 -0.337319
>>> 
>>> obj[obj > 0]
           a         b         c   d
I1       NaN       NaN  0.999689 NaN
I2  0.911130       NaN       NaN NaN
I3  0.849606       NaN       NaN NaN
I4  0.249986       NaN  0.329416 NaN
I5       NaN  0.430647       NaN NaN
>>> 
>>> obj > 0
        a      b      c      d
I1  False  False   True  False
I2   True  False  False  False
I3   True  False  False  False
I4   True  False   True  False
I5  False   True  False  False
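
In practice a Boolean index on a DataFrame is often built from a single column, which filters whole rows instead of masking individual cells. A minimal sketch with a small hand-written frame (the values are illustrative, not the random data above):

```python
import pandas as pd

obj = pd.DataFrame(
    {'a': [-0.6, 0.9, 0.8], 'b': [-0.1, -0.09, 0.4]},
    index=['I1', 'I2', 'I3'],
)

# A Boolean Series selects the rows where the condition holds.
rows = obj[obj['a'] > 0]

# A Boolean DataFrame keeps the shape and masks failing cells with NaN.
masked = obj[obj > 0]
nan_count = int(masked.isna().sum().sum())
```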

[3] Indexers: loc and iloc

loc indexes by label and iloc indexes by integer position. Note that older versions of Pandas also had an ix indexer, which accepted either labels or positions; it was deprecated and then removed in Pandas 1.0.0.

[3.1] loc Label Index

loc is the label-based indexer: it selects data by the labels in index and columns.

[3.1.1] Series.loc

For a Series, loc accepts:

  • A single label, such as 5 or 'a' (note that 5 is an index label here, not a position);
  • A list or array of labels, such as ['a', 'b', 'c'];
  • A slice of labels, such as 'a':'f'.

Official documents: https://pandas.pydata.org/docs/reference/api/pandas.Series.loc.html

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>> 
>>> obj.loc['a']
1
>>> 
>>> obj.loc['a':'c']
a    1
b    5
c   -8
dtype: int64
>>>
>>> obj.loc[['a', 'd']]
a    1
d    2
dtype: int64
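
The remark that an integer passed to loc is a label, not a position, matters most when the index itself holds integers. The following sketch uses a made-up Series whose labels start at 5:

```python
import pandas as pd

obj = pd.Series([10, 20, 30], index=[5, 6, 7])

by_label = obj.loc[5]        # 10: the row whose label is 5
by_position = obj.iloc[0]    # 10: the row at position 0

# There is no label 0, so loc raises KeyError even though position 0 exists.
try:
    obj.loc[0]
    found = True
except KeyError:
    found = False
```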

[3.1.2] DataFrame.loc

For a DataFrame, the first argument selects rows and the second selects columns; the accepted input formats are much the same as for a Series.

Official documents: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html

>>> import pandas as pd
>>> obj = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=['a', 'b', 'c'], columns=['A', 'B', 'C'])
>>> obj
   A  B  C
a  1  2  3
b  4  5  6
c  7  8  9
>>> 
>>> obj.loc['a']
A    1
B    2
C    3
Name: a, dtype: int64
>>> 
>>> obj.loc['a':'c']
   A  B  C
a  1  2  3
b  4  5  6
c  7  8  9
>>> 
>>> obj.loc[['a', 'c']]
   A  B  C
a  1  2  3
c  7  8  9
>>> 
>>> obj.loc['b', 'B']
5
>>> obj.loc['b', 'A':'C']
A    4
B    5
C    6
Name: b, dtype: int64
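
loc also accepts a Boolean mask for the rows and can be used on the left-hand side of an assignment. The sketch below reuses the 3x3 frame from the example above; the name updated is only illustrative:

```python
import pandas as pd

obj = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=['a', 'b', 'c'],
    columns=['A', 'B', 'C'],
)

# Rows where column A is greater than 1, restricted to columns A and C.
subset = obj.loc[obj['A'] > 1, ['A', 'C']]

# Assigning through loc writes a single cell; work on a copy to keep obj intact.
updated = obj.copy()
updated.loc['a', 'B'] = 20
```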

[3.2] iloc Position Index

iloc works like loc, but selects data by integer position in index and columns rather than by label.

[3.2.1] Series.iloc

Official documents: https://pandas.pydata.org/docs/reference/api/pandas.Series.iloc.html

For a Series, iloc accepts:

  • An integer, such as 5;
  • A list or array of integers, such as [4, 3, 0];
  • A slice of integers, such as 1:7.

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>> 
>>> obj.iloc[1]
5
>>> 
>>> obj.iloc[0:2]
a    1
b    5
dtype: int64
>>> 
>>> obj.iloc[[0, 1, 3]]
a    1
b    5
d    2
dtype: int64

[3.2.2] DataFrame.iloc

Official documents: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html

For a DataFrame, the first argument selects rows and the second selects columns; the accepted input formats are much the same as for a Series:

>>> import pandas as pd
>>> obj = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=['a', 'b', 'c'], columns=['A', 'B', 'C'])
>>> obj
   A  B  C
a  1  2  3
b  4  5  6
c  7  8  9
>>> 
>>> obj.iloc[1]
A    4
B    5
C    6
Name: b, dtype: int64
>>> 
>>> obj.iloc[0:2]
   A  B  C
a  1  2  3
b  4  5  6
>>> 
>>> obj.iloc[[0, 2]]
   A  B  C
a  1  2  3
c  7  8  9
>>> 
>>> obj.iloc[1, 2]
6
>>> 
>>> obj.iloc[1, 0:2]
A    4
B    5
Name: b, dtype: int64
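
Since iloc follows Python's positional conventions, negative positions count from the end and an out-of-range integer raises IndexError. A brief sketch on the same 3x3 frame:

```python
import pandas as pd

obj = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=['a', 'b', 'c'],
    columns=['A', 'B', 'C'],
)

last_row = obj.iloc[-1]                 # the row labelled 'c'
corner = obj.iloc[[0, -1], [0, -1]]     # rows a and c, columns A and C

# A single position past the end is an error rather than an empty result.
try:
    obj.iloc[3]
    in_bounds = True
except IndexError:
    in_bounds = False
```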

[4] Re-indexing in Pandas

An important method of Pandas objects is reindex, which creates a new object whose data conforms to a new index. Taking DataFrame.reindex as an example (Series is similar), the basic syntax is as follows:

DataFrame.reindex(self, labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)

Some of the parameters are described below (see the official documentation for the full list):

Parameter    Description
index        New labels for the index; either an Index instance or any other sequence-like Python data structure
method       Filling (interpolation) method, one of:
             None: do not fill gaps;
             pad / ffill: propagate the last valid observation forward;
             backfill / bfill: use the next valid observation to fill the gap;
             nearest: use the nearest valid observation to fill the gap
fill_value   Value to use for missing entries introduced by re-indexing (NaN by default)
limit        Maximum number of consecutive elements to fill when filling forward or backward
tolerance    Maximum distance between original and new labels for inexact matches
level        Match a simple index on the given level of a MultiIndex
copy         Defaults to True, which always returns a new object; if False, no copy is made when the new index equals the old one

reindex rearranges the data according to the new index. If an index value does not currently exist, a missing value is introduced:

>>> import pandas as pd
>>> obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
>>> obj
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
>>> 
>>> obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
>>> obj2
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
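
If NaN is not the placeholder you want, the fill_value parameter from the table above supplies a different one. A minimal sketch using the same Series:

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# The new label 'e' receives 0 instead of NaN.
filled = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)

# reindex returns a new object; the original Series has no label 'e'.
original_has_e = 'e' in obj.index
```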

For ordered data such as time series, some interpolation or filling may be required when re-indexing. The method option does this; for example, ffill fills values forward:

>>> import pandas as pd
>>> obj = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
>>> obj
0      blue
2    purple
4    yellow
dtype: object
>>> 
>>> obj2 = obj.reindex(range(6), method='ffill')
>>> obj2
0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object
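
The other fill options behave analogously. The sketch below compares bfill with ffill on the same data, and uses a second, sparser Series (made up for the example) to show how limit caps the number of consecutive filled rows:

```python
import pandas as pd

obj = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# bfill takes the next valid value, so label 5 has nothing to copy from.
backward = obj.reindex(range(6), method='bfill')

# With limit=1, at most one consecutive gap is filled after each value.
sparse = pd.Series(['blue', 'yellow'], index=[0, 4])
limited = sparse.reindex(range(6), method='ffill', limit=1)
limited_missing = limited.isna().tolist()
```

Filling with method requires the original index to be monotonically increasing or decreasing.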

With a DataFrame, reindex can modify the (row) index, the columns, or both. When only one sequence is passed, the rows are re-indexed:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])
>>> obj
   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8
>>> 
>>> obj2 = obj.reindex(['a', 'b', 'c', 'd'])
>>> obj2
   Ohio  Texas  California
a   0.0    1.0         2.0
b   NaN    NaN         NaN
c   3.0    4.0         5.0
d   6.0    7.0         8.0

Columns can be re-indexed with the columns keyword:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])
>>> obj
   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8
>>> 
>>> states = ['Texas', 'Utah', 'California']
>>> obj.reindex(columns=states)
   Texas  Utah  California
a      1   NaN           2
c      4   NaN           5
d      7   NaN           8
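
Rows and columns can also be re-indexed in a single call by passing both keywords. A short sketch on the same frame, reusing the labels from the two examples above:

```python
import numpy as np
import pandas as pd

obj = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=['a', 'c', 'd'],
    columns=['Ohio', 'Texas', 'California'],
)

# New row 'b' and new column 'Utah' are introduced as missing values.
both = obj.reindex(index=['a', 'b', 'c', 'd'],
                   columns=['Texas', 'Utah', 'California'])

# fill_value applies to every newly introduced cell.
zeros = obj.reindex(index=['a', 'b', 'c', 'd'],
                    columns=['Texas', 'Utah', 'California'],
                    fill_value=0)
```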