This article was originally published on CSDN by TRHX. Blog home page: https://itrhx.blog.csdn.net/ Link to this article: https://itrhx.blog.csdn.net/article/details/106698307 Do not repost without authorization; please respect the original work.
[1] The Index Object
The indexes of Series and DataFrame are Index objects. An Index is immutable, which keeps the data safe, so trying to modify one of its elements raises an error. Common Index types include Index, Int64Index, MultiIndex and DatetimeIndex.
The following code demonstrates the Index object and its immutability:
>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj.index
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> type(obj.index)
<class 'pandas.core.indexes.base.Index'>
>>> obj.index[0] = 'e'
Traceback (most recent call last):
  File "<pyshell#28>", line 1, in <module>
    obj.index[0] = 'e'
  File "C:\Users\...\base.py", line 3909, in __setitem__
    raise TypeError("Index does not support mutable operations")
TypeError: Index does not support mutable operations
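Because individual elements cannot be reassigned, labels are usually changed by producing a new index. The sketch below is one possible way to do that, not the only one; the variable names renamed and replaced are made up for illustration.

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])

# Assigning to a single element of an Index raises TypeError.
try:
    obj.index[0] = 'e'
    mutation_failed = False
except TypeError:
    mutation_failed = True

# Option 1: rename() returns a new Series with relabelled index; obj is untouched.
renamed = obj.rename(index={'a': 'e'})

# Option 2: replace the whole Index object at once (done on a copy here).
replaced = obj.copy()
replaced.index = ['e', 'b', 'c', 'd']

print(mutation_failed)       # True
print(list(renamed.index))   # ['e', 'b', 'c', 'd']
print(list(obj.index))       # ['a', 'b', 'c', 'd']
```

Both options leave the original Index object unchanged, which is consistent with the immutability shown above.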
Common attributes of Index objects
Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.Index.html
Attribute | Description |
---|---|
T | Transpose (for an Index this is the index itself) |
array | The array that backs the index |
dtype | The dtype object of the underlying data |
hasnans | Whether the index contains any NaN (missing values) |
inferred_type | A string describing the type inferred from the index values |
is_monotonic | Whether the index is monotonically increasing (alias of is_monotonic_increasing; deprecated in pandas 1.5 and removed in 2.0) |
is_monotonic_decreasing | Whether the index is monotonically decreasing |
is_monotonic_increasing | Whether the index is monotonically increasing |
is_unique | Whether the index has no duplicate values |
nbytes | Number of bytes used by the underlying data |
ndim | Number of dimensions (always 1 for an Index) |
nlevels | Number of levels (1 unless the index is a MultiIndex) |
shape | A tuple giving the shape of the index |
size | Number of elements in the index |
values | The index values as an array |
>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj.index
Index(['a', 'b', 'c', 'd'], dtype='object')
>>>
>>> obj.index.array
<PandasArray>
['a', 'b', 'c', 'd']
Length: 4, dtype: object
>>>
>>> obj.index.dtype
dtype('O')
>>>
>>> obj.index.hasnans
False
>>>
>>> obj.index.inferred_type
'string'
>>>
>>> obj.index.is_monotonic
True
>>>
>>> obj.index.is_monotonic_decreasing
False
>>>
>>> obj.index.is_monotonic_increasing
True
>>>
>>> obj.index.is_unique
True
>>>
>>> obj.index.nbytes
16
>>>
>>> obj.index.ndim
1
>>>
>>> obj.index.nlevels
1
>>>
>>> obj.index.shape
(4,)
>>>
>>> obj.index.size
4
>>>
>>> obj.index.values
array(['a', 'b', 'c', 'd'], dtype=object)
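The attributes above are often used as quick sanity checks before label-based lookups. The following is a small sketch on a made-up index with a repeated label; it only uses attributes listed in the table.

```python
import pandas as pd

# A hypothetical index in which the label 'b' appears twice.
idx = pd.Index(['a', 'b', 'b', 'd'])

print(idx.is_unique)                   # False
print(idx.is_monotonic_increasing)     # True
print(idx.hasnans)                     # False
print(idx.size, idx.shape, idx.ndim)   # 4 (4,) 1
```

Note that is_monotonic_increasing is not strict: repeated labels still count as increasing, so uniqueness has to be checked separately with is_unique.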
Common methods of Index objects
Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.Index.html
Method | Description |
---|---|
all(self, *args, **kwargs) | Whether all elements are true; 0 is treated as False |
any(self, *args, **kwargs) | Whether at least one element is true; returns False when every element is 0 |
append(self, other) | Concatenate another index to this one, producing a new index |
argmax(self[, axis, skipna]) | Integer position of the maximum value in the index |
argmin(self[, axis, skipna]) | Integer position of the minimum value in the index |
argsort(self, *args, **kwargs) | Integer positions that would sort the index in ascending order |
delete(self, loc) | Return a new index with the element at the given position removed |
difference(self, other[, sort]) | Set difference: the elements of this index that are not in the other index |
drop(self, labels[, errors]) | Return a new index with the given labels removed |
drop_duplicates(self[, keep]) | Remove duplicate values. keep takes: 'first': keep the first occurrence of each duplicate; 'last': keep the last occurrence; False: drop every duplicated value |
duplicated(self[, keep]) | Flag duplicate values. keep takes: 'first': the first occurrence is False, the others are True; 'last': the last occurrence is False, the others are True; False: every duplicated value is True |
dropna(self[, how]) | Return a new index without missing values (NaN) |
fillna(self[, value, downcast]) | Replace missing values (NaN) with the specified value |
equals(self, other) | Whether two indexes contain the same elements in the same order |
insert(self, loc, item) | Return a new index with an item inserted at the given position |
intersection(self, other[, sort]) | Intersection of two indexes |
isna(self) | Detect which elements are missing values (NaN) |
isnull(self) | Alias of isna |
max(self[, axis, skipna]) | Maximum value of the index |
min(self[, axis, skipna]) | Minimum value of the index |
union(self, other[, sort]) | Union of two indexes |
unique(self[, level]) | Unique values of the index, i.e. with duplicates removed |
- all(self, *args, **kwargs)
>>> import pandas as pd
>>> pd.Index([1, 2, 3]).all()
True
>>>
>>> pd.Index([0, 1, 2]).all()
False
- any(self, *args, **kwargs)
>>> import pandas as pd
>>> pd.Index([0, 0, 1]).any()
True
>>>
>>> pd.Index([0, 0, 0]).any()
False
- append(self, other)
>>> import pandas as pd
>>> pd.Index(['a', 'b', 'c']).append(pd.Index([1, 2, 3]))
Index(['a', 'b', 'c', 1, 2, 3], dtype='object')
- argmax(self[, axis, skipna])
>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).argmax()
3
- argmin(self[, axis, skipna])
>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).argmin()
4
- argsort(self, *args, **kwargs)
>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).argsort()
array([4, 1, 2, 0, 3], dtype=int32)
- delete(self, loc)
>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).delete(0)
Int64Index([2, 3, 9, 1], dtype='int64')
- difference(self, other[, sort])
>>> import pandas as pd
>>> idx1 = pd.Index([2, 1, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
- drop(self, labels[, errors])
>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).drop([2, 1])
Int64Index([5, 3, 9], dtype='int64')
- drop_duplicates(self[, keep])
>>> import pandas as pd
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx.drop_duplicates(keep='first')
Index(['lama', 'cow', 'beetle', 'hippo'], dtype='object')
>>> idx.drop_duplicates(keep='last')
Index(['cow', 'beetle', 'lama', 'hippo'], dtype='object')
>>> idx.drop_duplicates(keep=False)
Index(['cow', 'beetle', 'hippo'], dtype='object')
- duplicated(self[, keep])
>>> import pandas as pd
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> idx.duplicated()
array([False, False,  True, False,  True])
>>> idx.duplicated(keep='first')
array([False, False,  True, False,  True])
>>> idx.duplicated(keep='last')
array([ True, False,  True, False, False])
>>> idx.duplicated(keep=False)
array([ True, False,  True, False,  True])
- dropna(self[, how])
>>> import numpy as np
>>> import pandas as pd
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).dropna()
Float64Index([2.0, 5.0, 6.0], dtype='float64')
- fillna(self[, value, downcast])
>>> import numpy as np
>>> import pandas as pd
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).fillna(5)
Float64Index([2.0, 5.0, 5.0, 6.0, 5.0, 5.0], dtype='float64')
- equals(self, other)
>>> import pandas as pd
>>> idx1 = pd.Index([5, 2, 3, 9, 1])
>>> idx2 = pd.Index([5, 2, 3, 9, 1])
>>> idx1.equals(idx2)
True
>>>
>>> idx1 = pd.Index([5, 2, 3, 9, 1])
>>> idx2 = pd.Index([5, 2, 4, 9, 1])
>>> idx1.equals(idx2)
False
- intersection(self, other[, sort])
>>> import pandas as pd
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')
- insert(self, loc, item)
>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).insert(2, 'A')
Index([5, 2, 'A', 3, 9, 1], dtype='object')
- isna(self), isnull(self)
>>> import numpy as np
>>> import pandas as pd
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).isna()
array([False, False,  True, False,  True,  True])
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).isnull()
array([False, False,  True, False,  True,  True])
- max(self[, axis, skipna]), min(self[, axis, skipna])
>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).max()
9
>>> pd.Index([5, 2, 3, 9, 1]).min()
1
- union(self, other[, sort])
>>> import pandas as pd
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.union(idx2)
Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')
- unique(self[, level])
>>> import pandas as pd
>>> pd.Index([5, 1, 3, 5, 1]).unique()
Int64Index([5, 1, 3], dtype='int64')
[2] Pandas General Index
Because Pandas also offers more advanced indexing operations, such as re-indexing and hierarchical indexing, the basic slice index, fancy index, Boolean index and so on are grouped here as general indexes.
[2.1] Series Index
[2.1.1] head() / tail()
Series.head() and Series.tail() return the first five and the last five rows by default; if a number is passed to head() / tail(), that many rows are returned instead:
>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.Series(np.random.randn(8))
>>> obj
0   -0.643437
1   -0.365652
2   -0.966554
3   -0.036127
4    1.046095
5   -2.048362
6   -1.865551
7    1.344728
dtype: float64
>>>
>>> obj.head()
0   -0.643437
1   -0.365652
2   -0.966554
3   -0.036127
4    1.046095
dtype: float64
>>>
>>> obj.head(3)
0   -0.643437
1   -0.365652
2   -0.966554
dtype: float64
>>>
>>> obj.tail()
3   -0.036127
4    1.046095
5   -2.048362
6   -1.865551
7    1.344728
dtype: float64
>>>
>>> obj.tail(3)
5   -2.048362
6   -1.865551
7    1.344728
dtype: float64
[2.1.2] Row Index
A Series can be indexed by position or by index label, and it also supports dictionary-style expressions and methods for getting values. Note that passing an integer position to plain [] on a non-integer index, as in obj[2] below, is deprecated since pandas 2.1, where obj.iloc[2] is preferred:
>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>> obj['c']
-8
>>> obj[2]
-8
>>> 'b' in obj
True
>>> obj.keys()
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> list(obj.items())
[('a', 1), ('b', 5), ('c', -8), ('d', 2)]
[2.1.3] Slice Index
There are two ways to slice: by position and by index label. Note: when slicing by position, the end position is excluded; when slicing by index label, the end label is included.
>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>>
>>> obj[1:3]
b    5
c   -8
dtype: int64
>>>
>>> obj[0:3:2]
a    1
c   -8
dtype: int64
>>>
>>> obj['b':'d']
b    5
c   -8
d    2
dtype: int64
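The difference between the two slicing rules can be made explicit with iloc and loc. This is a minimal sketch; the names by_position and by_label are only illustrative.

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])

# Position slice: positions 1 and 2 are returned, position 3 is excluded.
by_position = obj.iloc[1:3]

# Label slice: every label from 'b' through 'd' is returned, 'd' included.
by_label = obj.loc['b':'d']

print(list(by_position.index))   # ['b', 'c']
print(list(by_label.index))      # ['b', 'c', 'd']
```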
[2.1.4] Fancy Index
A fancy index selects non-contiguous elements: a list of index labels or positions is passed in to get several elements at once. As with single integers, a list of positions passed to plain [] on a non-integer index is deprecated since pandas 2.1 in favour of obj.iloc[[0, 2]]:
>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>>
>>> obj[[0, 2]]
a    1
c   -8
dtype: int64
>>>
>>> obj[['a', 'c', 'd']]
a    1
c   -8
d    2
dtype: int64
[2.1.5] Boolean Index
A Series can be indexed with a Boolean array: a Boolean operation such as a comparison produces a True/False mask, and indexing with that mask returns only the elements that meet the condition.
>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2, -3], index=['a', 'b', 'c', 'd', 'e'])
>>> obj
a    1
b    5
c   -8
d    2
e   -3
dtype: int64
>>>
>>> obj[obj > 0]
a    1
b    5
d    2
dtype: int64
>>>
>>> obj > 0
a     True
b     True
c    False
d     True
e    False
dtype: bool
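Several conditions can be combined into one mask. The sketch below assumes the same Series as above and uses the element-wise operators & (and), | (or) and ~ (not); each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2, -3], index=['a', 'b', 'c', 'd', 'e'])

in_range = obj[(obj > 0) & (obj < 5)]   # positive and smaller than 5
outside = obj[(obj < 0) | (obj > 4)]    # negative or larger than 4
not_positive = obj[~(obj > 0)]          # negation of the mask

print(list(in_range.index))       # ['a', 'd']
print(list(outside.index))        # ['b', 'c', 'e']
print(list(not_positive.index))   # ['c', 'e']
```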
[2.2] DataFrame Index
[2.2.1] head() / tail()
As with Series, DataFrame.head() and DataFrame.tail() return the first five and the last five rows of the DataFrame by default; if a number is passed to head() / tail(), that many rows are returned instead:
>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.random.randn(8, 4), columns=['a', 'b', 'c', 'd'])
>>> obj
          a         b         c         d
0 -1.399390  0.521596 -0.869613  0.506621
1 -0.748562 -0.364952  0.188399 -1.402566
2  1.378776 -1.476480  0.361635  0.451134
3 -0.206405 -1.188609  3.002599  0.563650
4  0.993289  1.133748  1.177549 -2.562286
5 -0.482157  1.069293  1.143983 -1.303079
6 -1.199154  0.220360  0.801838 -0.104533
7 -1.359816 -2.092035  2.003530 -0.151812
>>>
>>> obj.head()
          a         b         c         d
0 -1.399390  0.521596 -0.869613  0.506621
1 -0.748562 -0.364952  0.188399 -1.402566
2  1.378776 -1.476480  0.361635  0.451134
3 -0.206405 -1.188609  3.002599  0.563650
4  0.993289  1.133748  1.177549 -2.562286
>>>
>>> obj.head(3)
          a         b         c         d
0 -1.399390  0.521596 -0.869613  0.506621
1 -0.748562 -0.364952  0.188399 -1.402566
2  1.378776 -1.476480  0.361635  0.451134
>>>
>>> obj.tail()
          a         b         c         d
3 -0.206405 -1.188609  3.002599  0.563650
4  0.993289  1.133748  1.177549 -2.562286
5 -0.482157  1.069293  1.143983 -1.303079
6 -1.199154  0.220360  0.801838 -0.104533
7 -1.359816 -2.092035  2.003530 -0.151812
>>>
>>> obj.tail(3)
          a         b         c         d
5 -0.482157  1.069293  1.143983 -1.303079
6 -1.199154  0.220360  0.801838 -0.104533
7 -1.359816 -2.092035  2.003530 -0.151812
[2.2.2] Column Index
A DataFrame can be indexed by column label. A single label returns a Series, while a list containing one label returns a one-column DataFrame:
>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.random.randn(7, 2), columns=['a', 'b'])
>>> obj
          a         b
0 -1.198795  0.928378
1 -2.878230  0.014650
2  2.267475  0.370952
3  0.639340 -1.301041
4 -1.953444  0.148934
5 -0.445225  0.459632
6  0.097109 -2.592833
>>>
>>> obj['a']
0   -1.198795
1   -2.878230
2    2.267475
3    0.639340
4   -1.953444
5   -0.445225
6    0.097109
Name: a, dtype: float64
>>>
>>> obj[['a']]
          a
0 -1.198795
1 -2.878230
2  2.267475
3  0.639340
4 -1.953444
5 -0.445225
6  0.097109
>>>
>>> type(obj['a'])
<class 'pandas.core.series.Series'>
>>> type(obj[['a']])
<class 'pandas.core.frame.DataFrame'>
[2.2.3] Slice Index
Slicing a DataFrame operates on rows, and there are two ways to slice: by position and by index label. Note: when slicing by position, the end position is excluded; when slicing by index label, the end label is included.
>>> import pandas as pd
>>> import numpy as np
>>> data = np.random.randn(5, 4)
>>> index = ['I1', 'I2', 'I3', 'I4', 'I5']
>>> columns = ['a', 'b', 'c', 'd']
>>> obj = pd.DataFrame(data, index, columns)
>>> obj
           a         b         c         d
I1  0.828676 -1.663337  1.753632  1.432487
I2  0.368138  0.222166  0.902764 -1.436186
I3  2.285615 -2.415175 -1.344456 -0.502214
I4  3.224288 -0.500268  1.293596 -1.235549
I5 -0.938833 -0.804433 -0.170047 -0.566766
>>>
>>> obj[0:3]
           a         b         c         d
I1  0.828676 -1.663337  1.753632  1.432487
I2  0.368138  0.222166  0.902764 -1.436186
I3  2.285615 -2.415175 -1.344456 -0.502214
>>>
>>> obj[0:4:2]
           a         b         c         d
I1  0.828676 -1.663337  1.753632  1.432487
I3  2.285615 -2.415175 -1.344456 -0.502214
>>>
>>> obj['I2':'I4']
           a         b         c         d
I2  0.368138  0.222166  0.902764 -1.436186
I3  2.285615 -2.415175 -1.344456 -0.502214
I4  3.224288 -0.500268  1.293596 -1.235549
[2.2.4] Fancy Index
As with Series, a fancy index selects non-contiguous elements: a list of column labels is passed in to get several columns at once:
>>> import pandas as pd
>>> import numpy as np
>>> data = np.random.randn(5, 4)
>>> index = ['I1', 'I2', 'I3', 'I4', 'I5']
>>> columns = ['a', 'b', 'c', 'd']
>>> obj = pd.DataFrame(data, index, columns)
>>> obj
           a         b         c         d
I1 -1.083223 -0.182874 -0.348460 -1.572120
I2 -0.205206 -0.251931  1.180131  0.847720
I3 -0.980379  0.325553 -0.847566 -0.882343
I4 -0.638228 -0.282882 -0.624997 -0.245980
I5 -0.229769  1.002930 -0.226715 -0.916591
>>>
>>> obj[['a', 'd']]
           a         d
I1 -1.083223 -1.572120
I2 -0.205206  0.847720
I3 -0.980379 -0.882343
I4 -0.638228 -0.245980
I5 -0.229769 -0.916591
[2.2.5] Boolean Index
A DataFrame can also be indexed with a Boolean array produced by a Boolean operation such as a comparison. When the mask is a Boolean DataFrame, the result keeps the original shape, and elements that do not meet the condition become NaN.
>>> import pandas as pd
>>> import numpy as np
>>> data = np.random.randn(5, 4)
>>> index = ['I1', 'I2', 'I3', 'I4', 'I5']
>>> columns = ['a', 'b', 'c', 'd']
>>> obj = pd.DataFrame(data, index, columns)
>>> obj
           a         b         c         d
I1 -0.602984 -0.135716  0.999689 -0.339786
I2  0.911130 -0.092485 -0.914074 -0.279588
I3  0.849606 -0.420055 -1.240389 -0.179297
I4  0.249986 -1.250668  0.329416 -1.105774
I5 -0.743816  0.430647 -0.058126 -0.337319
>>>
>>> obj[obj > 0]
           a         b         c   d
I1       NaN       NaN  0.999689 NaN
I2  0.911130       NaN       NaN NaN
I3  0.849606       NaN       NaN NaN
I4  0.249986       NaN  0.329416 NaN
I5       NaN  0.430647       NaN NaN
>>>
>>> obj > 0
        a      b      c      d
I1  False  False   True  False
I2   True  False  False  False
I3   True  False  False  False
I4   True  False   True  False
I5  False   True  False  False
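A common variation is to filter whole rows with a Boolean Series built from one column. The sketch below uses small fixed integers instead of random data so that the result is reproducible; the values themselves are made up.

```python
import pandas as pd

obj = pd.DataFrame(
    [[1, -2, 3], [-4, 5, -6], [7, 8, -9]],
    index=['I1', 'I2', 'I3'],
    columns=['a', 'b', 'c'],
)

# A Boolean Series selects whole rows: keep rows where column 'a' is positive.
rows = obj[obj['a'] > 0]

# A Boolean DataFrame masks element by element: failing cells become NaN.
masked = obj[obj > 0]

print(list(rows.index))                 # ['I1', 'I3']
print(int(masked.isna().sum().sum()))   # 4
```

The row filter returns complete rows with their original values, while the element-wise mask keeps every row and blanks out the four negative cells.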
[3] Indexer: loc and iloc
loc is the label-based indexer and iloc is the position-based indexer. Note that older versions of Pandas also had an ix indexer, which accepted either labels or positions; it was deprecated in Pandas 0.20.0 and removed in Pandas 1.0.0.
[3.1] loc Label Index
loc is the label index: it selects data by the labels of index and columns.
[3.1.1] Series.loc
For a Series, loc accepts:
- A single label, such as 5 or 'a' (note that 5 is interpreted as an index label, not as an integer position);
- A list or array of labels, such as ['a', 'b', 'c'];
- A slice object with labels, such as 'a':'f'.
Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.Series.loc.html
>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>>
>>> obj.loc['a']
1
>>>
>>> obj.loc['a':'c']
a    1
b    5
c   -8
dtype: int64
>>>
>>> obj.loc[['a', 'd']]
a    1
d    2
dtype: int64
[3.1.2] DataFrame.loc
For a DataFrame, the first argument selects rows and the second argument selects columns; the accepted input formats are much the same as for a Series.
Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html
>>> import pandas as pd
>>> obj = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=['a', 'b', 'c'], columns=['A', 'B', 'C'])
>>> obj
   A  B  C
a  1  2  3
b  4  5  6
c  7  8  9
>>>
>>> obj.loc['a']
A    1
B    2
C    3
Name: a, dtype: int64
>>>
>>> obj.loc['a':'c']
   A  B  C
a  1  2  3
b  4  5  6
c  7  8  9
>>>
>>> obj.loc[['a', 'c']]
   A  B  C
a  1  2  3
c  7  8  9
>>>
>>> obj.loc['b', 'B']
5
>>> obj.loc['b', 'A':'C']
A    4
B    5
C    6
Name: b, dtype: int64
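loc also accepts a Boolean mask as the row selector and can be used on the left-hand side of an assignment. The following is a minimal sketch on the same DataFrame; the names subset and updated are illustrative only.

```python
import pandas as pd

obj = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=['a', 'b', 'c'],
    columns=['A', 'B', 'C'],
)

# Rows where column 'A' is greater than 1, restricted to columns 'A' and 'C'.
subset = obj.loc[obj['A'] > 1, ['A', 'C']]

# Assign to a single cell by row label and column label (on a copy).
updated = obj.copy()
updated.loc['b', 'B'] = 50

print(subset.values.tolist())   # [[4, 6], [7, 9]]
print(updated.loc['b', 'B'])    # 50
print(obj.loc['b', 'B'])        # 5
```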
[3.2] iloc Position Index
iloc works like loc, except that it selects data by integer position, that is, by the position numbers of index and columns.
[3.2.1] Series.iloc
Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.Series.iloc.html
For a Series, iloc accepts:
- An integer, such as 5;
- A list or array of integers, such as [4, 3, 0];
- A slice object with integers, such as 1:7.
>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>>
>>> obj.iloc[1]
5
>>>
>>> obj.iloc[0:2]
a    1
b    5
dtype: int64
>>>
>>> obj.iloc[[0, 1, 3]]
a    1
b    5
d    2
dtype: int64
[3.2.2] DataFrame.iloc
Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html
For a DataFrame, the first argument selects rows and the second argument selects columns; the accepted input formats are much the same as for a Series:
>>> import pandas as pd
>>> obj = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=['a', 'b', 'c'], columns=['A', 'B', 'C'])
>>> obj
   A  B  C
a  1  2  3
b  4  5  6
c  7  8  9
>>>
>>> obj.iloc[1]
A    4
B    5
C    6
Name: b, dtype: int64
>>>
>>> obj.iloc[0:2]
   A  B  C
a  1  2  3
b  4  5  6
>>>
>>> obj.iloc[[0, 2]]
   A  B  C
a  1  2  3
c  7  8  9
>>>
>>> obj.iloc[1, 2]
6
>>>
>>> obj.iloc[1, 0:2]
A    4
B    5
Name: b, dtype: int64
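Because iloc follows ordinary Python position rules, negative positions and stepped slices work as well. This is a short sketch on the same DataFrame, with illustrative variable names.

```python
import pandas as pd

obj = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=['a', 'b', 'c'],
    columns=['A', 'B', 'C'],
)

last_row = obj.iloc[-1]            # negative positions count from the end
every_other = obj.iloc[::2, ::2]   # every second row and every second column
corners = obj.iloc[[0, 2], [0, 2]] # lists of positions on both axes

print(last_row.tolist())             # [7, 8, 9]
print(every_other.values.tolist())   # [[1, 3], [7, 9]]
print(corners.equals(every_other))   # True
```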
[4] Pandas Re-index
An important method of Pandas objects is reindex, which creates a new object whose data conforms to a new index. Taking DataFrame.reindex as an example (Series.reindex is similar), the basic syntax is as follows:
DataFrame.reindex(self, labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)
Some of the parameters are described below (see the official documentation for the full list):
Parameter | Description |
---|---|
index | The new sequence to use as the index; it can be an Index instance or any other sequence-like Python data structure |
method | Interpolation (fill) method, with the following values: None: do not fill gaps; pad / ffill: propagate the last valid observation forward; backfill / bfill: fill each gap with the next valid observation; nearest: fill each gap with the nearest valid observation |
fill_value | Value to substitute for the missing data introduced by re-indexing (NaN by default) |
limit | Maximum number of consecutive elements to fill when filling forward or backward |
tolerance | Maximum distance (absolute difference) between the original and new labels for inexact matches when filling forward or backward |
level | Match against a single level of a MultiIndex, or select a subset of it |
copy | Defaults to True, which always returns a new object; if False, no copy is made when the new index equals the old one |
reindex rearranges the data according to the new index. If an index value does not exist in the original object, a missing value is introduced:
>>> import pandas as pd
>>> obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
>>> obj
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
>>>
>>> obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
>>> obj2
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
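If NaN is not wanted for the new labels, the fill_value parameter from the table above supplies a replacement. A minimal sketch with the same Series, using 0 as an arbitrary fill value:

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# Labels that are missing from the original index receive fill_value, not NaN.
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)

print(obj2.tolist())               # [-5.3, 7.2, 3.6, 4.5, 0.0]
print(bool(obj2.isna().any()))     # False
```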
For ordered data such as time series, some interpolation may be needed when re-indexing. The method option does this; for example, ffill fills values forward:
>>> import pandas as pd
>>> obj = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
>>> obj
0      blue
2    purple
4    yellow
dtype: object
>>>
>>> obj2 = obj.reindex(range(6), method='ffill')
>>> obj2
0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object
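The other fill options behave analogously. The sketch below contrasts ffill with bfill and shows how limit caps the number of consecutive labels that get filled; the numeric Series is made up so that the gaps are easy to see.

```python
import pandas as pd

obj = pd.Series([10.0, 20.0], index=[0, 4])

forward = obj.reindex(range(6), method='ffill')            # carry the last value forward
backward = obj.reindex(range(6), method='bfill')           # pull the next value backward
limited = obj.reindex(range(6), method='ffill', limit=1)   # fill at most 1 label per gap

print(forward.tolist())    # [10.0, 10.0, 10.0, 10.0, 20.0, 20.0]
print(backward.tolist())   # [10.0, 20.0, 20.0, 20.0, 20.0, nan]
print(limited.tolist())    # [10.0, 10.0, nan, nan, 20.0, 20.0]
```

With bfill the last label stays NaN because no later observation exists, and with limit=1 only the first label after each existing one is filled.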
With a DataFrame, reindex can change the (row) index, the columns, or both. When only a sequence is passed, the rows are re-indexed:
>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])
>>> obj
   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8
>>>
>>> obj2 = obj.reindex(['a', 'b', 'c', 'd'])
>>> obj2
   Ohio  Texas  California
a   0.0    1.0         2.0
b   NaN    NaN         NaN
c   3.0    4.0         5.0
d   6.0    7.0         8.0
Columns can be re-indexed with the columns keyword:
>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])
>>> obj
   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8
>>>
>>> states = ['Texas', 'Utah', 'California']
>>> obj.reindex(columns=states)
   Texas  Utah  California
a      1   NaN           2
c      4   NaN           5
d      7   NaN           8
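Rows and columns can also be re-indexed in a single call by passing both the index and columns keywords. The following sketch reuses the same DataFrame and, as an assumption for illustration, fills the new cells with 0 via fill_value.

```python
import numpy as np
import pandas as pd

obj = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=['a', 'c', 'd'],
    columns=['Ohio', 'Texas', 'California'],
)

# Re-index rows and columns together; new cells receive fill_value.
both = obj.reindex(
    index=['a', 'b', 'c', 'd'],
    columns=['Texas', 'Utah', 'California'],
    fill_value=0,
)

print(both.shape)                 # (4, 3)
print(both.loc['b'].tolist())     # [0, 0, 0]
print(both['Utah'].tolist())      # [0, 0, 0, 0]
print(both.loc['c', 'Texas'])     # 4
```

The 'Ohio' column is dropped because it is not in the new column list, while the new row 'b' and the new column 'Utah' are created and filled.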