Pandas series (updating...) :
- Pandas of Python data analysis three swordsmen (1): understanding pandas and its Series and DataFrame objects
- Pandas of Python data analysis (2): Index object and various Index operations
- Pandas (3): arithmetic operation and missing value processing
- Pandas (4): function application, mapping, sorting and hierarchical index
- Pandas (5): statistical calculation and description
- Pandas (6): GroupBy data splitting, application and merging
- Pandas (7): merging datasets
- Pandas (8): data reconstruction, repetitive data processing and data replacement
- Pandas (9): time series
In addition, NumPy and Matplotlib series of articles have been updated, welcome to follow:
- NumPy series: https://itrhx.blog.csdn.net/category_9780393.html
- Matplotlib series: https://itrhx.blog.csdn.net/category_9780418.html
Recommended learning materials and websites (bloggers participate in translation of some documents):
- NumPy official Chinese website: https://www.numpy.org.cn/
- Pandas official Chinese website: https://www.pypandas.cn/
- Matplotlib official Chinese website: https://www.matplotlib.org.cn/
- Quick reference table of NumPy, Matplotlib and Pandas: https://github.com/TRHX/Python-quick-reference-table
Article catalog
- [01x00] time series
- [02x00] Timestamp
- [02x01]pandas.Timestamp
- [02x02] value of freq frequency
- [02x03]to_datetime
- [02x04]date_range
- [02x05] indexing and slicing
- [02x06] shifting data and date offsets
- [02x07] time zone processing
- [03x00] fixed period
- [03x01]pandas.Period
- [03x02]period_range
- [03x03] asfreq period frequency conversion
- [03x04]to_period and to_timestamp()
- [04x00] timedelta interval
- [05x00] resampling and frequency conversion
This is an anti crawler text, please ignore. This paper was first published in CSDN by TRHX. Blog homepage: https://itrhx.blog.csdn.net/ Link to this article: https://itrhx.blog.csdn.net/article/details/106947061 Unauthorized, no reprint! Reprint maliciously at your own risk! Respect originality and keep away from plagiarism!
[01x00] time series
Introduction to time series on the official website: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
Time series is an important form of structured data, which is applied in many fields, including finance, economics, ecology, neuroscience, physics, etc. Anything observed or measured at multiple time points can form a time series. Many time series are fixed frequency, that is to say, data points appear regularly according to certain rules (such as every 15 seconds, every 5 minutes, every month). Time series can also be irregular, with no fixed time unit or offset between units. The significance of time series data depends on specific application scenarios, mainly including the following:
- Timestamp: a specific point in time, such as 15:30 on June 24, 2020;
- Fixed period: a span of time, such as 2020-01;
- Time delta (duration): the difference between two dates or times.
- For timestamp data, Pandas provides the Timestamp type. It is essentially a replacement for Python's native datetime type, but it is built on the more efficient numpy.datetime64 type. The corresponding index data structure is DatetimeIndex.
- For time period data, Pandas provides the Period type. It encodes a fixed-frequency time interval based on numpy.datetime64. The corresponding index data structure is PeriodIndex.
- For time deltas or durations, Pandas provides the Timedelta type. Timedelta is a higher-performance replacement for Python's native datetime.timedelta type and is based on numpy.timedelta64. The corresponding index data structure is TimedeltaIndex.
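As a rough illustration of the three kinds of value and their index types, here is a minimal sketch. The dates and variable names are arbitrary and only serve as an example:

```python
import pandas as pd

# One value of each kind (arbitrary illustrative dates).
stamp = pd.Timestamp("2020-06-24 15:30")   # a point in time
period = pd.Period("2020-01", freq="M")    # a whole month
delta = pd.Timedelta(days=1, hours=12)     # a duration

# Collections of each kind are held in the matching index type.
stamp_index = pd.DatetimeIndex([stamp, stamp + delta])
period_index = pd.PeriodIndex([period, period + 1])
delta_index = pd.TimedeltaIndex([delta, 2 * delta])

for index in (stamp_index, period_index, delta_index):
    print(type(index).__name__, index.dtype)
```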
[02x00] Timestamp
[02x01]pandas.Timestamp
In pandas, pandas.Timestamp is used in place of datetime.datetime.
Timestamp is the pandas equivalent of Python's datetime and is interchangeable with it in most cases. It is the type used for the entries that make up a DatetimeIndex and other time-series-oriented data structures in Pandas.
Official documents: https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.html
Basic syntax:
class pandas.Timestamp(ts_input=<object object>, freq=None, tz=None, unit=None, year=None, month=None, day=None, hour=None, minute=None, second=None, microsecond=None, nanosecond=None, tzinfo=None)
Common parameters:
parameter | description |
---|---|
ts_input | The object to convert to a timestamp; can be datetime-like, str, int, or float |
freq | The offset the timestamp will have; can be a str or a date offset type. See [02x02] value of freq frequency |
tz | The time zone the timestamp will have |
unit | If ts_input is an integer or floating-point number, the unit it is expressed in (D, s, ms, us, ns) |
Simple example:
>>> import pandas as pd
>>> pd.Timestamp('2017-01-01T12')
Timestamp('2017-01-01 12:00:00')
Set unit='s', meaning the value to convert is expressed in seconds:
>>> import pandas as pd
>>> pd.Timestamp(1513393355.5, unit='s')
Timestamp('2017-12-16 03:02:35.500000')
Use the tz parameter to set the time zone:
>>> import pandas as pd
>>> pd.Timestamp(1513393355, unit='s', tz='US/Pacific')
Timestamp('2017-12-15 19:02:35-0800', tz='US/Pacific')
Set the date components individually:
>>> import pandas as pd
>>> pd.Timestamp(year=2020, month=6, day=24, hour=12)
Timestamp('2020-06-24 12:00:00')
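To make the "interchangeable with datetime" point concrete, the sketch below compares a Timestamp with a plain datetime built from the same fields. The variable names are illustrative only:

```python
from datetime import datetime

import pandas as pd

ts = pd.Timestamp(year=2020, month=6, day=24, hour=12)
py_dt = datetime(2020, 6, 24, 12)

is_datetime = isinstance(ts, datetime)   # Timestamp is a datetime subclass
same_instant = ts == py_dt               # the two compare equal
back_to_python = ts.to_pydatetime()      # convert to a plain datetime

# Timestamp also offers extra conveniences on top of datetime.
extras = (ts.day_name(), ts.quarter, ts.dayofyear)
print(is_datetime, same_instant, back_to_python, extras)
```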
[02x02] value of freq frequency
For complete values, please refer to official documents: https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases
parameter | type | description |
---|---|---|
D | Day | Every calendar day |
B | BusinessDay | Every business day |
H | Hour | Every hour |
T or min | Minute | Every minute |
S | Second | Every second |
L or ms | Milli | Every millisecond (one thousandth of a second) |
U | Micro | Every microsecond (one millionth of a second) |
M | MonthEnd | Last calendar day of each month |
BM | BusinessMonthEnd | Last business day of each month |
MS | MonthBegin | First calendar day of each month |
BMS | BusinessMonthBegin | First business day of each month |
W-MON, W-TUE... | Week | Weekly, on the specified day of the week (MON, TUE, WED, THU, FRI, SAT, SUN) |
WOM-1MON, WOM-2MON... | WeekOfMonth | The specified weekday in the first, second, third, or fourth week of each month. For example, WOM-3FRI is the third Friday of each month |
Q-JAN, Q-FEB... | QuarterEnd | For a year ending in the specified month (JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, DEC), the last calendar day of the last month of each quarter |
BQ-JAN, BQ-FEB... | BusinessQuarterEnd | For a year ending in the specified month, the last business day of the last month of each quarter |
QS-JAN, QS-FEB... | QuarterBegin | The first calendar day of each quarter, where quarters start in the specified month and every three months after it |
BQS-JAN, BQS-FEB... | BusinessQuarterBegin | The first business day of each quarter, where quarters start in the specified month and every three months after it |
A-JAN, A-FEB... | YearEnd | Every year, the last calendar day of the specified month (JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, DEC) |
BA-JAN, BA-FEB... | BusinessYearEnd | Every year, the last business day of the specified month |
AS-JAN, AS-FEB... | YearBegin | Every year, the first calendar day of the specified month |
BAS-JAN, BAS-FEB... | BusinessYearBegin | Every year, the first business day of the specified month |
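A few of these aliases can be tried out with date_range, as in the sketch below. The start dates are arbitrary. As far as I know, pandas 2.2 and later rename some aliases (for example ME instead of M, and h instead of H), so the sketch sticks to aliases that I believe are unchanged across versions:

```python
import pandas as pd

# Business days: Friday 2020-06-26 is followed by Monday 2020-06-29.
business_days = pd.date_range("2020-06-26", periods=3, freq="B")

# Weekly on Mondays, starting from a Wednesday.
mondays = pd.date_range("2020-06-24", periods=2, freq="W-MON")

# The third Friday of each month.
third_fridays = pd.date_range("2020-01-01", periods=3, freq="WOM-3FRI")

# The first calendar day of each month.
month_starts = pd.date_range("2020-01-15", periods=2, freq="MS")

for name, idx in [("B", business_days), ("W-MON", mondays),
                  ("WOM-3FRI", third_fridays), ("MS", month_starts)]:
    print(name, list(idx.strftime("%Y-%m-%d")))
```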
[02x03]to_datetime
In Python, the datetime library provides date and time handling. Using str or the strftime method, you can convert a datetime object into a string. For specific usage, see [Python standard library learning] date and time processing library - datetime.
>>> from datetime import datetime
>>> stamp = datetime(2020, 6, 24)
>>> stamp
datetime.datetime(2020, 6, 24, 0, 0)
>>>
>>> str(stamp)
'2020-06-24 00:00:00'
>>>
>>> stamp.strftime('%Y-%m-%d')
'2020-06-24'
In pandas, the to_datetime method can parse strings in many different formats into Timestamp objects:
>>> import pandas as pd
>>> datestrs = '2011-07-06 12:00:00'
>>> type(datestrs)
<class 'str'>
>>>
>>> pd.to_datetime(datestrs)
Timestamp('2011-07-06 12:00:00')
Basic syntax:
pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)
Official documents: https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
Common parameters:
parameter | description |
---|---|
arg | The object to convert to a datetime; accepts int, float, str, datetime, list, tuple, 1-D array, Series, DataFrame, or dict-like types |
errors | What to do when a value cannot be parsed as a timestamp. ignore: return the original input without raising; raise: raise an exception (default); coerce: set the invalid value to NaT |
dayfirst | bool type, default False. If arg is a str or list, whether to parse the day first. For example, with dayfirst=True, 10/11/12 is parsed as 2012-11-10; with False, it is parsed as 2012-10-11 |
yearfirst | bool type, default False. If arg is a str or list, whether to parse the year first. For example, with yearfirst=True, 10/11/12 is parsed as 2010-11-12; with False, it is parsed as 2012-10-11. If dayfirst and yearfirst are both True, yearfirst takes precedence |
utc | bool type, default None. Whether to convert the result to Coordinated Universal Time (UTC) |
format | The format string used for parsing. For example, 21/2/20 16:10 parsed with %d/%m/%y %H:%M gives 2020-02-21 16:10:00. For the meaning of the format codes, see [Python standard library learning] date and time processing library - datetime, or the official documents |
exact | If True, an exact format match is required. If False, the format is allowed to match anywhere in the target string |
unit | If arg is an integer or floating-point number, the unit it is expressed in (D, s, ms, us, ns) |
Simple application:
>>> import pandas as pd
>>> obj = pd.DataFrame({'year': [2015, 2016], 'month': [2, 3], 'day': [4, 5]})
>>> obj
   year  month  day
0  2015      2    4
1  2016      3    5
>>>
>>> pd.to_datetime(obj)
0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]
Set the format and errors parameters:
>>> import pandas as pd
>>> pd.to_datetime('13000101', format='%Y%m%d', errors='ignore')
datetime.datetime(1300, 1, 1, 0, 0)
>>>
>>> pd.to_datetime('13000101', format='%Y%m%d', errors='coerce')
NaT
>>>
>>> pd.to_datetime('13000101', format='%Y%m%d', errors='raise')
Traceback (most recent call last):
...
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1300-01-01 00:00:00
Set the unit parameter:
>>> import pandas as pd
>>> pd.to_datetime(1490195805, unit='s')
Timestamp('2017-03-22 15:16:45')
>>>
>>> pd.to_datetime(1490195805433502912, unit='ns')
Timestamp('2017-03-22 15:16:45.433502912')
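The dayfirst, format, and errors parameters from the table can be compared side by side. The following is a small sketch with made-up input strings, not an exhaustive tour of the parser:

```python
import pandas as pd

# The same ambiguous string, parsed month-first (default) and day-first.
month_first = pd.to_datetime("10/11/2012")
day_first = pd.to_datetime("10/11/2012", dayfirst=True)

# An explicit format removes the ambiguity entirely.
explicit = pd.to_datetime("21/02/20 16:10", format="%d/%m/%y %H:%M")

# With errors='coerce', unparseable entries become NaT instead of raising.
coerced = pd.to_datetime(["2020-06-24", "not a date"], errors="coerce")

print(month_first, day_first, explicit)
print(coerced)
```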
[02x04]date_range
The pandas.date_range method generates a DatetimeIndex of a specified length based on a specified frequency.
Basic syntax:
pandas.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, **kwargs) → pandas.core.indexes.datetimes.DatetimeIndex
Official documents: https://pandas.pydata.org/docs/reference/api/pandas.date_range.html
parameter | description |
---|---|
start | Start date |
end | End date |
periods | int type, number of periods to generate |
freq | Frequency string, that is, generate dates at a specific frequency. For the values, see [02x02] value of freq frequency |
tz | The time zone, for example "Asia/Hong_Kong" |
normalize | bool type, default False. Whether to normalize the start and end dates to midnight before generating the range (keeping only the date) |
name | Name of the resulting DatetimeIndex |
closed | None: the default, keep both the start date and the end date; 'left': keep the start date but not the end date; 'right': keep the end date but not the start date |
Simple example:
>>> import pandas as pd
>>> pd.date_range(start='1/1/2018', end='1/08/2018')
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08'],
              dtype='datetime64[ns]', freq='D')
Specify the periods parameter:
>>> import pandas as pd
>>> pd.date_range(start='2012-04-01', periods=20)
DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20'],
              dtype='datetime64[ns]', freq='D')
>>>
>>> pd.date_range(end='2012-06-01', periods=20)
DatetimeIndex(['2012-05-13', '2012-05-14', '2012-05-15', '2012-05-16',
               '2012-05-17', '2012-05-18', '2012-05-19', '2012-05-20',
               '2012-05-21', '2012-05-22', '2012-05-23', '2012-05-24',
               '2012-05-25', '2012-05-26', '2012-05-27', '2012-05-28',
               '2012-05-29', '2012-05-30', '2012-05-31', '2012-06-01'],
              dtype='datetime64[ns]', freq='D')
>>>
>>> pd.date_range(start='2018-04-24', end='2018-04-27', periods=3)
DatetimeIndex(['2018-04-24 00:00:00', '2018-04-25 12:00:00',
               '2018-04-27 00:00:00'],
              dtype='datetime64[ns]', freq=None)
>>>
>>> pd.date_range(start='2018-04-24', end='2018-04-28', periods=3)
DatetimeIndex(['2018-04-24', '2018-04-26', '2018-04-28'], dtype='datetime64[ns]', freq=None)
With freq='M', dates are generated on the last calendar day of each month; with freq='3M', they are generated on the last calendar day of every third month:
>>> import pandas as pd
>>> pd.date_range(start='1/1/2018', periods=5, freq='M')
DatetimeIndex(['2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30',
               '2018-05-31'],
              dtype='datetime64[ns]', freq='M')
>>>
>>> pd.date_range(start='1/1/2018', periods=5, freq='3M')
DatetimeIndex(['2018-01-31', '2018-04-30', '2018-07-31', '2018-10-31',
               '2019-01-31'],
              dtype='datetime64[ns]', freq='3M')
Use the tz parameter to set the time zone:
>>> import pandas as pd
>>> pd.date_range(start='1/1/2018', periods=5, tz='Asia/Tokyo')
DatetimeIndex(['2018-01-01 00:00:00+09:00', '2018-01-02 00:00:00+09:00',
               '2018-01-03 00:00:00+09:00', '2018-01-04 00:00:00+09:00',
               '2018-01-05 00:00:00+09:00'],
              dtype='datetime64[ns, Asia/Tokyo]', freq='D')
>>>
>>> pd.date_range(start='6/24/2020', periods=5, tz='Asia/Hong_Kong')
DatetimeIndex(['2020-06-24 00:00:00+08:00', '2020-06-25 00:00:00+08:00',
               '2020-06-26 00:00:00+08:00', '2020-06-27 00:00:00+08:00',
               '2020-06-28 00:00:00+08:00'],
              dtype='datetime64[ns, Asia/Hong_Kong]', freq='D')
Set the normalize parameter to reset the start time to midnight before the timestamps are generated:
>>> import pandas as pd
>>> pd.date_range('2020-06-24 12:56:31', periods=5, normalize=True)
DatetimeIndex(['2020-06-24', '2020-06-25', '2020-06-26', '2020-06-27',
               '2020-06-28'],
              dtype='datetime64[ns]', freq='D')
Set the closed parameter:
>>> import pandas as pd
>>> pd.date_range(start='2020-06-20', end='2020-06-24', closed=None)
DatetimeIndex(['2020-06-20', '2020-06-21', '2020-06-22', '2020-06-23',
               '2020-06-24'],
              dtype='datetime64[ns]', freq='D')
>>>
>>> pd.date_range(start='2020-06-20', end='2020-06-24', closed='left')
DatetimeIndex(['2020-06-20', '2020-06-21', '2020-06-22', '2020-06-23'], dtype='datetime64[ns]', freq='D')
>>>
>>> pd.date_range(start='2020-06-20', end='2020-06-24', closed='right')
DatetimeIndex(['2020-06-21', '2020-06-22', '2020-06-23', '2020-06-24'], dtype='datetime64[ns]', freq='D')
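One caveat about closed: as far as I know, pandas 1.4 deprecated it in favor of an inclusive parameter ('both', 'neither', 'left', 'right'), and pandas 2.0 removed closed altogether. The sketch below tries inclusive first and falls back to closed on older versions; range_without_end is a hypothetical helper written for this example, not part of pandas:

```python
import pandas as pd

def range_without_end(start, end):
    """Hypothetical helper: a daily range that keeps start and drops end."""
    try:
        # Newer pandas versions (assumed 1.4 and later).
        return pd.date_range(start=start, end=end, inclusive="left")
    except TypeError:
        # Older pandas versions that only know the closed parameter.
        return pd.date_range(start=start, end=end, closed="left")

left_closed = range_without_end("2020-06-20", "2020-06-24")
print(list(left_closed.strftime("%Y-%m-%d")))
```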
[02x05] indexing and slicing
The most basic time series type in Pandas is a Series indexed by timestamps (usually given as Python strings or datetime objects). These datetime objects are actually stored in a DatetimeIndex, and the result can be indexed and sliced like any other pandas.Series object:
>>> from datetime import datetime
>>> import pandas as pd
>>> import numpy as np
>>> dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
...          datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
>>> obj = pd.Series(np.random.randn(6), index=dates)
>>>
>>> obj
2011-01-02   -0.407110
2011-01-05   -0.186661
2011-01-07   -0.731080
2011-01-08    0.860970
2011-01-10    1.929973
2011-01-12   -0.168599
dtype: float64
>>>
>>> obj.index
DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)
>>>
>>> obj.index[0]
Timestamp('2011-01-02 00:00:00')
>>>
>>> obj.index[0:3]
DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07'], dtype='datetime64[ns]', freq=None)
In addition, you can pass in a string that can be interpreted as a date, or just a year or a year and month, to easily select slices of data:
>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
>>> obj
2000-01-01   -1.142284
2000-01-02    1.198785
2000-01-03    2.466909
2000-01-04   -0.086728
2000-01-05   -0.978437
                ...
2002-09-22   -0.252240
2002-09-23    0.148561
2002-09-24   -1.330409
2002-09-25   -0.673471
2002-09-26   -0.253271
Freq: D, Length: 1000, dtype: float64
>>>
>>> obj['26/9/2002']
-0.25327100684233356
>>>
>>> obj['2002']
2002-01-01    1.058715
2002-01-02    0.900859
2002-01-03    1.993508
2002-01-04   -0.103211
2002-01-05   -0.950090
                ...
2002-09-22   -0.252240
2002-09-23    0.148561
2002-09-24   -1.330409
2002-09-25   -0.673471
2002-09-26   -0.253271
Freq: D, Length: 269, dtype: float64
>>>
>>> obj['2002-09']
2002-09-01   -0.995528
2002-09-02    0.501528
2002-09-03   -0.486753
2002-09-04   -1.083906
2002-09-05    1.458975
2002-09-06   -1.331685
2002-09-07    0.195338
2002-09-08   -0.429613
2002-09-09    1.125823
2002-09-10    1.607051
2002-09-11    0.530387
2002-09-12   -0.015938
2002-09-13    1.781043
2002-09-14   -0.277123
2002-09-15    0.344569
2002-09-16   -1.010810
2002-09-17    0.463001
2002-09-18    1.883636
2002-09-19    0.274520
2002-09-20    0.624184
2002-09-21   -1.203057
2002-09-22   -0.252240
2002-09-23    0.148561
2002-09-24   -1.330409
2002-09-25   -0.673471
2002-09-26   -0.253271
Freq: D, dtype: float64
>>>
>>> obj['20/9/2002':'26/9/2002']
2002-09-20    0.624184
2002-09-21   -1.203057
2002-09-22   -0.252240
2002-09-23    0.148561
2002-09-24   -1.330409
2002-09-25   -0.673471
2002-09-26   -0.253271
Freq: D, dtype: float64
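The same selections can be written with .loc and ISO-formatted strings, which avoids any day-first or month-first ambiguity. This is a sketch on a deterministic stand-in series (values 0 to 999) so the results are easy to check; it is not the random data used above:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in: values 0..999 on 1000 consecutive days.
obj = pd.Series(np.arange(1000), index=pd.date_range("2000-01-01", periods=1000))

one_day = obj.loc["2002-09-26"]                   # a single value
whole_year = obj.loc["2002"]                      # every row in 2002
whole_month = obj.loc["2002-09"]                  # every row in September 2002
date_slice = obj.loc["2002-09-20":"2002-09-26"]   # both endpoints are included

print(one_day, len(whole_year), len(whole_month), len(date_slice))
```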
[02x06] shifting data and date offsets
Shifting refers to moving data forward or backward along the timeline. Both Series and DataFrame have a shift method that performs a simple forward or backward shift while keeping the index unchanged:
>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.Series(np.random.randn(4), index=pd.date_range('1/1/2000', periods=4, freq='M'))
>>> obj
2000-01-31   -0.100217
2000-02-29    1.177834
2000-03-31   -0.644353
2000-04-30   -1.954679
Freq: M, dtype: float64
>>>
>>> obj.shift(2)
2000-01-31         NaN
2000-02-29         NaN
2000-03-31   -0.100217
2000-04-30    1.177834
Freq: M, dtype: float64
>>>
>>> obj.shift(-2)
2000-01-31   -0.644353
2000-02-29   -1.954679
2000-03-31         NaN
2000-04-30         NaN
Freq: M, dtype: float64
Because a simple shift does not modify the index, part of the data is discarded and NaN (missing values) are introduced. If the frequency is known, it can be passed to shift so that the timestamps are shifted instead of the data:
>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.Series(np.random.randn(4), index=pd.date_range('1/1/2000', periods=4, freq='M'))
>>> obj
2000-01-31   -0.100217
2000-02-29    1.177834
2000-03-31   -0.644353
2000-04-30   -1.954679
Freq: M, dtype: float64
>>>
>>> obj.shift(2, freq='M')
2000-03-31   -0.100217
2000-04-30    1.177834
2000-05-31   -0.644353
2000-06-30   -1.954679
Freq: M, dtype: float64
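The difference between the two kinds of shift is easier to see with fixed numbers. Below is a minimal sketch on a three-day series; the relative-change line shows one common use of a plain shift, and the values are made up for illustration:

```python
import pandas as pd

obj = pd.Series([1.0, 2.0, 3.0],
                index=pd.date_range("2000-01-01", periods=3, freq="D"))

moved_values = obj.shift(1)            # index unchanged, first value becomes NaN
moved_index = obj.shift(1, freq="D")   # values unchanged, every date moves one day later
change = obj / obj.shift(1) - 1        # relative change from the previous row

print(moved_values)
print(moved_index)
print(change)
```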
A frequency in Pandas consists of a base frequency and a multiplier. The base frequency is usually represented by a string alias, such as "M" for monthly and "H" for hourly. Each base frequency has a corresponding object called a date offset. For example, an hourly frequency can be represented by the Hour class:
>>> from pandas.tseries.offsets import Hour, Minute
>>> hour = Hour()
>>> hour
<Hour>
>>>
>>> four_hours = Hour(4)
>>> four_hours
<4 * Hours>
In general, you don't need to create such objects explicitly; just use a string alias such as "H" or "4H". Putting an integer in front of the base frequency creates a multiple:
>>> import pandas as pd
>>> pd.date_range('2000-01-01', '2000-01-03 23:59', freq='4h')
DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 04:00:00',
               '2000-01-01 08:00:00', '2000-01-01 12:00:00',
               '2000-01-01 16:00:00', '2000-01-01 20:00:00',
               '2000-01-02 00:00:00', '2000-01-02 04:00:00',
               '2000-01-02 08:00:00', '2000-01-02 12:00:00',
               '2000-01-02 16:00:00', '2000-01-02 20:00:00',
               '2000-01-03 00:00:00', '2000-01-03 04:00:00',
               '2000-01-03 08:00:00', '2000-01-03 12:00:00',
               '2000-01-03 16:00:00', '2000-01-03 20:00:00'],
              dtype='datetime64[ns]', freq='4H')
Most offset objects can be combined by addition:
>>> from pandas.tseries.offsets import Hour, Minute
>>> Hour(2) + Minute(30)
<150 * Minutes>
For the freq parameter, you can also pass in a frequency string (such as "2h30min"), which is parsed into the equivalent expression:
>>> import pandas as pd
>>> pd.date_range('2000-01-01', periods=10, freq='1h30min')
DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 01:30:00',
               '2000-01-01 03:00:00', '2000-01-01 04:30:00',
               '2000-01-01 06:00:00', '2000-01-01 07:30:00',
               '2000-01-01 09:00:00', '2000-01-01 10:30:00',
               '2000-01-01 12:00:00', '2000-01-01 13:30:00'],
              dtype='datetime64[ns]', freq='90T')
Offsets can also be applied to datetime or Timestamp objects:
>>> from datetime import datetime
>>> from pandas.tseries.offsets import Day, MonthEnd
>>> now = datetime(2011, 11, 17)
>>> now + 3 * Day()
Timestamp('2011-11-20 00:00:00')
If you add an anchored offset, such as MonthEnd, the first increment rolls the original date forward to the next date that satisfies the frequency rule:
>>> from datetime import datetime
>>> from pandas.tseries.offsets import Day, MonthEnd
>>> now = datetime(2011, 11, 17)
>>> now + MonthEnd()
Timestamp('2011-11-30 00:00:00')
>>> now + MonthEnd(2)
Timestamp('2011-12-31 00:00:00')
With the rollforward and rollback methods of an anchored offset, a date can be rolled forward or backward explicitly:
>>> from datetime import datetime
>>> from pandas.tseries.offsets import Day, MonthEnd
>>> now = datetime(2011, 11, 17)
>>> offset = MonthEnd()
>>> offset.rollforward(now)
Timestamp('2011-11-30 00:00:00')
>>> offset.rollback(now)
Timestamp('2011-10-31 00:00:00')
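One detail worth checking is what happens when the date already sits on the anchor. The sketch below contrasts adding MonthEnd with calling rollforward and rollback on a date that is itself a month end; the dates are arbitrary examples:

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

offset = MonthEnd()
mid_month = pd.Timestamp("2011-11-17")
month_end = pd.Timestamp("2011-11-30")

added_mid = mid_month + offset                 # moves to the end of the same month
added_end = month_end + offset                 # already on the anchor: moves to the next month end
rolled_forward = offset.rollforward(month_end) # already on the anchor: unchanged
rolled_back = offset.rollback(month_end)       # already on the anchor: unchanged

print(added_mid, added_end, rolled_forward, rolled_back)
```

In other words, addition always advances, while the roll methods leave a date alone when it already satisfies the rule.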
In combination with the groupby method:
>>> import pandas as pd
>>> import numpy as np
>>> from pandas.tseries.offsets import Day, MonthEnd
>>> obj = pd.Series(np.random.randn(20), index=pd.date_range('1/15/2000', periods=20, freq='4d'))
>>> obj
2000-01-15   -0.591729
2000-01-19   -0.775844
2000-01-23   -0.745603
2000-01-27   -0.076439
2000-01-31    1.796417
2000-02-04   -0.500349
2000-02-08    0.515851
2000-02-12   -0.344171
2000-02-16    0.419657
2000-02-20    0.307288
2000-02-24    0.115113
2000-02-28   -0.362585
2000-03-03    1.074892
2000-03-07    1.111366
2000-03-11    0.949910
2000-03-15   -1.535727
2000-03-19    0.545944
2000-03-23   -0.810139
2000-03-27   -1.260627
2000-03-31   -0.128403
Freq: 4D, dtype: float64
>>>
>>> offset = MonthEnd()
>>> obj.groupby(offset.rollforward).mean()
2000-01-31   -0.078640
2000-02-29    0.021543
2000-03-31   -0.006598
dtype: float64
[02x07] time zone processing
In Python, time zone information comes from the third-party library pytz. The pytz.common_timezones list shows all common time zone names, and the pytz.timezone method returns a time zone object:
>>> import pytz
>>> pytz.common_timezones
['Africa/Abidjan', 'Africa/Accra', 'Africa/Addis_Ababa', ..., 'UTC']
>>>
>>> tz = pytz.timezone('Asia/Shanghai')
>>> tz
<DstTzInfo 'Asia/Shanghai' LMT+8:06:00 STD>  # LMT (local mean time) here is 8 hours 6 minutes ahead of UTC
In the date_range method, the tz parameter specifies the time zone and defaults to None. A series without a time zone can be localized with the tz_localize method. In the following example, a time-zone-naive series is localized to UTC:
>>> import pandas as pd
>>> import numpy as np
>>> rng = pd.date_range('3/9/2012 9:30', periods=6, freq='D')
>>> ts = pd.Series(np.random.randn(len(rng)), index=rng)
>>> ts
2012-03-09 09:30:00   -1.527913
2012-03-10 09:30:00   -1.116101
2012-03-11 09:30:00    0.359358
2012-03-12 09:30:00   -0.475920
2012-03-13 09:30:00   -0.336570
2012-03-14 09:30:00   -1.075952
Freq: D, dtype: float64
>>>
>>> print(ts.index.tz)
None
>>>
>>> ts_utc = ts.tz_localize('UTC')
>>> ts_utc
2012-03-09 09:30:00+00:00   -1.527913
2012-03-10 09:30:00+00:00   -1.116101
2012-03-11 09:30:00+00:00    0.359358
2012-03-12 09:30:00+00:00   -0.475920
2012-03-13 09:30:00+00:00   -0.336570
2012-03-14 09:30:00+00:00   -1.075952
Freq: D, dtype: float64
>>>
>>> ts_utc.index
DatetimeIndex(['2012-03-09 09:30:00+00:00', '2012-03-10 09:30:00+00:00',
               '2012-03-11 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
               '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='D')
Once a time series has been localized to a particular time zone, the tz_convert method converts it to a different time zone:
>>> import pandas as pd
>>> import numpy as np
>>> rng = pd.date_range('3/9/2012 9:30', periods=6, freq='D')
>>> ts = pd.Series(np.random.randn(len(rng)), index=rng)
>>> ts
2012-03-09 09:30:00    0.480303
2012-03-10 09:30:00   -1.461039
2012-03-11 09:30:00   -1.512749
2012-03-12 09:30:00   -2.185421
2012-03-13 09:30:00    1.657845
2012-03-14 09:30:00    0.175633
Freq: D, dtype: float64
>>>
>>> ts.tz_localize('UTC').tz_convert('Asia/Shanghai')
2012-03-09 17:30:00+08:00    0.480303
2012-03-10 17:30:00+08:00   -1.461039
2012-03-11 17:30:00+08:00   -1.512749
2012-03-12 17:30:00+08:00   -2.185421
2012-03-13 17:30:00+08:00    1.657845
2012-03-14 17:30:00+08:00    0.175633
Freq: D, dtype: float64
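The same two steps work on a single Timestamp, which makes it easier to see that tz_localize attaches a zone while tz_convert only changes how the same instant is displayed. A minimal sketch, assuming the time zone database is available in the environment:

```python
import pandas as pd

naive = pd.Timestamp("2012-03-09 09:30")     # no time zone attached
utc = naive.tz_localize("UTC")               # interpret the wall time as UTC
shanghai = utc.tz_convert("Asia/Shanghai")   # same instant, shown in another zone

same_instant = utc == shanghai               # conversion does not change the instant
wall_time = shanghai.tz_localize(None)       # drop the zone, keep the local wall time

print(naive.tz, utc, shanghai, same_instant, wall_time)
```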
[03x00] fixed period
[03x01]pandas.Period
A fixed period represents a span of time, such as a day, a month, a quarter, or a year. The Period class represents this data type, and its constructor takes a string or an integer.
Basic syntax:
class pandas.Period(value=None, freq=None, ordinal=None, year=None, month=None, quarter=None, day=None, hour=None, minute=None, second=None)
Official documents: https://pandas.pydata.org/docs/reference/api/pandas.Period.html
Common parameters:
parameter | description |
---|---|
value | The time period to represent |
freq | The frequency of the period; can be a str or a date offset type. See [02x02] value of freq frequency |
In the following example, the Period object represents the entire period from January 1, 2020 to December 31, 2020:
>>> import pandas as pd
>>> pd.Period(2020, freq='A-DEC')
Period('2020', 'A-DEC')
Adding or subtracting an integer shifts the period by that many units of its frequency:
>>> import pandas as pd
>>> obj = pd.Period(2020, freq='A-DEC')
>>> obj
Period('2020', 'A-DEC')
>>>
>>> obj + 5
Period('2025', 'A-DEC')
>>>
>>> obj - 5
Period('2015', 'A-DEC')
The PeriodIndex class holds a sequence of periods and can be used as an axis index in any pandas data structure:
>>> import pandas as pd
>>> import numpy as np
>>> rng = [pd.Period('2000-01'), pd.Period('2000-02'), pd.Period('2000-03'),
...        pd.Period('2000-04'), pd.Period('2000-05'), pd.Period('2000-06')]
>>> obj = pd.Series(np.random.randn(6), index=rng)
>>> obj
2000-01    0.229092
2000-02    1.515498
2000-03   -0.334401
2000-04   -0.492681
2000-05   -2.012818
2000-06    0.338804
Freq: M, dtype: float64
>>>
>>> obj.index
PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='period[M]', freq='M')
A PeriodIndex can also be built directly from a list of strings:
>>> import pandas as pd
>>> values = ['2001Q3', '2002Q2', '2003Q1']
>>> index = pd.PeriodIndex(values, freq='Q-DEC')
>>> index
PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], dtype='period[Q-DEC]', freq='Q-DEC')
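Because a Period covers a whole span, it exposes the boundaries of that span. The sketch below uses a monthly period (an arbitrary choice for illustration) to show its start, its end, and a membership check:

```python
import pandas as pd

june = pd.Period("2020-06", freq="M")

first_instant = june.start_time   # first moment of the month
last_instant = june.end_time      # last moment of the month
next_month = june + 1             # shift by one unit of the frequency

# A timestamp belongs to the period if it lies between the two boundaries.
moment = pd.Timestamp("2020-06-24 15:30")
contains = first_instant <= moment <= last_instant

print(first_instant, last_instant, next_month, contains)
```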
[03x02]period_range
The pandas.period_range method generates a PeriodIndex of a specified length based on a specified frequency.
Basic syntax:
pandas.period_range(start=None, end=None, periods=None, freq=None, name=None) → pandas.core.indexes.period.PeriodIndex
Official documents: https://pandas.pydata.org/docs/reference/api/pandas.period_range.html
Common parameters:
parameter | description |
---|---|
start | Start date |
end | End date |
periods | Number of periods to generate |
freq | The frequency of the periods; can be a str or a date offset type. See [02x02] value of freq frequency |
name | Name of the resulting PeriodIndex |
Simple application:
>>> import pandas as pd
>>> pd.period_range(start='2019-01-01', end='2020-01-01', freq='M')
PeriodIndex(['2019-01', '2019-02', '2019-03', '2019-04', '2019-05', '2019-06',
             '2019-07', '2019-08', '2019-09', '2019-10', '2019-11', '2019-12',
             '2020-01'],
            dtype='period[M]', freq='M')
>>>
>>> pd.period_range(start=pd.Period('2017Q1', freq='Q'), end=pd.Period('2017Q2', freq='Q'), freq='M')
PeriodIndex(['2017-03', '2017-04', '2017-05', '2017-06'], dtype='period[M]', freq='M')
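The periods parameter from the table is not used in the examples above, so here is a short sketch of it. It counts forward from start or backward from end; the months are arbitrary:

```python
import pandas as pd

# Three monthly periods counted forward from the start.
forward = pd.period_range(start="2020-01", periods=3, freq="M")

# Three monthly periods counted backward from the end.
backward = pd.period_range(end="2020-12", periods=3, freq="M")

print([str(p) for p in forward])
print([str(p) for p in backward])
```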
[03x03] asfreq period frequency conversion
Both Period and PeriodIndex objects can be converted to different frequencies by the asfreq method.
Basic syntax: PeriodIndex.asfreq(self, *args, **kwargs)
Common parameters:
parameter | description |
---|---|
freq | The new frequency (offset). See [02x02] value of freq frequency |
how | Whether to align to the start or the end of the period: 'E', 'END', or 'FINISH' for the end; 'S', 'START', or 'BEGIN' for the start |
Application example:
>>> import pandas as pd
>>> pidx = pd.period_range('2010-01-01', '2015-01-01', freq='A')
>>> pidx
PeriodIndex(['2010', '2011', '2012', '2013', '2014', '2015'], dtype='period[A-DEC]', freq='A-DEC')
>>>
>>> pidx.asfreq('M')
PeriodIndex(['2010-12', '2011-12', '2012-12', '2013-12', '2014-12', '2015-12'], dtype='period[M]', freq='M')
>>>
>>> pidx.asfreq('M', how='S')
PeriodIndex(['2010-01', '2011-01', '2012-01', '2013-01', '2014-01', '2015-01'], dtype='period[M]', freq='M')
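The how parameter only matters when converting to a finer frequency, where one period maps onto many. The sketch below converts a single monthly Period to days in both directions and then converts a day back to its month; the dates are arbitrary:

```python
import pandas as pd

march = pd.Period("2020-03", freq="M")

first_day = march.asfreq("D", how="start")   # first day of the month
last_day = march.asfreq("D", how="end")      # last day of the month

# Going from a finer to a coarser frequency needs no 'how':
# the day simply falls inside exactly one month.
back_to_month = pd.Period("2020-03-15", freq="D").asfreq("M")

print(first_day, last_day, back_to_month)
```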
[03x04]to_period and to_timestamp()
The to_period method converts timestamps to periods;
the to_timestamp method converts periods to timestamps.
>>> import pandas as pd
>>> import numpy as np
>>> rng = pd.date_range('2000-01-01', periods=3, freq='M')
>>> ts = pd.Series(np.random.randn(3), index=rng)
>>> ts
2000-01-31    0.220759
2000-02-29   -0.108221
2000-03-31    0.819433
Freq: M, dtype: float64
>>>
>>> pts = ts.to_period()
>>> pts
2000-01    0.220759
2000-02   -0.108221
2000-03    0.819433
Freq: M, dtype: float64
>>>
>>> pts2 = pts.to_timestamp()
>>> pts2
2000-01-01    0.220759
2000-02-01   -0.108221
2000-03-01    0.819433
Freq: MS, dtype: float64
>>>
>>> ts.index
DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31'], dtype='datetime64[ns]', freq='M')
>>>
>>> pts.index
PeriodIndex(['2000-01', '2000-02', '2000-03'], dtype='period[M]', freq='M')
>>>
>>> pts2.index
DatetimeIndex(['2000-01-01', '2000-02-01', '2000-03-01'], dtype='datetime64[ns]', freq='MS')
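Note that in the session above the round trip does not return the original month-end dates, because to_timestamp aligns to the start of each period by default. A sketch of the how parameter, using fixed values instead of random ones:

```python
import pandas as pd

ts = pd.Series([1.0, 2.0],
               index=pd.to_datetime(["2000-01-31", "2000-02-29"]))

by_period = ts.to_period("M")                  # timestamps -> monthly periods
starts = by_period.to_timestamp()              # default: start of each period
ends = by_period.to_timestamp(how="end")       # last instant of each period

print(list(starts.index.strftime("%Y-%m-%d")))
print(list(ends.index.strftime("%Y-%m-%d")))
```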
[04x00] timedelta interval
[04x01]pandas.Timedelta
A Timedelta is a duration: the difference between two dates or times.
Timedelta is the pandas equivalent of Python's datetime.timedelta, and in most cases the two are interchangeable.
Basic syntax: class pandas.Timedelta(value=<object object>, unit=None, **kwargs)
Official documents: https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.html
Common parameters:
parameter | describe |
---|---|
value | The value passed in can be timedelta, timedelta, np.timedelta64 , string or integer object |
unit | The unit used to set the value. Please refer to the official document for the specific value |
Subtracting two datetime objects gives the time difference between them as a Timedelta:
>>> import pandas as pd
>>> pd.to_datetime('2020-6-24') - pd.to_datetime('2016-1-1')
Timedelta('1636 days 00:00:00')
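To illustrate how pandas.Timedelta and datetime.timedelta can stand in for each other, here is a small sketch. The values are made up for the example; it only relies on keyword construction, equality comparison and Timestamp arithmetic.

```python
import datetime as dt
import pandas as pd

py_delta = dt.timedelta(days=1, hours=6)
pd_delta = pd.Timedelta(days=1, hours=6)

# The two objects compare equal, and pd.Timedelta is a timedelta subclass
print(py_delta == pd_delta)                # True
print(isinstance(pd_delta, dt.timedelta))  # True

# Either one can be added to a Timestamp
stamp = pd.Timestamp("2020-06-24")
shifted_py = stamp + py_delta
shifted_pd = stamp + pd_delta
print(shifted_py)  # 2020-06-25 06:00:00
```

One difference to keep in mind: pandas.Timedelta can store nanoseconds, while datetime.timedelta stops at microseconds, which is one reason the two are interchangeable only "in most cases".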
Pass the value as a string:
>>> import pandas as pd
>>> pd.Timedelta('3 days 3 hours 3 minutes 30 seconds')
Timedelta('3 days 03:03:30')
Pass the value as an integer together with a unit:
>>> import pandas as pd
>>> pd.Timedelta(5, unit='h')
Timedelta('0 days 05:00:00')
Get attributes:
>>> import pandas as pd
>>> obj = pd.Timedelta('3 days 3 hours 3 minutes 30 seconds')
>>> obj
Timedelta('3 days 03:03:30')
>>>
>>> obj.days
3
>>> obj.seconds
11010
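Note that seconds is only the part of the duration left over after whole days are removed (3 hours 3 minutes 30 seconds = 11010 seconds), not the total length. A short sketch of the difference, using the same duration as above:

```python
import pandas as pd

obj = pd.Timedelta("3 days 3 hours 3 minutes 30 seconds")

# .seconds excludes whole days; .total_seconds() covers the entire duration
print(obj.days)             # 3
print(obj.seconds)          # 11010
print(obj.total_seconds())  # 270210.0

# .components splits the duration into its individual fields
parts = obj.components
print(parts.hours, parts.minutes, parts.seconds)  # 3 3 30
```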
[04x02]to_timedelta
The to_timedelta method converts the incoming object to a Timedelta object.
Basic syntax: pandas.to_timedelta(arg, unit='ns', errors='raise')
Official documents: https://pandas.pydata.org/docs/reference/api/pandas.to_timedelta.html
Common parameters:
Parameter | Description |
---|---|
arg | The object to convert to a timedelta; it can be a str, timedelta, list-like or Series |
unit | The unit of arg. Refer to the official documentation for the accepted values |
errors | How to handle input that cannot be parsed as a timedelta. 'ignore': return the original input without raising; 'raise': raise an exception (default); 'coerce': set invalid values to NaT |
Parse a single string into a Timedelta object:
>>> import pandas as pd
>>> pd.to_timedelta('1 days 06:05:01.00003')
Timedelta('1 days 06:05:01.000030')
>>>
>>> pd.to_timedelta('15.5us')
Timedelta('0 days 00:00:00.000015')
Parse a list or array of strings into a TimedeltaIndex:
>>> import pandas as pd
>>> pd.to_timedelta(['1 days 06:05:01.00003', '15.5us', 'nan'])
TimedeltaIndex(['1 days 06:05:01.000030', '0 days 00:00:00.000015', NaT], dtype='timedelta64[ns]', freq=None)
Specify the unit parameter:
>>> import numpy as np
>>> import pandas as pd
>>> pd.to_timedelta(np.arange(5), unit='s')
TimedeltaIndex(['00:00:00', '00:00:01', '00:00:02', '00:00:03', '00:00:04'], dtype='timedelta64[ns]', freq=None)
>>>
>>> pd.to_timedelta(np.arange(5), unit='d')
TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)
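The errors parameter from the table is not demonstrated above, so here is a hedged sketch of the two settings that are probably used most often, 'raise' and 'coerce'. The input list is invented for illustration.

```python
import pandas as pd

raw = ["1 days 06:00:00", "90 minutes", "not a duration"]

# errors='raise' (the default) stops at the first value that cannot be parsed
try:
    pd.to_timedelta(raw)
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# errors='coerce' keeps going and marks unparseable values as NaT
parsed = pd.to_timedelta(raw, errors="coerce")
print(parsed)
```

With 'coerce', the bad entries can be located afterwards with pd.isna(parsed), which is often more convenient when cleaning a whole column.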
[04x03]timedelta_range
The timedelta_range method generates a TimedeltaIndex of a specified length at a specified frequency.
Basic syntax:
pandas.timedelta_range(start=None, end=None, periods=None, freq=None, name=None, closed=None) → pandas.core.indexes.timedeltas.TimedeltaIndex
Official documents: https://pandas.pydata.org/docs/reference/api/pandas.timedelta_range.html
Common parameters:
Parameter | Description |
---|---|
start | Start of the range, a timedelta-like value such as '1 day' |
end | End of the range, a timedelta-like value |
periods | int, the number of periods to generate |
freq | Frequency string, that is, the step between the generated values. For the accepted values, see [02x02] value of freq frequency |
name | Name of the resulting TimedeltaIndex |
closed | None: the default, keep both the start point and the end point; 'left': keep the start point but not the end point; 'right': keep the end point but not the start point |
Application example:
>>> import pandas as pd
>>> pd.timedelta_range(start='1 day', periods=4)
TimedeltaIndex(['1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq='D')
The closed parameter specifies which endpoint to keep. Both endpoints are kept by default:
>>> import pandas as pd
>>> pd.timedelta_range(start='1 day', periods=4, closed='right')
TimedeltaIndex(['2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq='D')
The freq parameter specifies the frequency of the TimedeltaIndex. Only fixed frequencies are accepted; a non-fixed frequency such as 'M' raises an error:
>>> import pandas as pd
>>> pd.timedelta_range(start='1 day', end='2 days', freq='6H')
TimedeltaIndex(['1 days 00:00:00', '1 days 06:00:00', '1 days 12:00:00', '1 days 18:00:00', '2 days 00:00:00'], dtype='timedelta64[ns]', freq='6H')
>>>
>>> pd.timedelta_range(start='1 day', end='2 days', freq='M')
Traceback (most recent call last):
...
ValueError: <MonthEnd> is a non-fixed frequency
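One possible use of a TimedeltaIndex is as a set of offsets from a base time. The sketch below assumes a made-up base timestamp and writes the hour alias in lowercase ('6h'), which is the spelling recent pandas versions prefer.

```python
import pandas as pd

# Four offsets, six hours apart, starting from zero
offsets = pd.timedelta_range(start="0 hours", periods=4, freq="6h")

# Adding a TimedeltaIndex to a Timestamp yields a DatetimeIndex
base = pd.Timestamp("2020-01-01")
stamps = base + offsets

print(stamps[0])   # 2020-01-01 00:00:00
print(stamps[-1])  # 2020-01-01 18:00:00
```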
[05x00] resampling and frequency conversion
Resampling refers to the process of converting a time series from one frequency to another. Aggregating high-frequency data to a lower frequency is called downsampling, while converting low-frequency data to a higher frequency is called upsampling. Not all resampling falls into these two categories. For example, converting W-WED to W-FRI is neither downsampling nor upsampling.
Pandas provides the resample method to help us implement resampling. Both Series and DataFrame objects have a resample method, which is the workhorse for all kinds of frequency conversion.
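As a quick, hedged illustration of the two directions, the sketch below takes a small one-minute series (made-up data) and resamples it once to a lower frequency and once to a higher one. The frequency aliases are written as 'min' and 's', the spellings recent pandas versions prefer.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=6, freq="min")
series = pd.Series(range(6), index=index)

# Downsampling: several one-minute values are aggregated into each 3-minute bin
down = series.resample("3min").sum()
print(down.tolist())  # [3, 12]

# Upsampling: new 30-second slots appear and have no data until they are filled
up = series.resample("30s").asfreq()
print(len(up), int(up.isna().sum()))  # 11 5
```

Downsampling always needs an aggregation (sum, mean, ...), while upsampling needs a decision about how to fill the new gaps.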
Basic syntax:
Series.resample(self, rule, axis=0, closed: Union[str, NoneType] = None, label: Union[str, NoneType] = None, convention: str = 'start', kind: Union[str, NoneType] = None, loffset=None, base: int = 0, on=None, level=None)
DataFrame.resample(self, rule, axis=0, closed: Union[str, NoneType] = None, label: Union[str, NoneType] = None, convention: str = 'start', kind: Union[str, NoneType] = None, loffset=None, base: int = 0, on=None, level=None)
Common parameters:
Parameter | Description |
---|---|
rule | The offset string or object representing the target frequency |
axis | The axis to resample along, default 0 |
closed | Which side of each bin interval is closed (i.e. included) during resampling. The default is 'left' for all frequencies except 'M', 'A', 'Q', 'BM', 'BA', 'BQ' and 'W', whose default is 'right' |
label | Which bin edge to use as the label of the aggregated value, 'right' or 'left'; None by default. For example, the five minutes from 9:30 to 9:35 can be labeled 9:30 or 9:35 |
convention | For PeriodIndex only: 'start' or 's', 'end' or 'e' |
on | For a DataFrame, the column to use instead of the index for resampling; that column becomes the row index of the result |
level | For a DataFrame with a MultiIndex, the level on which to resample |
Resample the series into three-minute bins and sum the values in each bin:
>>> import pandas as pd
>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64
>>>
>>> series.resample('3T').sum()
2000-01-01 00:00:00     3
2000-01-01 00:03:00    12
2000-01-01 00:06:00    21
Freq: 3T, dtype: int64
Set label='right' so that each bin is labeled with its right edge (the larger timestamp):
>>> import pandas as pd
>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64
>>>
>>> series.resample('3T', label='right').sum()
2000-01-01 00:03:00     3
2000-01-01 00:06:00    12
2000-01-01 00:09:00    21
Freq: 3T, dtype: int64
Set closed='right' so that the right edge of each bin is included in that bin (and the left edge is excluded):
>>> import pandas as pd
>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64
>>>
>>> series.resample('3T', label='right', closed='right').sum()
2000-01-01 00:00:00     0
2000-01-01 00:03:00     6
2000-01-01 00:06:00    15
2000-01-01 00:09:00    15
Freq: 3T, dtype: int64
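The effect of closed is easiest to see by comparing both settings side by side. The following sketch reuses the nine-point series from the examples above (with the minute alias written as 'min') and spells out which values land in which bin.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# closed='left': bins are [00:00, 00:03), [00:03, 00:06), [00:06, 00:09)
left = series.resample("3min", closed="left", label="left").sum()
print(left.tolist())   # [3, 12, 21] -> 0+1+2, 3+4+5, 6+7+8

# closed='right': bins are (23:57, 00:00], (00:00, 00:03], (00:03, 00:06], (00:06, 00:09]
right = series.resample("3min", closed="right", label="right").sum()
print(right.tolist())  # [0, 6, 15, 15] -> 0, 1+2+3, 4+5+6, 7+8
```

Because the first timestamp sits exactly on a bin edge, closed='right' pushes it into an extra bin of its own, which is why the result has four rows instead of three.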
The following example resamples the sequence to a frequency of 30 seconds, and asfreq()[0:5] is used to select the first 5 rows of data:
>>> import pandas as pd
>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64
>>>
>>> series.resample('30S').asfreq()[0:5]
2000-01-01 00:00:00    0.0
2000-01-01 00:00:30    NaN
2000-01-01 00:01:00    1.0
2000-01-01 00:01:30    NaN
2000-01-01 00:02:00    2.0
Freq: 30S, dtype: float64
Forward-fill the missing values (NaN) with the pad method, which carries the last known value forward (pad is equivalent to ffill):
>>> import pandas as pd
>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64
>>>
>>> series.resample('30S').pad()[0:5]
2000-01-01 00:00:00    0
2000-01-01 00:00:30    0
2000-01-01 00:01:00    1
2000-01-01 00:01:30    1
2000-01-01 00:02:00    2
Freq: 30S, dtype: int64
Backward-fill the missing values (NaN) with the bfill method, which pulls the next known value backward:
>>> import pandas as pd
>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64
>>>
>>> series.resample('30S').bfill()[0:5]
2000-01-01 00:00:00    0
2000-01-01 00:00:30    1
2000-01-01 00:01:00    1
2000-01-01 00:01:30    2
2000-01-01 00:02:00    2
Freq: 30S, dtype: int64
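The fill strategies can be compared on a smaller series. This sketch uses ffill rather than pad, on the assumption that you may be running a recent pandas release where, as far as I know, the pad spelling is no longer available on resamplers; it also adds linear interpolation as a third option that the text above does not cover.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=3, freq="min")
series = pd.Series([0, 1, 2], index=index)

# Forward fill: each gap takes the previous known value
forward = series.resample("30s").ffill()
print(forward.tolist())   # [0, 0, 1, 1, 2]

# Backward fill: each gap takes the next known value
backward = series.resample("30s").bfill()
print(backward.tolist())  # [0, 1, 1, 2, 2]

# Linear interpolation: each gap is placed halfway between its neighbours
linear = series.resample("30s").interpolate()
print(linear.tolist())    # [0.0, 0.5, 1.0, 1.5, 2.0]
```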
Pass the custom function through the apply method:
>>> import numpy as np
>>> import pandas as pd
>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64
>>>
>>> def custom_resampler(array_like):
...     return np.sum(array_like) + 5
...
>>> series.resample('3T').apply(custom_resampler)
2000-01-01 00:00:00     8
2000-01-01 00:03:00    17
2000-01-01 00:06:00    26
Freq: 3T, dtype: int64
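Besides apply, a resampler also has an agg method that can compute several statistics per bin at once. The sketch below shows both; value_range is a hypothetical helper written only for this example.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

def value_range(window):
    """Hypothetical helper: spread between the largest and smallest value in a bin."""
    return window.max() - window.min()

# A custom function receives the values of one bin at a time
spread = series.resample("3min").apply(value_range)
print(spread.tolist())  # [2, 2, 2]

# agg with a list of names returns one column per statistic
summary = series.resample("3min").agg(["sum", "mean", "max"])
print(summary)
```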
Application of convention parameter:
>>> import pandas as pd
>>> s = pd.Series([1, 2], index=pd.period_range('2012-01-01', freq='A', periods=2))
>>> s
2012    1
2013    2
Freq: A-DEC, dtype: int64
>>>
>>> s.resample('Q', convention='start').asfreq()
2012Q1    1.0
2012Q2    NaN
2012Q3    NaN
2012Q4    NaN
2013Q1    2.0
2013Q2    NaN
2013Q3    NaN
2013Q4    NaN
Freq: Q-DEC, dtype: float64
>>>
>>> s.resample('Q', convention='end').asfreq()
2012Q4    1.0
2013Q1    NaN
2013Q2    NaN
2013Q3    NaN
2013Q4    2.0
Freq: Q-DEC, dtype: float64
>>> import pandas as pd
>>> q = pd.Series([1, 2, 3, 4], index=pd.period_range('2018-01-01', freq='Q', periods=4))
>>> q
2018Q1    1
2018Q2    2
2018Q3    3
2018Q4    4
Freq: Q-DEC, dtype: int64
>>>
>>> q.resample('M', convention='end').asfreq()
2018-03    1.0
2018-04    NaN
2018-05    NaN
2018-06    2.0
2018-07    NaN
2018-08    NaN
2018-09    3.0
2018-10    NaN
2018-11    NaN
2018-12    4.0
Freq: M, dtype: float64
>>>
>>> q.resample('M', convention='start').asfreq()
2018-01    1.0
2018-02    NaN
2018-03    NaN
2018-04    2.0
2018-05    NaN
2018-06    NaN
2018-07    3.0
2018-08    NaN
2018-09    NaN
2018-10    4.0
2018-11    NaN
2018-12    NaN
Freq: M, dtype: float64
For DataFrame objects, you can use the keyword on to specify a column in the original data as the row index of the resampled data:
>>> import pandas as pd
>>> d = dict({'price': [10, 11, 9, 13, 14, 18, 17, 19], 'volume': [50, 60, 40, 100, 50, 100, 40, 50]})
>>> df = pd.DataFrame(d)
>>> df['week_starting'] = pd.date_range('01/01/2018', periods=8, freq='W')
>>> df
   price  volume week_starting
0     10      50    2018-01-07
1     11      60    2018-01-14
2      9      40    2018-01-21
3     13     100    2018-01-28
4     14      50    2018-02-04
5     18     100    2018-02-11
6     17      40    2018-02-18
7     19      50    2018-02-25
>>>
>>> df.resample('M', on='week_starting').mean()
               price  volume
week_starting
2018-01-31     10.75    62.5
2018-02-28     17.00    60.0
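One way to think about on is as a shortcut for first moving the column into the index and then resampling. The sketch below checks that both routes give the same result. It deliberately swaps the month-end rule 'M' for the month-start rule 'MS', so the row labels are the first day of each month rather than the last; the averages are the same as above.

```python
import pandas as pd

df = pd.DataFrame({
    "price": [10, 11, 9, 13, 14, 18, 17, 19],
    "volume": [50, 60, 40, 100, 50, 100, 40, 50],
})
df["week_starting"] = pd.date_range("2018-01-01", periods=8, freq="W")

# Route 1: resample directly on the datetime column
via_on = df.resample("MS", on="week_starting").mean()

# Route 2: make the column the index first, then resample
via_index = df.set_index("week_starting").resample("MS").mean()

print(via_on.equals(via_index))   # True
print(via_on["price"].tolist())   # [10.75, 17.0]
```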
For a DataFrame object with a multi index, you can use the keyword level to specify at which level you want to resample:
>>> import pandas as pd
>>> days = pd.date_range('1/1/2000', periods=4, freq='D')
>>> d2 = dict({'price': [10, 11, 9, 13, 14, 18, 17, 19], 'volume': [50, 60, 40, 100, 50, 100, 40, 50]})
>>> df2 = pd.DataFrame(d2, index=pd.MultiIndex.from_product([days, ['morning', 'afternoon']]))
>>> df2
                      price  volume
2000-01-01 morning       10      50
           afternoon     11      60
2000-01-02 morning        9      40
           afternoon     13     100
2000-01-03 morning       14      50
           afternoon     18     100
2000-01-04 morning       17      40
           afternoon     19      50
>>>
>>> df2.resample('D', level=0).sum()
            price  volume
2000-01-01     21     110
2000-01-02     22     140
2000-01-03     32     150
2000-01-04     36      90
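In this particular example the target frequency equals the frequency of the date level, so resampling on level 0 gives the same numbers as grouping by that level. The following sketch makes the comparison explicit; it is only meant to build intuition, since with a coarser rule (for example two-day bins) only resample would merge several dates together.

```python
import pandas as pd

days = pd.date_range("2000-01-01", periods=4, freq="D")
index = pd.MultiIndex.from_product([days, ["morning", "afternoon"]])
df2 = pd.DataFrame({
    "price": [10, 11, 9, 13, 14, 18, 17, 19],
    "volume": [50, 60, 40, 100, 50, 100, 40, 50],
}, index=index)

# Resample on the datetime level of the MultiIndex
by_level = df2.resample("D", level=0).sum()

# Plain grouping on the same level, for comparison
by_group = df2.groupby(level=0).sum()

print(by_level["price"].tolist())                   # [21, 22, 32, 36]
print((by_level.values == by_group.values).all())   # True
```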
This article was first published on CSDN by TRHX. Blog homepage: https://itrhx.blog.csdn.net/ Link to this article: https://itrhx.blog.csdn.net/article/details/106947061