Python Data Analysis with Pandas: Time Series

Posted by thegman on Sat, 27 Jun 2020 06:52:12 +0200

The NumPy and Matplotlib article series have also been updated; you are welcome to follow them:

NumPy series: https://itrhx.blog.csdn.net/category_9780393.html
Matplotlib series: https://itrhx.blog.csdn.net/category_9780418.html

Recommended learning materials and websites (the author helped translate some of the documentation):

NumPy official Chinese website: https://www.numpy.org.cn/
Pandas official Chinese website: https://www.pypandas.cn/
Matplotlib official Chinese website: https://www.matplotlib.org.cn/
Quick reference table of NumPy, Matplotlib and Pandas: https://github.com/TRHX/Python-quick-reference-table

[02x01]pandas.Timestamp
[02x02] value of freq frequency
[02x03]to_datetime
[02x04]date_range
[02x05] indexing and slicing
[02x06] shifting data and date offsets
[02x07] time zone processing

[03x00] fixed period

[03x01]pandas.Period
[03x02]period_range
[03x03] asfreq period frequency conversion
[03x04]to_period and to_timestamp()

[04x00] timedelta interval

[04x01]pandas.Timedelta
[04x02]to_timedelta
[04x03]timedelta_range

[05x00] resampling and frequency conversion

This is an anti crawler text, please ignore.
This paper was first published in CSDN by TRHX.
Blog homepage: https://itrhx.blog.csdn.net/
Link to this article: https://itrhx.blog.csdn.net/article/details/106947061
 Unauthorized, no reprint! Reprint maliciously at your own risk! Respect originality and keep away from plagiarism!

[01x00] time series

Introduction to time series on the official website: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html

A time series is an important form of structured data, used in many fields, including finance, economics, ecology, neuroscience, and physics. Anything observed or measured at multiple points in time forms a time series. Many time series have a fixed frequency, meaning that data points occur at regular intervals according to some rule (for example every 15 seconds, every 5 minutes, or every month). A time series can also be irregular, with no fixed unit of time or offset between units. How time series data is marked depends on the application; the main kinds are:

Timestamp: a specific instant in time, such as 15:30 on June 24, 2020;
Period: a span of time, such as 2020-01 (the whole of January 2020);
Time delta (duration): the difference between two dates or times.

For timestamps, Pandas provides the Timestamp type. It is essentially a replacement for Python's native datetime type, but it is built on the more efficient numpy.datetime64 type. The corresponding index data structure is DatetimeIndex.
For periods, Pandas provides the Period type. It encodes a fixed-frequency time interval based on numpy.datetime64. The corresponding index data structure is PeriodIndex.
For time deltas or durations, Pandas provides the Timedelta type. Timedelta is a more efficient replacement for Python's native datetime.timedelta type and is based on numpy.timedelta64. The corresponding index data structure is TimedeltaIndex.
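As a quick orientation, the three scalar types and their index types can be sketched as follows. This is only a minimal illustration; the variable names are my own and the dates are arbitrary.

```python
import pandas as pd

# A timestamp: one specific instant.
ts = pd.Timestamp("2020-06-24 15:30")

# A period: the whole of January 2020, at monthly frequency.
period = pd.Period("2020-01", freq="M")

# A time delta: the difference between two timestamps.
delta = pd.Timestamp("2020-06-24") - pd.Timestamp("2020-06-20")

# Each scalar type has a matching index type.
dt_index = pd.DatetimeIndex([ts])
p_index = pd.PeriodIndex([period])
td_index = pd.TimedeltaIndex([delta])

print(type(dt_index).__name__, type(p_index).__name__, type(td_index).__name__)
```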

[02x00] Timestamp

[02x01]pandas.Timestamp

In Pandas, pandas.Timestamp takes the place of datetime.datetime.

Timestamp is the Pandas equivalent of Python's datetime and is interchangeable with it in most cases. It is the type used for the entries of a DatetimeIndex and other time-series-oriented data structures in Pandas.

Official documents: https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.html

Basic syntax:

class pandas.Timestamp(ts_input=<object object>, 
					   freq=None, tz=None, unit=None, 
					   year=None, month=None, day=None, 
					   hour=None, minute=None, second=None, 
					   microsecond=None, nanosecond=None, tzinfo=None)

Common parameters:

Parameter	Description
ts_input	The object to convert to a timestamp; may be datetime-like, str, int, or float
freq	The offset the timestamp will have; a str or a DateOffset. See [02x02] value of freq frequency
tz	The time zone the timestamp will have
unit	If ts_input is an integer or floating-point number, the unit of ts_input (D, s, ms, us, ns)

Simple example:

>>> import pandas as pd
>>> pd.Timestamp('2017-01-01T12')
Timestamp('2017-01-01 12:00:00')

Set unit='s' to indicate that the value being converted is in seconds:

>>> import pandas as pd
>>> pd.Timestamp(1513393355.5, unit='s')
Timestamp('2017-12-16 03:02:35.500000')

Use the tz parameter to set the time zone:

>>> import pandas as pd
>>> pd.Timestamp(1513393355, unit='s', tz='US/Pacific')
Timestamp('2017-12-15 19:02:35-0800', tz='US/Pacific')

Specify the date and time components individually:

>>> import pandas as pd
>>> pd.Timestamp(year=2020, month=6, day=24, hour=12)
Timestamp('2020-06-24 12:00:00')
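To make the "interchangeable with datetime" point concrete, here is a small sketch. It relies on Timestamp being a subclass of datetime.datetime; the variable names are illustrative only.

```python
from datetime import datetime

import pandas as pd

ts = pd.Timestamp(year=2020, month=6, day=24, hour=12)
native = datetime(2020, 6, 24, 12)

# Timestamp subclasses datetime, so it passes isinstance checks.
is_datetime = isinstance(ts, datetime)

# The two types compare equal when they denote the same instant.
same_instant = ts == native

# Convert back to a plain datetime when a library insists on one.
back_to_native = ts.to_pydatetime()

# Timestamp also adds conveniences of its own, such as day_name().
weekday_name = ts.day_name()
print(is_datetime, same_instant, weekday_name)
```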

[02x02] value of freq frequency

For complete values, please refer to official documents: https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases

Alias	Type	Description
D	Day	Every calendar day
B	BusinessDay	Every business day
H	Hour	Every hour
T or min	Minute	Every minute
S	Second	Every second
L or ms	Milli	Every millisecond (one thousandth of a second)
U	Micro	Every microsecond (one millionth of a second)
M	MonthEnd	Last calendar day of each month
BM	BusinessMonthEnd	Last business day of each month
MS	MonthBegin	First calendar day of each month
BMS	BusinessMonthBegin	First business day of each month
W-MON, W-TUE, ...	Week	Weekly, on the specified day of the week (MON, TUE, WED, THU, FRI, SAT, SUN)
WOM-1MON, WOM-2MON, ...	WeekOfMonth	The specified weekday in the first, second, third, or fourth week of each month. For example, WOM-3FRI is the third Friday of each month
Q-JAN, Q-FEB, ...	QuarterEnd	Quarterly, on the last calendar day of the last month of each quarter, for a year ending in the specified month (JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, DEC)
BQ-JAN, BQ-FEB, ...	BusinessQuarterEnd	Quarterly, on the last business day of the last month of each quarter, for a year ending in the specified month
QS-JAN, QS-FEB, ...	QuarterBegin	Quarterly, on the first calendar day of the first month of each quarter, with the quarters anchored so that one begins in the specified month
BQS-JAN, BQS-FEB, ...	BusinessQuarterBegin	Quarterly, on the first business day of the first month of each quarter, with the quarters anchored so that one begins in the specified month
A-JAN, A-FEB, ...	YearEnd	Annually, on the last calendar day of the specified month (JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, DEC)
BA-JAN, BA-FEB, ...	BusinessYearEnd	Annually, on the last business day of the specified month
AS-JAN, AS-FEB, ...	YearBegin	Annually, on the first calendar day of the specified month
BAS-JAN, BAS-FEB, ...	BusinessYearBegin	Annually, on the first business day of the specified month
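A few of these aliases can be tried out with date_range, as in the sketch below. Note that pandas 2.2 renamed some aliases (for example M became ME and H became h), so I have deliberately picked aliases that, as far as I know, are spelled the same in old and new versions. The start date is arbitrary.

```python
import pandas as pd

start = "2020-06-24"  # a Wednesday

# Business days skip the weekend of 27-28 June.
business_days = pd.date_range(start, periods=4, freq="B")

# Weekly, anchored on Mondays.
mondays = pd.date_range(start, periods=2, freq="W-MON")

# First calendar day of each month.
month_starts = pd.date_range(start, periods=2, freq="MS")

# Third Friday of each month.
third_fridays = pd.date_range(start, periods=2, freq="WOM-3FRI")

for name, idx in [("B", business_days), ("W-MON", mondays),
                  ("MS", month_starts), ("WOM-3FRI", third_fridays)]:
    print(name, idx.strftime("%Y-%m-%d").tolist())
```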

[02x03]to_datetime

In Python, the datetime library provides date and time handling. A datetime object can be converted to a string with str or with the strftime method. For details, see the article "[Python standard library learning] date and time processing library - datetime".

>>> from datetime import datetime
>>> stamp = datetime(2020, 6, 24)
>>> stamp
datetime.datetime(2020, 6, 24, 0, 0)
>>>
>>> str(stamp)
'2020-06-24 00:00:00'
>>> 
>>> stamp.strftime('%Y-%m-%d')
'2020-06-24'

In Pandas, the to_datetime method can parse strings in many different formats into Timestamp objects:

>>> import pandas as pd
>>> datestrs = '2011-07-06 12:00:00'
>>> type(datestrs)
<class 'str'>
>>> 
>>> pd.to_datetime(datestrs)
Timestamp('2011-07-06 12:00:00')

Basic syntax:

pandas.to_datetime(arg, errors='raise', dayfirst=False, 
				   yearfirst=False, utc=None, format=None, 
				   exact=True, unit=None, infer_datetime_format=False, 
				   origin='unix', cache=True)

Official documents: https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html

Common parameters:

Parameter	Description
arg	The object to convert to a datetime; accepts int, float, str, datetime, list, tuple, 1-D array, Series, and DataFrame/dict-like types
errors	What to do when a value cannot be parsed as a timestamp. 'ignore': return the original input without raising; 'raise': raise an exception (default); 'coerce': set invalid values to NaT
dayfirst	bool, default False. When arg is a str or list-like, whether to parse the day first. For example, with dayfirst=True, 10/11/12 is parsed as 2012-11-10; with False, as 2012-10-11
yearfirst	bool, default False. When arg is a str or list-like, whether to parse the year first. For example, with yearfirst=True, 10/11/12 is parsed as 2010-11-12; with False, as 2012-10-11. If dayfirst and yearfirst are both True, yearfirst takes precedence
utc	bool, default None. Whether to convert the result to Coordinated Universal Time (UTC)
format	The strftime format used to parse the time. For example, 21/2/20 16:10 with %d/%m/%y %H:%M is parsed as 2020-02-21 16:10:00. For the meaning of the format codes, see "[Python standard library learning] date and time processing library - datetime" or the official documentation
exact	If True, an exact format match is required. If False, the format is allowed to match anywhere in the target string
unit	If arg is an integer or floating-point number, the unit of arg (D, s, ms, us, ns)
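The dayfirst, yearfirst, and format rows of the table can be checked with a short sketch like the one below, which uses the same sample strings as the table. Recent pandas versions may emit a parsing warning for the ambiguous string, but the results should be the same.

```python
import pandas as pd

# The same ambiguous string, read day-first and year-first.
day_first = pd.to_datetime("10/11/12", dayfirst=True)
year_first = pd.to_datetime("10/11/12", yearfirst=True)

# An explicit format removes the ambiguity entirely.
explicit = pd.to_datetime("21/2/20 16:10", format="%d/%m/%y %H:%M")

print(day_first, year_first, explicit, sep="\n")
```

When the layout of the input is known, passing format is usually the safer choice, since it does not depend on how the parser guesses.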

Simple application:

>>> import pandas as pd
>>> obj = pd.DataFrame({'year': [2015, 2016], 'month': [2, 3], 'day': [4, 5]})
>>> obj
   year  month  day
0  2015      2    4
1  2016      3    5
>>> 
>>> pd.to_datetime(obj)
0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]

Set the format and errors parameters:

>>> import pandas as pd
>>> pd.to_datetime('13000101', format='%Y%m%d', errors='ignore')
datetime.datetime(1300, 1, 1, 0, 0)
>>> 
>>> pd.to_datetime('13000101', format='%Y%m%d', errors='coerce')
NaT
>>> 
>>> pd.to_datetime('13000101', format='%Y%m%d', errors='raise')
Traceback (most recent call last):
...
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1300-01-01 00:00:00
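In practice errors='coerce' is most useful on a whole column of messy input, where bad entries become NaT and can be filtered out afterwards. A minimal sketch with made-up input values:

```python
import pandas as pd

raw = ["2020-06-24", "2020-06-25", "not a date"]

# Unparseable entries become NaT instead of raising.
parsed = pd.to_datetime(raw, errors="coerce")

# Keep only the entries that were parsed successfully.
valid = parsed[parsed.notna()]

print(parsed.isna().tolist())
print(valid)
```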

Set the unit parameter:

>>> import pandas as pd
>>> pd.to_datetime(1490195805, unit='s')
Timestamp('2017-03-22 15:16:45')
>>> 
>>> pd.to_datetime(1490195805433502912, unit='ns')
Timestamp('2017-03-22 15:16:45.433502912')

[02x04]date_range

The pandas.date_range method generates a DatetimeIndex of a specified length at a specified frequency.

Basic syntax:

pandas.date_range(start=None, end=None, periods=None, freq=None, 
				  tz=None, normalize=False, name=None, closed=None, 
				  **kwargs) → pandas.core.indexes.datetimes.DatetimeIndex

Official documents: https://pandas.pydata.org/docs/reference/api/pandas.date_range.html

Parameter	Description
start	Start date
end	End date
periods	int, the number of periods (timestamps) to generate
freq	Frequency string; dates are generated at this frequency. For the values, see [02x02] value of freq frequency
tz	Time zone, for example 'Asia/Hong_Kong'
normalize	bool, default False. Whether to normalize the start and end dates to midnight before generating the range (keeping only the date)
name	Name of the resulting DatetimeIndex
closed	None (default): keep both the start date and the end date; 'left': keep the start date but not the end date; 'right': keep the end date but not the start date. In pandas 1.4 and later this parameter is superseded by inclusive

Simple example:

>>> import pandas as pd
>>> pd.date_range(start='1/1/2018', end='1/08/2018')
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08'],
              dtype='datetime64[ns]', freq='D')

Specify the periods parameter:

>>> import pandas as pd
>>> pd.date_range(start='2012-04-01', periods=20)
DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20'],
              dtype='datetime64[ns]', freq='D')
>>> 
>>> pd.date_range(end='2012-06-01', periods=20)
DatetimeIndex(['2012-05-13', '2012-05-14', '2012-05-15', '2012-05-16',
               '2012-05-17', '2012-05-18', '2012-05-19', '2012-05-20',
               '2012-05-21', '2012-05-22', '2012-05-23', '2012-05-24',
               '2012-05-25', '2012-05-26', '2012-05-27', '2012-05-28',
               '2012-05-29', '2012-05-30', '2012-05-31', '2012-06-01'],
              dtype='datetime64[ns]', freq='D')
>>>
>>> pd.date_range(start='2018-04-24', end='2018-04-27', periods=3)
DatetimeIndex(['2018-04-24 00:00:00', '2018-04-25 12:00:00', '2018-04-27 00:00:00'],
              dtype='datetime64[ns]', freq=None)
>>>
>>> pd.date_range(start='2018-04-24', end='2018-04-28', periods=3)
DatetimeIndex(['2018-04-24', '2018-04-26', '2018-04-28'], dtype='datetime64[ns]', freq=None)

Specifying freq='M' generates dates on the last calendar day of each month; freq='3M' generates the last calendar day of the month at three-month intervals:

>>> import pandas as pd
>>> pd.date_range(start='1/1/2018', periods=5, freq='M')
DatetimeIndex(['2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30',
               '2018-05-31'],
              dtype='datetime64[ns]', freq='M')
>>> 
>>> pd.date_range(start='1/1/2018', periods=5, freq='3M')
DatetimeIndex(['2018-01-31', '2018-04-30', '2018-07-31', '2018-10-31',
               '2019-01-31'],
              dtype='datetime64[ns]', freq='3M')
>>>

Use the tz parameter to set the time zone:

>>> import pandas as pd
>>> pd.date_range(start='1/1/2018', periods=5, tz='Asia/Tokyo')
DatetimeIndex(['2018-01-01 00:00:00+09:00', '2018-01-02 00:00:00+09:00',
               '2018-01-03 00:00:00+09:00', '2018-01-04 00:00:00+09:00',
               '2018-01-05 00:00:00+09:00'],
              dtype='datetime64[ns, Asia/Tokyo]', freq='D')
>>> 
>>> pd.date_range(start='6/24/2020', periods=5, tz='Asia/Hong_Kong')
DatetimeIndex(['2020-06-24 00:00:00+08:00', '2020-06-25 00:00:00+08:00',
               '2020-06-26 00:00:00+08:00', '2020-06-27 00:00:00+08:00',
               '2020-06-28 00:00:00+08:00'],
              dtype='datetime64[ns, Asia/Hong_Kong]', freq='D')

Set the normalize parameter to normalize the timestamps to midnight before the range is generated:

>>> import pandas as pd
>>> pd.date_range('2020-06-24 12:56:31', periods=5, normalize=True)
DatetimeIndex(['2020-06-24', '2020-06-25', '2020-06-26', '2020-06-27',
               '2020-06-28'],
              dtype='datetime64[ns]', freq='D')

Set the closed parameter:

>>> import pandas as pd
>>> pd.date_range(start='2020-06-20', end='2020-06-24', closed=None)
DatetimeIndex(['2020-06-20', '2020-06-21', '2020-06-22', '2020-06-23',
               '2020-06-24'],
              dtype='datetime64[ns]', freq='D')
>>> 
>>> pd.date_range(start='2020-06-20', end='2020-06-24', closed='left')
DatetimeIndex(['2020-06-20', '2020-06-21', '2020-06-22', '2020-06-23'], dtype='datetime64[ns]', freq='D')
>>> 
>>> pd.date_range(start='2020-06-20', end='2020-06-24', closed='right')
DatetimeIndex(['2020-06-21', '2020-06-22', '2020-06-23', '2020-06-24'], dtype='datetime64[ns]', freq='D')
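On newer pandas the same endpoint control is spelled inclusive rather than closed. The sketch below assumes pandas 1.4 or later, where inclusive accepts 'both', 'neither', 'left', and 'right'; on older versions it will not run.

```python
import pandas as pd

start, end = "2020-06-20", "2020-06-24"

# Assumes pandas >= 1.4, where `inclusive` replaces `closed`.
left = pd.date_range(start=start, end=end, inclusive="left")
right = pd.date_range(start=start, end=end, inclusive="right")
neither = pd.date_range(start=start, end=end, inclusive="neither")

print(len(left), len(right), len(neither))
```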

[02x05] indexing and slicing

The most basic time series type in Pandas is a Series indexed by timestamps, which are often given as Python strings or datetime objects. These datetime objects are stored in a DatetimeIndex, and the result can be indexed and sliced in the same way as any other pandas.Series:

>>> import pandas as pd
>>> import numpy as np
>>> from datetime import datetime
>>> dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
...          datetime(2011, 1, 7), datetime(2011, 1, 8),
...          datetime(2011, 1, 10), datetime(2011, 1, 12)]
>>> obj = pd.Series(np.random.randn(6), index=dates)
>>> 
>>> obj
2011-01-02   -0.407110
2011-01-05   -0.186661
2011-01-07   -0.731080
2011-01-08    0.860970
2011-01-10    1.929973
2011-01-12   -0.168599
dtype: float64
>>> 
>>> obj.index
DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)
>>>
>>> obj.index[0]
Timestamp('2011-01-02 00:00:00')
>>> 
>>> obj.index[0:3]
DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07'], dtype='datetime64[ns]', freq=None)

In addition, you can pass in a string that can be interpreted as a date, or just a year or a year and month, to easily select slices of the data:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
>>> obj
2000-01-01   -1.142284
2000-01-02    1.198785
2000-01-03    2.466909
2000-01-04   -0.086728
2000-01-05   -0.978437
                ...   
2002-09-22   -0.252240
2002-09-23    0.148561
2002-09-24   -1.330409
2002-09-25   -0.673471
2002-09-26   -0.253271
Freq: D, Length: 1000, dtype: float64
>>> 
>>> obj['26/9/2002']
-0.25327100684233356
>>> 
>>> obj['2002']
2002-01-01    1.058715
2002-01-02    0.900859
2002-01-03    1.993508
2002-01-04   -0.103211
2002-01-05   -0.950090
                ...   
2002-09-22   -0.252240
2002-09-23    0.148561
2002-09-24   -1.330409
2002-09-25   -0.673471
2002-09-26   -0.253271
Freq: D, Length: 269, dtype: float64
>>> 
>>> obj['2002-09']
2002-09-01   -0.995528
2002-09-02    0.501528
2002-09-03   -0.486753
2002-09-04   -1.083906
2002-09-05    1.458975
2002-09-06   -1.331685
2002-09-07    0.195338
2002-09-08   -0.429613
2002-09-09    1.125823
2002-09-10    1.607051
2002-09-11    0.530387
2002-09-12   -0.015938
2002-09-13    1.781043
2002-09-14   -0.277123
2002-09-15    0.344569
2002-09-16   -1.010810
2002-09-17    0.463001
2002-09-18    1.883636
2002-09-19    0.274520
2002-09-20    0.624184
2002-09-21   -1.203057
2002-09-22   -0.252240
2002-09-23    0.148561
2002-09-24   -1.330409
2002-09-25   -0.673471
2002-09-26   -0.253271
Freq: D, dtype: float64
>>> 
>>> obj['20/9/2002':'26/9/2002']
2002-09-20    0.624184
2002-09-21   -1.203057
2002-09-22   -0.252240
2002-09-23    0.148561
2002-09-24   -1.330409
2002-09-25   -0.673471
2002-09-26   -0.253271
Freq: D, dtype: float64
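Because the examples above use random values, here is a deterministic sketch of the same partial-string selections, written with .loc and ISO-formatted dates. The integer values are just positions, chosen so that the results are easy to check.

```python
import numpy as np
import pandas as pd

# 1000 consecutive days starting 2000-01-01; each value is its own position.
obj = pd.Series(np.arange(1000), index=pd.date_range("2000-01-01", periods=1000))

year_2002 = obj.loc["2002"]        # every row in 2002
september = obj.loc["2002-09"]     # every row in September 2002
window = obj.loc["2002-09-20":"2002-09-26"]  # both endpoints are included

print(len(year_2002), len(september), window.tolist())
```

Unlike integer slicing, a slice over date labels includes its right endpoint.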

[02x06] shifting data and date offsets

Shifting refers to moving data forward or backward along the time axis. Both Series and DataFrame have a shift method that performs a simple forward or backward shift, leaving the index unchanged:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.Series(np.random.randn(4),
		    index=pd.date_range('1/1/2000', periods=4, freq='M'))
>>> obj
2000-01-31   -0.100217
2000-02-29    1.177834
2000-03-31   -0.644353
2000-04-30   -1.954679
Freq: M, dtype: float64
>>> 
>>> obj.shift(2)
2000-01-31         NaN
2000-02-29         NaN
2000-03-31   -0.100217
2000-04-30    1.177834
Freq: M, dtype: float64
>>> 
>>> obj.shift(-2)
2000-01-31   -0.644353
2000-02-29   -1.954679
2000-03-31         NaN
2000-04-30         NaN
Freq: M, dtype: float64

Because a simple shift does not modify the index, some data is discarded and NaN (missing) values are introduced. If the frequency is known, it can be passed to shift so that the timestamps are shifted rather than the data:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.Series(np.random.randn(4),
		    index=pd.date_range('1/1/2000', periods=4, freq='M'))
>>> obj
2000-01-31   -0.100217
2000-02-29    1.177834
2000-03-31   -0.644353
2000-04-30   -1.954679
Freq: M, dtype: float64
>>> 
>>> obj.shift(2, freq='M')
2000-03-31   -0.100217
2000-04-30    1.177834
2000-05-31   -0.644353
2000-06-30   -1.954679
Freq: M, dtype: float64
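The difference between the two kinds of shift is easier to see with fixed values. In this sketch I pass a MonthEnd offset object instead of the 'M' string, purely so that it runs the same way on old and new pandas versions; the values are arbitrary.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

index = pd.date_range("2000-01-31", periods=4, freq=MonthEnd())
obj = pd.Series([10, 20, 30, 40], index=index)

# Shifting the data: the index stays put, values slide down, NaN fills the gap.
shifted_values = obj.shift(2)

# Shifting the timestamps: values stay put, every label moves two month-ends on.
shifted_index = obj.shift(2, freq=MonthEnd())

print(shifted_values)
print(shifted_index)
```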

A frequency in Pandas consists of a base frequency and a multiplier. The base frequency is usually represented by a string alias, such as "M" for monthly and "H" for hourly. Each base frequency has a corresponding object called a date offset. For example, an hourly frequency can be represented by the Hour class:

>>> from pandas.tseries.offsets import Hour, Minute
>>> hour = Hour()
>>> hour
<Hour>
>>> 
>>> four_hours = Hour(4)
>>> four_hours
<4 * Hours>

In general, you don't need to explicitly create such an object, just use a string alias such as "H" or "4H". Put an integer in front of the basic frequency to create a multiple:

>>> import pandas as pd
>>> pd.date_range('2000-01-01', '2000-01-03 23:59', freq='4h')
DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 04:00:00',
               '2000-01-01 08:00:00', '2000-01-01 12:00:00',
               '2000-01-01 16:00:00', '2000-01-01 20:00:00',
               '2000-01-02 00:00:00', '2000-01-02 04:00:00',
               '2000-01-02 08:00:00', '2000-01-02 12:00:00',
               '2000-01-02 16:00:00', '2000-01-02 20:00:00',
               '2000-01-03 00:00:00', '2000-01-03 04:00:00',
               '2000-01-03 08:00:00', '2000-01-03 12:00:00',
               '2000-01-03 16:00:00', '2000-01-03 20:00:00'],
              dtype='datetime64[ns]', freq='4H')

Most offset objects can be combined by addition:

>>> from pandas.tseries.offsets import Hour, Minute
>>> Hour(2) + Minute(30)
<150 * Minutes>

For the freq parameter, you can also pass in a frequency string such as "1h30min", which is parsed into the equivalent expression:

>>> import pandas as pd
>>> pd.date_range('2000-01-01', periods=10, freq='1h30min')
DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 01:30:00',
               '2000-01-01 03:00:00', '2000-01-01 04:30:00',
               '2000-01-01 06:00:00', '2000-01-01 07:30:00',
               '2000-01-01 09:00:00', '2000-01-01 10:30:00',
               '2000-01-01 12:00:00', '2000-01-01 13:30:00'],
              dtype='datetime64[ns]', freq='90T')

Offsets can also be added to datetime or Timestamp objects:

>>> from datetime import datetime
>>> from pandas.tseries.offsets import Day, MonthEnd
>>> now = datetime(2011, 11, 17)
>>> now + 3 * Day()
Timestamp('2011-11-20 00:00:00')

If you add an anchored offset such as MonthEnd, the first increment rolls the original date forward to the next date that satisfies the frequency rule:

>>> from datetime import datetime
>>> from pandas.tseries.offsets import Day, MonthEnd
>>> now = datetime(2011, 11, 17)
>>> now + MonthEnd()
Timestamp('2011-11-30 00:00:00')
>>> now + MonthEnd(2)
Timestamp('2011-12-31 00:00:00')

With the rollforward and rollback methods of an anchored offset, a date can be rolled forward or backward explicitly:

>>> from datetime import datetime
>>> from pandas.tseries.offsets import Day, MonthEnd
>>> now = datetime(2011, 11, 17)
>>> offset = MonthEnd()
>>> offset.rollforward(now)
Timestamp('2011-11-30 00:00:00')
>>> offset.rollback(now)
Timestamp('2011-10-31 00:00:00')

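These methods can also be combined with groupby. Here every timestamp is rolled forward to its month end, so the mean is computed per month: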

>>> import pandas as pd
>>> import numpy as np
>>> from pandas.tseries.offsets import Day, MonthEnd
>>> obj = pd.Series(np.random.randn(20),
		    index=pd.date_range('1/15/2000', periods=20, freq='4d'))
>>> obj
2000-01-15   -0.591729
2000-01-19   -0.775844
2000-01-23   -0.745603
2000-01-27   -0.076439
2000-01-31    1.796417
2000-02-04   -0.500349
2000-02-08    0.515851
2000-02-12   -0.344171
2000-02-16    0.419657
2000-02-20    0.307288
2000-02-24    0.115113
2000-02-28   -0.362585
2000-03-03    1.074892
2000-03-07    1.111366
2000-03-11    0.949910
2000-03-15   -1.535727
2000-03-19    0.545944
2000-03-23   -0.810139
2000-03-27   -1.260627
2000-03-31   -0.128403
Freq: 4D, dtype: float64
>>>
>>> offset = MonthEnd()
>>> obj.groupby(offset.rollforward).mean()
2000-01-31   -0.078640
2000-02-29    0.021543
2000-03-31   -0.006598
dtype: float64
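To see which rows land in which group, the same grouping can be run on a series of ones, so that the sum per group is simply a row count. This is only a sketch on the same date range as above, not a replacement for the random-value example.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

index = pd.date_range("2000-01-15", periods=20, freq="4D")
obj = pd.Series(1, index=index)

# rollforward maps each timestamp to the end of its month; groupby calls it per label.
per_month = obj.groupby(MonthEnd().rollforward).sum()

print(per_month)
```

The 20 rows split into 5 for January, 7 for February, and 8 for March. A date that already falls on a month end, such as 2000-01-31, is left where it is.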

[02x07] time zone processing

In Python, time zone information comes from the third-party library pytz. The pytz.common_timezones list contains the common time zone names, and the pytz.timezone function returns a time zone object:

>>> import pytz
>>> pytz.common_timezones
['Africa/Abidjan', 'Africa/Accra', 'Africa/Addis_Ababa', ..., 'UTC']
>>>
>>> tz = pytz.timezone('Asia/Shanghai')
>>> tz
<DstTzInfo 'Asia/Shanghai' LMT+8:06:00 STD>  # LMT is the historical local mean time, 8 hours 6 minutes ahead of UTC

In the date_range method, the tz parameter specifies the time zone; the default is None. The tz_localize method converts a time-zone-naive series into a localized one. In the following example, a naive series is localized to UTC:

>>> import pandas as pd
>>> import numpy as np
>>> rng = pd.date_range('3/9/2012 9:30', periods=6, freq='D')
>>> ts = pd.Series(np.random.randn(len(rng)), index=rng)
>>> ts
2012-03-09 09:30:00   -1.527913
2012-03-10 09:30:00   -1.116101
2012-03-11 09:30:00    0.359358
2012-03-12 09:30:00   -0.475920
2012-03-13 09:30:00   -0.336570
2012-03-14 09:30:00   -1.075952
Freq: D, dtype: float64
>>> 
>>> print(ts.index.tz)
None
>>> 
>>> ts_utc = ts.tz_localize('UTC')
>>> ts_utc
2012-03-09 09:30:00+00:00   -1.527913
2012-03-10 09:30:00+00:00   -1.116101
2012-03-11 09:30:00+00:00    0.359358
2012-03-12 09:30:00+00:00   -0.475920
2012-03-13 09:30:00+00:00   -0.336570
2012-03-14 09:30:00+00:00   -1.075952
Freq: D, dtype: float64
>>>
>>> ts_utc.index
DatetimeIndex(['2012-03-09 09:30:00+00:00', '2012-03-10 09:30:00+00:00',
               '2012-03-11 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
               '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='D')

Once a time series has been localized to a particular time zone, the tz_convert method converts it to a different time zone:

>>> import pandas as pd
>>> import numpy as np
>>> rng = pd.date_range('3/9/2012 9:30', periods=6, freq='D')
>>> ts = pd.Series(np.random.randn(len(rng)), index=rng)
>>> ts
2012-03-09 09:30:00    0.480303
2012-03-10 09:30:00   -1.461039
2012-03-11 09:30:00   -1.512749
2012-03-12 09:30:00   -2.185421
2012-03-13 09:30:00    1.657845
2012-03-14 09:30:00    0.175633
Freq: D, dtype: float64
>>>
>>> ts.tz_localize('UTC').tz_convert('Asia/Shanghai')
2012-03-09 17:30:00+08:00    0.480303
2012-03-10 17:30:00+08:00   -1.461039
2012-03-11 17:30:00+08:00   -1.512749
2012-03-12 17:30:00+08:00   -2.185421
2012-03-13 17:30:00+08:00    1.657845
2012-03-14 17:30:00+08:00    0.175633
Freq: D, dtype: float64
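The same two steps work on a single Timestamp, which makes the distinction easy to check: tz_localize attaches a time zone to a naive value, while tz_convert re-expresses the same instant in another zone. A minimal sketch:

```python
import pandas as pd

naive = pd.Timestamp("2012-03-09 09:30")

# Attach a time zone: the wall-clock reading is kept and declared to be UTC.
utc = naive.tz_localize("UTC")

# Convert: the instant is unchanged, only the wall-clock reading differs.
shanghai = utc.tz_convert("Asia/Shanghai")

print(naive.tz, utc, shanghai, sep="\n")
```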

[03x00] fixed period

[03x01]pandas.Period

A Period represents a span of time, such as a number of days, months, quarters, or years. The Period class represents this data type, and its constructor takes a string or an integer.

Basic syntax:

class pandas.Period(value=None, freq=None, ordinal=None, 
					year=None, month=None, quarter=None, 
					day=None, hour=None, minute=None, second=None)

Official documents: https://pandas.pydata.org/docs/reference/api/pandas.Period.html

Common parameters:

Parameter	Description
value	The time period represented
freq	The frequency of the period; a str or a DateOffset. See [02x02] value of freq frequency

In the following example, the Period object represents the entire span from January 1, 2020 to December 31, 2020:

>>> import pandas as pd
>>> pd.Period(2020, freq='A-DEC')
Period('2020', 'A-DEC')

Adding or subtracting an integer shifts the period by that many units of its frequency:

>>> import pandas as pd
>>> obj = pd.Period(2020, freq='A-DEC')
>>> obj
Period('2020', 'A-DEC')
>>> 
>>> obj + 5
Period('2025', 'A-DEC')
>>> 
>>> obj - 5
Period('2015', 'A-DEC')
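A Period also knows where its span begins and ends. The sketch below uses a monthly period, with names of my own choosing, to show the boundary attributes alongside the integer arithmetic described above.

```python
import pandas as pd

feb = pd.Period("2020-02", freq="M")

first_instant = feb.start_time      # first moment covered by the period
last_day = feb.end_time.date()      # calendar date of the last moment
march = feb + 1                     # one unit of the frequency later
n_days = feb.days_in_month

print(first_instant, last_day, march, n_days)
```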

The PeriodIndex class holds a sequence of periods and can be used as an axis index in any Pandas data structure:

>>> import pandas as pd
>>> import numpy as np
>>> rng = [pd.Period('2000-01'), pd.Period('2000-02'), pd.Period('2000-03'), 
		   pd.Period('2000-04'), pd.Period('2000-05'), pd.Period('2000-06')]
>>> obj = pd.Series(np.random.randn(6), index=rng)
>>> obj
2000-01    0.229092
2000-02    1.515498
2000-03   -0.334401
2000-04   -0.492681
2000-05   -2.012818
2000-06    0.338804
Freq: M, dtype: float64
>>> 
>>> obj.index
PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='period[M]', freq='M')

A PeriodIndex can also be built directly from an array of strings:

>>> import pandas as pd
>>> values = ['2001Q3', '2002Q2', '2003Q1']
>>> index = pd.PeriodIndex(values, freq='Q-DEC')
>>> index
PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], dtype='period[Q-DEC]', freq='Q-DEC')
>>>

[03x02]period_range

The pandas.period_range method generates a PeriodIndex of a specified length at a specified frequency.

Basic syntax:

pandas.period_range(start=None, end=None, periods=None, freq=None, name=None) → pandas.core.indexes.period.PeriodIndex

Official documents: https://pandas.pydata.org/docs/reference/api/pandas.period_range.html

Common parameters:

Parameter	Description
start	Start date
end	End date
periods	Number of periods to generate
freq	The frequency of the periods; a str or a DateOffset. See [02x02] value of freq frequency
name	Name of the resulting PeriodIndex

Simple application:

>>> import pandas as pd
>>> pd.period_range(start='2019-01-01', end='2020-01-01', freq='M')
PeriodIndex(['2019-01', '2019-02', '2019-03', '2019-04', '2019-05', '2019-06',
             '2019-07', '2019-08', '2019-09', '2019-10', '2019-11', '2019-12',
             '2020-01'],
            dtype='period[M]', freq='M')
>>>
>>> pd.period_range(start=pd.Period('2017Q1', freq='Q'),
                	end=pd.Period('2017Q2', freq='Q'), freq='M')
PeriodIndex(['2017-03', '2017-04', '2017-05', '2017-06'], dtype='period[M]', freq='M')
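As with date_range, the range can be given either as a start plus a count or as a start and an end. A small sketch with arbitrary dates:

```python
import pandas as pd

# Four monthly periods starting from January 2020.
months = pd.period_range(start="2020-01", periods=4, freq="M")

# All quarters between two endpoints; both endpoints are included.
quarters = pd.period_range(start="2020Q1", end="2020Q4", freq="Q")

print([str(p) for p in months])
print([str(p) for p in quarters])
```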

[03x03] asfreq period frequency conversion

Both Period and PeriodIndex objects can be converted to different frequencies by the asfreq method.

Basic syntax: PeriodIndex.asfreq(self, *args, **kwargs)

Common parameters:

Parameter	Description
freq	The new frequency (offset); see [02x02] value of freq frequency
how	Whether to align to the start or the end of the period: 'E', 'END', or 'FINISH' for the end; 'S', 'START', or 'BEGIN' for the start

Application example:

>>> import pandas as pd
>>> pidx = pd.period_range('2010-01-01', '2015-01-01', freq='A')
>>> pidx
PeriodIndex(['2010', '2011', '2012', '2013', '2014', '2015'], dtype='period[A-DEC]', freq='A-DEC')
>>> 
>>> pidx.asfreq('M')
PeriodIndex(['2010-12', '2011-12', '2012-12', '2013-12', '2014-12', '2015-12'], dtype='period[M]', freq='M')
>>> 
>>> pidx.asfreq('M', how='S')
PeriodIndex(['2010-01', '2011-01', '2012-01', '2013-01', '2014-01', '2015-01'], dtype='period[M]', freq='M')
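The same conversion works on a single Period. The sketch below converts a quarter to monthly and daily frequency to show what the how argument selects; the quarter chosen is arbitrary.

```python
import pandas as pd

q1 = pd.Period("2020Q1", freq="Q-DEC")

# A quarter contains three months; `how` picks which one to return.
first_month = q1.asfreq("M", how="start")
last_month = q1.asfreq("M", how="end")

# The same idea at daily resolution.
last_day = q1.asfreq("D", how="end")

print(first_month, last_month, last_day)
```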

[03x04]to_period and to_timestamp()

The to_period method converts timestamps to periods;

the to_timestamp method converts periods back to timestamps.

>>> import pandas as pd
>>> import numpy as np
>>> rng = pd.date_range('2000-01-01', periods=3, freq='M')
>>> ts = pd.Series(np.random.randn(3), index=rng)
>>> ts
2000-01-31    0.220759
2000-02-29   -0.108221
2000-03-31    0.819433
Freq: M, dtype: float64
>>> 
>>> pts = ts.to_period()
>>> pts
2000-01    0.220759
2000-02   -0.108221
2000-03    0.819433
Freq: M, dtype: float64
>>> 
>>> pts2 = pts.to_timestamp()
>>> pts2
2000-01-01    0.220759
2000-02-01   -0.108221
2000-03-01    0.819433
Freq: MS, dtype: float64
>>> 
>>> ts.index
DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31'], dtype='datetime64[ns]', freq='M')
>>> 
>>> pts.index
PeriodIndex(['2000-01', '2000-02', '2000-03'], dtype='period[M]', freq='M')
>>> 
>>> pts2.index
DatetimeIndex(['2000-01-01', '2000-02-01', '2000-03-01'], dtype='datetime64[ns]', freq='MS')
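One thing the example above shows only implicitly is that the round trip is lossy: the day of the month is discarded by to_period, so to_timestamp cannot restore it. A sketch with irregular dates of my own choosing:

```python
import pandas as pd

dates = pd.to_datetime(["2000-01-29", "2000-01-30", "2000-02-15"])
ts = pd.Series([1, 2, 3], index=dates)

# Two different days in January collapse into the same monthly period.
pts = ts.to_period("M")

# Converting back yields the start of each period, not the original dates.
back = pts.to_timestamp(how="start")

print([str(p) for p in pts.index])
print(back.index.strftime("%Y-%m-%d").tolist())
```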

[04x00] timedelta interval

[04x01]pandas.Timedelta

Timedelta is the duration, the difference between two dates or times.

Timedelta is the pandas equivalent of Python's datetime.timedelta; in most cases the two are interchangeable.

Basic syntax: class pandas.Timedelta(value=<object object>, unit=None, **kwargs)

Official documents: https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.html

Common parameters:

parameter	describe
value	The value passed in; it can be a Timedelta, datetime.timedelta, np.timedelta64, string or integer
unit	The unit used to set the value. Please refer to the official document for the specific value

Represents the time difference between two datetime objects:

>>> import pandas as pd
>>> pd.to_datetime('2020-6-24') - pd.to_datetime('2016-1-1')
Timedelta('1636 days 00:00:00')

Pass parameters through string:

>>> import pandas as pd
>>> pd.Timedelta('3 days 3 hours 3 minutes 30 seconds')
Timedelta('3 days 03:03:30')

Pass parameters by integer:

>>> import pandas as pd
>>> pd.Timedelta(5,unit='h')
Timedelta('0 days 05:00:00')

Get properties:

>>> import pandas as pd
>>> obj = pd.Timedelta('3 days 3 hours 3 minutes 30 seconds')
>>> obj
Timedelta('3 days 03:03:30')
>>> 
>>> obj.days
3
>>> obj.seconds
11010
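The seconds attribute above is easy to misread: it holds only the part of the duration below one day, not the whole duration. The sketch below (variable names are illustrative) contrasts it with total_seconds() and also checks the interchangeability with datetime.timedelta mentioned earlier.

```python
import datetime
import pandas as pd

delta = pd.Timedelta('3 days 3 hours 3 minutes 30 seconds')

# .seconds covers only the part of the duration below one day
# (3 h + 3 min + 30 s), while total_seconds() covers the whole duration.
seconds_part = delta.seconds
total = delta.total_seconds()

# The same duration written with the standard library compares equal.
py_delta = datetime.timedelta(days=3, hours=3, minutes=3, seconds=30)
same = (delta == py_delta)

# to_pytimedelta() converts explicitly to datetime.timedelta.
converted = delta.to_pytimedelta()

print(seconds_part, total, same, converted)
```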

[04x02]to_timedelta

The to_timedelta method converts the incoming object to a Timedelta object (or to a TimedeltaIndex for list-like input).

Basic syntax: pandas.to_timedelta(arg, unit='ns', errors='raise')

Official documents: https://pandas.pydata.org/docs/reference/api/pandas.to_timedelta.html

Common parameters:

parameter	describe
arg	Objects to be converted to timedelta can be str, timedelta, list like, or Series objects
unit	It is used to set the unit of arg. Please refer to the official document for the specific value
errors	How to handle an arg that cannot be parsed as a timedelta. 'ignore': return the original input without raising an exception; 'raise': raise an exception (default); 'coerce': set invalid values to NaT

Resolve a single string to a timedelta object:

>>> import pandas as pd
>>> pd.to_timedelta('1 days 06:05:01.00003')
Timedelta('1 days 06:05:01.000030')
>>>
>>> pd.to_timedelta('15.5us')
Timedelta('0 days 00:00:00.000015')

To resolve a string list or array to a timedelta object:

>>> import pandas as pd
>>> pd.to_timedelta(['1 days 06:05:01.00003', '15.5us', 'nan'])
TimedeltaIndex(['1 days 06:05:01.000030', '0 days 00:00:00.000015', NaT], dtype='timedelta64[ns]', freq=None)

Specify the unit parameter:

>>> import numpy as np
>>> import pandas as pd
>>> pd.to_timedelta(np.arange(5), unit='s')
TimedeltaIndex(['00:00:00', '00:00:01', '00:00:02', '00:00:03', '00:00:04'], dtype='timedelta64[ns]', freq=None)
>>> 
>>> pd.to_timedelta(np.arange(5), unit='d')
TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)
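The errors parameter from the table can be illustrated as follows. This is a sketch only: the input list, including the deliberately invalid entry, is made up for the demonstration. It also shows that a converted value can be added to a Timestamp.

```python
import pandas as pd

# 'not a duration' is a deliberately invalid entry for illustration.
raw = ['1 days', '2 hours', 'not a duration']

# errors='coerce' turns entries that cannot be parsed into NaT
# instead of raising an exception.
durations = pd.to_timedelta(raw, errors='coerce')

# Timedelta values can be added to a timestamp to shift it.
shifted = pd.Timestamp('2020-06-24') + pd.to_timedelta(36, unit='h')

print(durations)
print(shifted)
```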

[04x03]timedelta_range

The timedelta_range method generates a fixed-frequency TimedeltaIndex of a specified length.

Basic syntax:

pandas.timedelta_range(start=None, end=None, periods=None,
					   freq=None, name=None, closed=None) → pandas.core.indexes.timedeltas.TimedeltaIndex

Official documents: https://pandas.pydata.org/docs/reference/api/pandas.timedelta_range.html

Common parameters:

parameter	describe
start	Start of the range, a timedelta-like value such as '1 day'
end	End of the range, a timedelta-like value such as '2 days'
periods	int type, number of periods to generate
freq	Frequency string, that is, the step between consecutive values. For the value, see [02x02] value of freq frequency
name	Name of the resulting TimedeltaIndex
closed	None: the default, keep both the start and the end; 'left': keep the start, drop the end; 'right': keep the end, drop the start

Application example:

>>> import pandas as pd
>>> pd.timedelta_range(start='1 day', periods=4)
TimedeltaIndex(['1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq='D')

The closed parameter specifies which endpoint to keep. Two endpoints are retained by default:

>>> import pandas as pd
>>> pd.timedelta_range(start='1 day', periods=4, closed='right')
TimedeltaIndex(['2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq='D')

The freq parameter specifies the frequency of the TimedeltaIndex. Only fixed frequencies are accepted; a non-fixed frequency such as 'M' raises an error:

>>> import pandas as pd
>>> pd.timedelta_range(start='1 day', end='2 days', freq='6H')
TimedeltaIndex(['1 days 00:00:00', '1 days 06:00:00', '1 days 12:00:00',
                '1 days 18:00:00', '2 days 00:00:00'],
               dtype='timedelta64[ns]', freq='6H')
>>> 
>>> pd.timedelta_range(start='1 day', end='2 days', freq='M')
Traceback (most recent call last):
...
ValueError: <MonthEnd> is a non-fixed frequency
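A common use of timedelta_range is to build a sequence of offsets and add it to a base timestamp. The sketch below is illustrative only (the base date and variable names are assumptions); it uses the lowercase alias 'h' for hours, which recent pandas versions prefer over 'H'.

```python
import pandas as pd

# Four offsets, six hours apart, starting at zero.
offsets = pd.timedelta_range(start='0 hours', periods=4, freq='6h')

# Adding the offsets to a base timestamp gives a DatetimeIndex.
base = pd.Timestamp('2020-06-24')
stamps = base + offsets

print(stamps)
```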

[05x00] resampling and frequency conversion

Resampling refers to the process of converting a time series from one frequency to another. Aggregating high-frequency data to a lower frequency is called downsampling, while converting low-frequency data to a higher frequency is called upsampling. Not all resampling falls into these two categories. For example, converting W-WED to W-FRI is neither downsampling nor upsampling.

Pandas provides the resample method to implement resampling. Both Series and DataFrame objects have a resample method, which is the workhorse for all kinds of frequency conversion.

Basic syntax:

Series.resample(self, rule, axis=0, 
				closed: Union[str, NoneType] = None, 
				label: Union[str, NoneType] = None, 
				convention: str = 'start', 
				kind: Union[str, NoneType] = None, 
				loffset=None, base: int = 0, 
				on=None, level=None)

DataFrame.resample(self, rule, axis=0, 
				   closed: Union[str, NoneType] = None, 
				   label: Union[str, NoneType] = None, 
				   convention: str = 'start', 
				   kind: Union[str, NoneType] = None, 
				   loffset=None, base: int = 0, 
				   on=None, level=None)

Common parameters:

parameter	describe
rule	The offset string or object representing the target frequency, for example '3T' for three minutes
axis	Resampled axis, default 0
closed	Which side of each bin interval is closed (i.e. included). The default is 'left' for all frequencies except 'M', 'A', 'Q', 'BM', 'BA', 'BQ' and 'W', for which the default is 'right'
label	Which bin edge is used to label the aggregated value, 'right' or 'left'; None by default. For example, the five minutes between 9:30 and 9:35 can be labelled 9:30 or 9:35
convention	For PeriodIndex only: 'start' or 's', 'end' or 'e'
on	For a DataFrame object, the column to use instead of the row index for resampling; the column must be datetime-like
level	For a DataFrame object with a multi index, you can use this parameter to specify at which level resampling is required

Resample the sequence to a frequency of three minutes and add the values for each frequency:

>>> import pandas as pd
>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64
>>> 
>>> series.resample('3T').sum()
2000-01-01 00:00:00     3
2000-01-01 00:03:00    12
2000-01-01 00:06:00    21
Freq: 3T, dtype: int64

Set label='right' so that each bin is labelled with its right (later) edge:

>>> import pandas as pd
>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64
>>> 
>>> series.resample('3T', label='right').sum()
2000-01-01 00:03:00     3
2000-01-01 00:06:00    12
2000-01-01 00:09:00    21
Freq: 3T, dtype: int64

Set closed='right' so that the right edge of each bin interval is closed, that is, each bin includes its right (later) boundary and excludes its left one:

>>> import pandas as pd
>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64
>>> 
>>> series.resample('3T', label='right', closed='right').sum()
2000-01-01 00:00:00     0
2000-01-01 00:03:00     6
2000-01-01 00:06:00    15
2000-01-01 00:09:00    15
Freq: 3T, dtype: int64
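To see which original values end up in each bin, it can help to look at the smallest and largest value per bin instead of the sum. The following sketch is not part of the original example; it rebuilds the same series, using the alias 'min' rather than 'T' because recent pandas versions prefer it.

```python
import pandas as pd

index = pd.date_range('2000-01-01', periods=9, freq='min')
series = pd.Series(range(9), index=index)

# Smallest and largest original value in each 3-minute bin.
left_closed = series.resample('3min', closed='left').agg(['min', 'max'])
right_closed = series.resample('3min', closed='right').agg(['min', 'max'])

print(left_closed)
print(right_closed)
```

With closed='left' the bins hold the values 0 to 2, 3 to 5 and 6 to 8. With closed='right' the value at 00:00:00 falls into a bin of its own, which is why the summed result above has four rows.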

The following example resamples the sequence to a frequency of 30 seconds, and asfreq()[0:5] is used to select the first 5 rows of data:

>>> import pandas as pd
>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64
>>> 
>>> series.resample('30S').asfreq()[0:5]
2000-01-01 00:00:00    0.0
2000-01-01 00:00:30    NaN
2000-01-01 00:01:00    1.0
2000-01-01 00:01:30    NaN
2000-01-01 00:02:00    2.0
Freq: 30S, dtype: float64

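Forward-fill the missing values (NaN) using the pad method, so that each gap takes the previous value. pad is an alias of ffill, and newer pandas versions keep only ffill: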

>>> import pandas as pd
>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64
>>> 
>>> series.resample('30S').pad()[0:5]
2000-01-01 00:00:00    0
2000-01-01 00:00:30    0
2000-01-01 00:01:00    1
2000-01-01 00:01:30    1
2000-01-01 00:02:00    2
Freq: 30S, dtype: int64

Back-fill the missing values (NaN) using the bfill method, so that each gap takes the next value:

>>> import pandas as pd
>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64
>>> 
>>> series.resample('30S').bfill()[0:5]
2000-01-01 00:00:00    0
2000-01-01 00:00:30    1
2000-01-01 00:01:00    1
2000-01-01 00:01:30    2
2000-01-01 00:02:00    2
Freq: 30S, dtype: int64
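Filling does not have to be all or nothing. As a hedged sketch (the two-point series and its values are invented for illustration), the limit argument of ffill caps how many consecutive gaps are filled, and interpolate fills the gaps with evenly spaced values instead of copies:

```python
import pandas as pd

# Two observations one minute apart (values chosen for illustration).
index = pd.date_range('2000-01-01', periods=2, freq='min')
series = pd.Series([0, 6], index=index)

# Upsampling to 20 seconds creates two empty slots between the observations.
# limit=1 fills at most one consecutive slot, so the second one stays NaN.
limited = series.resample('20s').ffill(limit=1)

# Linear interpolation fills the slots with evenly spaced values instead.
interpolated = series.resample('20s').interpolate()

print(limited)
print(interpolated)
```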

Pass the custom function through the apply method:

>>> import numpy as np
>>> import pandas as pd
>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64
>>> 
>>> def custom_resampler(array_like):
...     return np.sum(array_like) + 5
...
>>> series.resample('3T').apply(custom_resampler)
2000-01-01 00:00:00     8
2000-01-01 00:03:00    17
2000-01-01 00:06:00    26
Freq: 3T, dtype: int64
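The function passed to apply receives the values of one bin at a time, so any reduction that returns a single value can be used. As a further sketch (the lambda and the name spread are illustrative, not from the original), this computes the range of each 3-minute bin without NumPy:

```python
import pandas as pd

index = pd.date_range('2000-01-01', periods=9, freq='min')
series = pd.Series(range(9), index=index)

# Each bin is passed to the function as a Series; the function returns
# the difference between the largest and smallest value in that bin.
spread = series.resample('3min').apply(lambda values: values.max() - values.min())

print(spread)
```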

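Use the convention parameter to choose whether each original period is placed at the start or the end of its span when upsampling data with a PeriodIndex: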

>>> import pandas as pd
>>> s = pd.Series([1, 2], index=pd.period_range('2012-01-01', freq='A', periods=2))
>>> s
2012    1
2013    2
Freq: A-DEC, dtype: int64
>>> 
>>> s.resample('Q', convention='start').asfreq()
2012Q1    1.0
2012Q2    NaN
2012Q3    NaN
2012Q4    NaN
2013Q1    2.0
2013Q2    NaN
2013Q3    NaN
2013Q4    NaN
Freq: Q-DEC, dtype: float64
>>> 
>>> s.resample('Q', convention='end').asfreq()
2012Q4    1.0
2013Q1    NaN
2013Q2    NaN
2013Q3    NaN
2013Q4    2.0
Freq: Q-DEC, dtype: float64

>>> import pandas as pd
>>> q = pd.Series([1, 2, 3, 4], index=pd.period_range('2018-01-01', freq='Q', periods=4))
>>> q
2018Q1    1
2018Q2    2
2018Q3    3
2018Q4    4
Freq: Q-DEC, dtype: int64
>>> 
>>> q.resample('M', convention='end').asfreq()
2018-03    1.0
2018-04    NaN
2018-05    NaN
2018-06    2.0
2018-07    NaN
2018-08    NaN
2018-09    3.0
2018-10    NaN
2018-11    NaN
2018-12    4.0
Freq: M, dtype: float64
>>> 
>>> q.resample('M', convention='start').asfreq()
2018-01    1.0
2018-02    NaN
2018-03    NaN
2018-04    2.0
2018-05    NaN
2018-06    NaN
2018-07    3.0
2018-08    NaN
2018-09    NaN
2018-10    4.0
2018-11    NaN
2018-12    NaN
Freq: M, dtype: float64

For DataFrame objects, you can use the keyword on to specify a column in the original data as the row index of the resampled data:

>>> import pandas as pd
>>> d = dict({'price': [10, 11, 9, 13, 14, 18, 17, 19],
          	  'volume': [50, 60, 40, 100, 50, 100, 40, 50]})
>>> df = pd.DataFrame(d)
>>> df['week_starting'] = pd.date_range('01/01/2018', periods=8, freq='W')
>>> df
   price  volume week_starting
0     10      50    2018-01-07
1     11      60    2018-01-14
2      9      40    2018-01-21
3     13     100    2018-01-28
4     14      50    2018-02-04
5     18     100    2018-02-11
6     17      40    2018-02-18
7     19      50    2018-02-25
>>> 
>>> df.resample('M', on='week_starting').mean()
               price  volume
week_starting               
2018-01-31     10.75    62.5
2018-02-28     17.00    60.0
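Using on gives the same result as first moving the column into the index and then resampling. The sketch below rebuilds the data from the example and compares both routes; it is an illustration under one deliberate change: it uses 'MS' (month start), so the labels are the first day of each month rather than the last.

```python
import pandas as pd

df = pd.DataFrame({
    'price': [10, 11, 9, 13, 14, 18, 17, 19],
    'volume': [50, 60, 40, 100, 50, 100, 40, 50],
})
df['week_starting'] = pd.date_range('2018-01-01', periods=8, freq='W')

# Resample on the column directly ...
by_column = df.resample('MS', on='week_starting').mean()

# ... or move the column into the index first and resample as usual.
by_index = df.set_index('week_starting').resample('MS').mean()

print(by_column)
print(by_column.equals(by_index))
```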

For a DataFrame object with a multi index, you can use the keyword level to specify at which level you want to resample:

>>> import pandas as pd
>>> days = pd.date_range('1/1/2000', periods=4, freq='D')
>>> d2 = dict({'price': [10, 11, 9, 13, 14, 18, 17, 19],
           	   'volume': [50, 60, 40, 100, 50, 100, 40, 50]})
>>> df2 = pd.DataFrame(d2, index=pd.MultiIndex.from_product([days, ['morning', 'afternoon']]))
>>> df2
                      price  volume
2000-01-01 morning       10      50
           afternoon     11      60
2000-01-02 morning        9      40
           afternoon     13     100
2000-01-03 morning       14      50
           afternoon     18     100
2000-01-04 morning       17      40
           afternoon     19      50
>>> 
>>> df2.resample('D', level=0).sum()
            price  volume
2000-01-01     21     110
2000-01-02     22     140
2000-01-03     32     150
2000-01-04     36      90
