brief introduction
Time should be a data type often used in data processing. In addition to the two data types of datetime64 and timedelta64 in Numpy, pandas also integrates the functions of other python libraries, such as scikits.timeseries.
Time classification
There are four time types in pandas:
- Date times: date and time, with time zone. Similar to datetime.datetime in the standard library.
- Time delta: absolute duration, similar to datetime.timedelta in the standard library.
- Time spans: the time span defined by the time point and its associated frequency.
- Date offsets: the time calculated based on calendar is similar to dateutil.relativedelta.relativedelta.
We use a table to show:
type | Scalar class | Array class | pandas data type | Main creation method |
---|---|---|---|---|
Date times | Timestamp | DatetimeIndex | datetime64[ns] or datetime64[ns, tz] | to_datetime or date_range |
Time deltas | Timedelta | TimedeltaIndex | timedelta64[ns] | to_timedelta or timedelta_range |
Time spans | Period | PeriodIndex | period[freq] | Period or period_range |
Date offsets | DateOffset | None | None | DateOffset |
Let's look at an example:
In [19]: pd.Series(range(3), index=pd.date_range("2000", freq="D", periods=3)) Out[19]: 2000-01-01 0 2000-01-02 1 2000-01-03 2 Freq: D, dtype: int64
Take a look at the null value of the above data type:
In [24]: pd.Timestamp(pd.NaT) Out[24]: NaT In [25]: pd.Timedelta(pd.NaT) Out[25]: NaT In [26]: pd.Period(pd.NaT) Out[26]: NaT # Equality acts as np.nan would In [27]: pd.NaT == pd.NaT Out[27]: False
Timestamp
Timestamp is the most basic time type. We can create it as follows:
In [28]: pd.Timestamp(datetime.datetime(2012, 5, 1)) Out[28]: Timestamp('2012-05-01 00:00:00') In [29]: pd.Timestamp("2012-05-01") Out[29]: Timestamp('2012-05-01 00:00:00') In [30]: pd.Timestamp(2012, 5, 1) Out[30]: Timestamp('2012-05-01 00:00:00')
DatetimeIndex
Timestamp as an index will be automatically converted to DatetimeIndex:
In [33]: dates = [ ....: pd.Timestamp("2012-05-01"), ....: pd.Timestamp("2012-05-02"), ....: pd.Timestamp("2012-05-03"), ....: ] ....: In [34]: ts = pd.Series(np.random.randn(3), dates) In [35]: type(ts.index) Out[35]: pandas.core.indexes.datetimes.DatetimeIndex In [36]: ts.index Out[36]: DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None) In [37]: ts Out[37]: 2012-05-01 0.469112 2012-05-02 -0.282863 2012-05-03 -1.509059 dtype: float64
date_range and bdate_range
You can also use date_range to create DatetimeIndex:
In [74]: start = datetime.datetime(2011, 1, 1) In [75]: end = datetime.datetime(2012, 1, 1) In [76]: index = pd.date_range(start, end) In [77]: index Out[77]: DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08', '2011-01-09', '2011-01-10', ... '2011-12-23', '2011-12-24', '2011-12-25', '2011-12-26', '2011-12-27', '2011-12-28', '2011-12-29', '2011-12-30', '2011-12-31', '2012-01-01'], dtype='datetime64[ns]', length=366, freq='D')
date_range is the calendar range, bdate_range is the working day range:
In [78]: index = pd.bdate_range(start, end) In [79]: index Out[79]: DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12', '2011-01-13', '2011-01-14', ... '2011-12-19', '2011-12-20', '2011-12-21', '2011-12-22', '2011-12-23', '2011-12-26', '2011-12-27', '2011-12-28', '2011-12-29', '2011-12-30'], dtype='datetime64[ns]', length=260, freq='B')
Both methods can take start, end, and periods parameters.
In [84]: pd.bdate_range(end=end, periods=20) In [83]: pd.date_range(start, end, freq="W") In [86]: pd.date_range("2018-01-01", "2018-01-05", periods=5)
origin
Use the origin parameter to modify the starting point of DatetimeIndex:
In [67]: pd.to_datetime([1, 2, 3], unit="D", origin=pd.Timestamp("1960-01-01")) Out[67]: DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'], dtype='datetime64[ns]', freq=None)
By default, origin='unix ', that is, the starting point is 1970-01-01 00:00:00
In [68]: pd.to_datetime([1, 2, 3], unit="D") Out[68]: DatetimeIndex(['1970-01-02', '1970-01-03', '1970-01-04'], dtype='datetime64[ns]', freq=None)
format
Use the format parameter to format the time:
In [51]: pd.to_datetime("2010/11/12", format="%Y/%m/%d") Out[51]: Timestamp('2010-11-12 00:00:00') In [52]: pd.to_datetime("12-11-2010 00:00", format="%d-%m-%Y %H:%M") Out[52]: Timestamp('2010-11-12 00:00:00')
Period
Period refers to a time span, which is usually used together with freq:
In [31]: pd.Period("2011-01") Out[31]: Period('2011-01', 'M') In [32]: pd.Period("2012-05", freq="D") Out[32]: Period('2012-05-01', 'D')
Period can be calculated directly:
In [345]: p = pd.Period("2012", freq="A-DEC") In [346]: p + 1 Out[346]: Period('2013', 'A-DEC') In [347]: p - 3 Out[347]: Period('2009', 'A-DEC') In [348]: p = pd.Period("2012-01", freq="2M") In [349]: p + 2 Out[349]: Period('2012-05', '2M') In [350]: p - 1 Out[350]: Period('2011-11', '2M')
Note that Period can only be counted if it has the same freq. Including offsets and timedelta
In [352]: p = pd.Period("2014-07-01 09:00", freq="H") In [353]: p + pd.offsets.Hour(2) Out[353]: Period('2014-07-01 11:00', 'H') In [354]: p + datetime.timedelta(minutes=120) Out[354]: Period('2014-07-01 11:00', 'H') In [355]: p + np.timedelta64(7200, "s") Out[355]: Period('2014-07-01 11:00', 'H')
Period as an index can be automatically converted to PeriodIndex:
In [38]: periods = [pd.Period("2012-01"), pd.Period("2012-02"), pd.Period("2012-03")] In [39]: ts = pd.Series(np.random.randn(3), periods) In [40]: type(ts.index) Out[40]: pandas.core.indexes.period.PeriodIndex In [41]: ts.index Out[41]: PeriodIndex(['2012-01', '2012-02', '2012-03'], dtype='period[M]', freq='M') In [42]: ts Out[42]: 2012-01 -1.135632 2012-02 1.212112 2012-03 -0.173215 Freq: M, dtype: float64
You can use pd.period_ Use the range method to create PeriodIndex:
In [359]: prng = pd.period_range("1/1/2011", "1/1/2012", freq="M") In [360]: prng Out[360]: PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06', '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12', '2012-01'], dtype='period[M]', freq='M')
You can also create it directly through PeriodIndex:
In [361]: pd.PeriodIndex(["2011-1", "2011-2", "2011-3"], freq="M") Out[361]: PeriodIndex(['2011-01', '2011-02', '2011-03'], dtype='period[M]', freq='M')
DateOffset
DateOffset represents a frequency object. It is very similar to Timedelta, which represents a duration, but has special calendar rules. For example, Timedelta must be 24 hours a day, while in DateOffset, there may be 23, 24 or 25 hours a day according to different daylight saving time.
# This particular day contains a day light savings time transition In [144]: ts = pd.Timestamp("2016-10-30 00:00:00", tz="Europe/Helsinki") # Respects absolute time In [145]: ts + pd.Timedelta(days=1) Out[145]: Timestamp('2016-10-30 23:00:00+0200', tz='Europe/Helsinki') # Respects calendar time In [146]: ts + pd.DateOffset(days=1) Out[146]: Timestamp('2016-10-31 00:00:00+0200', tz='Europe/Helsinki') In [147]: friday = pd.Timestamp("2018-01-05") In [148]: friday.day_name() Out[148]: 'Friday' # Add 2 business days (Friday --> Tuesday) In [149]: two_business_days = 2 * pd.offsets.BDay() In [150]: two_business_days.apply(friday) Out[150]: Timestamp('2018-01-09 00:00:00') In [151]: friday + two_business_days Out[151]: Timestamp('2018-01-09 00:00:00') In [152]: (friday + two_business_days).day_name() Out[152]: 'Tuesday'
DateOffsets and Frequency operations are closed first. Take a look at the available date offsets and their associated Frequency:
Date Offset | Frequency String | describe |
---|---|---|
DateOffset | None | Generic offset class |
BDay or BusinessDay | 'B' | weekdays |
CDay or CustomBusinessDay | 'C' | Custom working days |
Week | 'W' | a week |
WeekOfMonth | 'WOM' | The day of the week of each month |
LastWeekOfMonth | 'LWOM' | The day of the last week of each month |
MonthEnd | 'M' | Calendar month end |
MonthBegin | 'MS' | Beginning of calendar month |
BMonthEnd or BusinessMonthEnd | 'BM' | End of business month |
BMonthBegin or BusinessMonthBegin | 'BMS' | Beginning of business month |
CBMonthEnd or CustomBusinessMonthEnd | 'CBM' | Custom business month end |
CBMonthBegin or CustomBusinessMonthBegin | 'CBMS' | Custom business month beginning |
SemiMonthEnd | 'SM' | The 15th day at the end of the calendar month |
SemiMonthBegin | 'SMS' | The 15th day at the beginning of the calendar month |
QuarterEnd | 'Q' | Calendar quarter end |
QuarterBegin | 'QS' | Beginning of calendar quarter |
BQuarterEnd | 'BQ | End of working season |
BQuarterBegin | 'BQS' | Beginning of working season |
FY5253Quarter | 'REQ' | Retail season (52-53 week) |
YearEnd | 'A' | Calendar year end |
YearBegin | 'AS' or 'BYS' | Beginning of calendar year |
BYearEnd | 'BA' | End of business |
BYearBegin | 'BAS' | Beginning of business |
FY5253 | 'RE' | Retail year (aka 52-53 week) |
Easter | None | Easter holiday |
BusinessHour | 'BH' | business hour |
CustomBusinessHour | 'CBH' | custom business hour |
Day | 'D' | Absolute time of day |
Hour | 'H' | an hour |
Minute | 'T' or 'min' | One Minute |
Second | 'S' | One second |
Milli | 'L' or 'ms' | One subtle |
Micro | 'U' or 'us' | One millisecond |
Nano | 'N' | One nanosecond |
DateOffset also has two methods rollforward() and rollback() to move the time:
In [153]: ts = pd.Timestamp("2018-01-06 00:00:00") In [154]: ts.day_name() Out[154]: 'Saturday' # BusinessHour's valid offset dates are Monday through Friday In [155]: offset = pd.offsets.BusinessHour(start="09:00") # Bring the date to the closest offset date (Monday) In [156]: offset.rollforward(ts) Out[156]: Timestamp('2018-01-08 09:00:00') # Date is brought to the closest offset date first and then the hour is added In [157]: ts + offset Out[157]: Timestamp('2018-01-08 10:00:00')
The above operation will automatically save the hours, minutes and other information. If you want to set it to 00:00:00, you can call the normalize() method:
In [158]: ts = pd.Timestamp("2014-01-01 09:00") In [159]: day = pd.offsets.Day() In [160]: day.apply(ts) Out[160]: Timestamp('2014-01-02 09:00:00') In [161]: day.apply(ts).normalize() Out[161]: Timestamp('2014-01-02 00:00:00') In [162]: ts = pd.Timestamp("2014-01-01 22:00") In [163]: hour = pd.offsets.Hour() In [164]: hour.apply(ts) Out[164]: Timestamp('2014-01-01 23:00:00') In [165]: hour.apply(ts).normalize() Out[165]: Timestamp('2014-01-01 00:00:00') In [166]: hour.apply(pd.Timestamp("2014-01-01 23:30")).normalize() Out[166]: Timestamp('2014-01-02 00:00:00')
As index
Time can be used as an index, and there are some very convenient features when it is used as an index.
Time can be directly used to obtain corresponding data:
In [99]: ts["1/31/2011"] Out[99]: 0.11920871129693428 In [100]: ts[datetime.datetime(2011, 12, 25):] Out[100]: 2011-12-30 0.56702 Freq: BM, dtype: float64 In [101]: ts["10/31/2011":"12/31/2011"] Out[101]: 2011-10-31 0.271860 2011-11-30 -0.424972 2011-12-30 0.567020 Freq: BM, dtype: float64
Obtain annual data:
In [102]: ts["2011"] Out[102]: 2011-01-31 0.119209 2011-02-28 -1.044236 2011-03-31 -0.861849 2011-04-29 -2.104569 2011-05-31 -0.494929 2011-06-30 1.071804 2011-07-29 0.721555 2011-08-31 -0.706771 2011-09-30 -1.039575 2011-10-31 0.271860 2011-11-30 -0.424972 2011-12-30 0.567020 Freq: BM, dtype: float64
Get data for a month:
In [103]: ts["2011-6"] Out[103]: 2011-06-30 1.071804 Freq: BM, dtype: float64
DF can accept time as a parameter of loc:
In [105]: dft Out[105]: A 2013-01-01 00:00:00 0.276232 2013-01-01 00:01:00 -1.087401 2013-01-01 00:02:00 -0.673690 2013-01-01 00:03:00 0.113648 2013-01-01 00:04:00 -1.478427 ... ... 2013-03-11 10:35:00 -0.747967 2013-03-11 10:36:00 -0.034523 2013-03-11 10:37:00 -0.201754 2013-03-11 10:38:00 -1.509067 2013-03-11 10:39:00 -1.693043 [100000 rows x 1 columns] In [106]: dft.loc["2013"] Out[106]: A 2013-01-01 00:00:00 0.276232 2013-01-01 00:01:00 -1.087401 2013-01-01 00:02:00 -0.673690 2013-01-01 00:03:00 0.113648 2013-01-01 00:04:00 -1.478427 ... ... 2013-03-11 10:35:00 -0.747967 2013-03-11 10:36:00 -0.034523 2013-03-11 10:37:00 -0.201754 2013-03-11 10:38:00 -1.509067 2013-03-11 10:39:00 -1.693043 [100000 rows x 1 columns]
Time slice:
In [107]: dft["2013-1":"2013-2"] Out[107]: A 2013-01-01 00:00:00 0.276232 2013-01-01 00:01:00 -1.087401 2013-01-01 00:02:00 -0.673690 2013-01-01 00:03:00 0.113648 2013-01-01 00:04:00 -1.478427 ... ... 2013-02-28 23:55:00 0.850929 2013-02-28 23:56:00 0.976712 2013-02-28 23:57:00 -2.693884 2013-02-28 23:58:00 -1.575535 2013-02-28 23:59:00 -1.573517 [84960 rows x 1 columns]
Slice and match exactly
Consider the following Series object with a precision of:
In [120]: series_minute = pd.Series( .....: [1, 2, 3], .....: pd.DatetimeIndex( .....: ["2011-12-31 23:59:00", "2012-01-01 00:00:00", "2012-01-01 00:02:00"] .....: ), .....: ) .....: In [121]: series_minute.index.resolution Out[121]: 'minute'
If the time precision is less than minutes, a Series object is returned:
In [122]: series_minute["2011-12-31 23"] Out[122]: 2011-12-31 23:59:00 1 dtype: int64
If the time precision is greater than minutes, a constant is returned:
In [123]: series_minute["2011-12-31 23:59"] Out[123]: 1 In [124]: series_minute["2011-12-31 23:59:00"] Out[124]: 1
Similarly, if the precision is seconds, an object will be returned if it is less than seconds, and a constant value will be returned if it is equal to seconds.
Operation of time series
Shifting
Use the shift method to move the time series accordingly:
In [275]: ts = pd.Series(range(len(rng)), index=rng) In [276]: ts = ts[:5] In [277]: ts.shift(1) Out[277]: 2012-01-01 NaN 2012-01-02 0.0 2012-01-03 1.0 Freq: D, dtype: float64
By specifying freq, you can set the shift mode:
In [278]: ts.shift(5, freq="D") Out[278]: 2012-01-06 0 2012-01-07 1 2012-01-08 2 Freq: D, dtype: int64 In [279]: ts.shift(5, freq=pd.offsets.BDay()) Out[279]: 2012-01-06 0 2012-01-09 1 2012-01-10 2 dtype: int64 In [280]: ts.shift(5, freq="BM") Out[280]: 2012-05-31 0 2012-05-31 1 2012-05-31 2 dtype: int64
frequency conversion
The frequency of time series can be converted by calling asfreq:
In [281]: dr = pd.date_range("1/1/2010", periods=3, freq=3 * pd.offsets.BDay()) In [282]: ts = pd.Series(np.random.randn(3), index=dr) In [283]: ts Out[283]: 2010-01-01 1.494522 2010-01-06 -0.778425 2010-01-11 -0.253355 Freq: 3B, dtype: float64 In [284]: ts.asfreq(pd.offsets.BDay()) Out[284]: 2010-01-01 1.494522 2010-01-04 NaN 2010-01-05 NaN 2010-01-06 -0.778425 2010-01-07 NaN 2010-01-08 NaN 2010-01-11 -0.253355 Freq: B, dtype: float64
asfreq can also specify the filling method after modifying the frequency:
In [285]: ts.asfreq(pd.offsets.BDay(), method="pad") Out[285]: 2010-01-01 1.494522 2010-01-04 1.494522 2010-01-05 1.494522 2010-01-06 -0.778425 2010-01-07 -0.778425 2010-01-08 -0.778425 2010-01-11 -0.253355 Freq: B, dtype: float64
Resampling resampling
A given time series can be resampled by calling the resample method:
In [286]: rng = pd.date_range("1/1/2012", periods=100, freq="S") In [287]: ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng) In [288]: ts.resample("5Min").sum() Out[288]: 2012-01-01 25103 Freq: 5T, dtype: int64
resample can accept various statistical methods, such as sum, mean, std, sem, max, min, median, first, last, ohlc.
In [289]: ts.resample("5Min").mean() Out[289]: 2012-01-01 251.03 Freq: 5T, dtype: float64 In [290]: ts.resample("5Min").ohlc() Out[290]: open high low close 2012-01-01 308 460 9 205 In [291]: ts.resample("5Min").max() Out[291]: 2012-01-01 460 Freq: 5T, dtype: int64
This article has been included in http://www.flydean.com/15-python-pandas-time/
The most popular interpretation, the most profound dry goods, the most concise tutorial, and many tips you don't know are waiting for you to find!
Welcome to my official account: "those things in procedure", understand technology, know you better!