The two common data types are datetime and timedelta.
As the name suggests, datetime means both date and time, indicating a specific point in time (timestamp). 2021-12-23 (date) 12:00:00 (time) ---> datetime
Timedelta indicates the difference between two time points. For example, the timedelta between 2020-01-01 and 2020-01-02 is one day.
Convert time column to time format
data.trade_date.head() 0 20190102 1 20190103 2 20190104 3 20190107 4 20190108 Name: trade_date, dtype: object # Using PD to_ Datetime(), which can convert the corresponding column to datetime64 in Pandas for later processing data["trade_date"] = pd.to_datetime(data.trade_date) data.trade_date.head() 0 2019-01-02 1 2019-01-03 2 2019-01-04 3 2019-01-07 4 2019-01-08 Name: trade_date, dtype: datetime64[ns]
Index of time series
#The index in time series is similar to the ordinary index of Pandas, which is called most of the time loc[index,columns] for the corresponding index data1 = data.set_index("trade_date") # June 2019 data data1.loc["2019-06"].head() close open high low trade_date 2019-06-03 2890.0809 2901.7424 2920.8292 2875.9019 2019-06-04 2862.2803 2887.6405 2888.3861 2851.9728 2019-06-05 2861.4181 2882.9369 2888.7676 2858.5719 2019-06-06 2827.7978 2862.3327 2862.3327 2822.1853 2019-06-10 2852.1302 2833.0145 2861.1310 2824.3554 # Data from June 2019 to August 2019 data1.loc["2019-06":"2019-08"].tail() close open high low trade_date 2019-08-26 2863.5673 2851.0158 2870.4939 2849.2381 2019-08-27 2902.1932 2879.5154 2919.6444 2879.4060 2019-08-28 2893.7564 2901.6267 2905.4354 2887.0115 2019-08-29 2890.9192 2895.9991 2898.6046 2878.5878 2019-08-30 2886.2365 2907.3825 2914.5767 2874.1028
Extract time / date attributes
In the process of time series data processing, the following requirements often need to be realized:
- Find the number of weeks corresponding to a date (2019-06-05 is the week)
- Judge the day of the week (2020-01-01 is the day of the week)
- Judge which quarter a date is (which quarter 2019-07-08 belongs to)
When the time column in the data (trade_date column in this data) has been converted to datetime64 format, you only need to call dt interface, you can quickly get the desired results, which are listed in the table below Common properties provided by dt interface
Property | Description |
year | particular year |
month | month |
day | day |
hour | Time |
minute | branch |
second | second |
date | date |
time | time |
dayofyear | What day of the year |
weekofyear | What week of the year |
weekday | The day of the week, 0 corresponding to Monday |
weekday_name | The name of the day of the week, such as Monday |
quarter | In what quarter |
is_leap_year | Determine whether it is a leap year |
# What day of the year data.trade_date.dt.dayofweek[0] 2 # Return corresponding date data.trade_date.dt.date[0] datetime.date(2019, 1, 2) # Return weeks data.trade_date.dt.weekofyear[0] 1 # Return day of week data.trade_date.dt.weekday_name[0] 'Wednesday'
resample
resample translates to resampling, which is described in the official documents
resample() is a time-based groupbyTime based groupby operation is the most important function in Pandas time series processing.
According to whether the sampling is from low frequency to high frequency or from high frequency to low frequency, it can be divided into up sampling and down sampling.
Downsampling
As an example, the data we use is the daily level data of Shanghai stock index in 2019. What should we do if we want to find the average closing price of each quarter now?
Calculating quarterly level data from daily level data is an aggregation operation from high frequency to low frequency. In fact, it is similar to group by operating quarterly. It is written in resample
data.resample('Q',on='trade_date')["close"].mean() trade_date 2019-03-31 2792.941622 2019-06-30 3010.354672 2019-09-30 2923.136748 2019-12-31 2946.752270 Freq: Q-DEC, Name: close, dtype: float64
Where 'Q' is sampled quarterly, and on specifies the datetime column (if the index is Datetimeindex, on does not need to specify, and downsampling is performed according to the index by default). Among them, the frequency of segmentation can be any time frequency, quarterly Q, monthly M, week W, N day ND, or H and T. of course, if the frequency after segmentation is less than the original time frequency, it is the upsampling we will talk about below.
Liter sampling
When the sampling frequency is lower than the original frequency, it is called ascending sampling. Upsampling is a more fine-grained division of the original time granularity, so missing values will be generated during upsampling. Next, take the data from 2019-01-02 to 2019-01-03 and demonstrate it according to the frequency of 6H:
example close trade_date 2019-01-02 2465.2910 2019-01-03 2464.3628 example.resample('6H').asfreq() close trade_date 2019-01-02 00:00:00 2465.2910 2019-01-02 06:00:00 NaN 2019-01-02 12:00:00 NaN 2019-01-02 18:00:00 NaN 2019-01-03 00:00:00 2464.3628
Apply to the result after resample asfreq() will return the result at the new frequency. You can see that the missing value is generated after the upsampling. If you want to fill in missing values, you can use backward filling bfill() or fill forward Fill() method:
# Fill forward. The missing value is 2465.2910 example.resample('6H').ffill() close trade_date 2019-01-02 00:00:00 2465.2910 2019-01-02 06:00:00 2465.2910 2019-01-02 12:00:00 2465.2910 2019-01-02 18:00:00 2465.2910 2019-01-03 00:00:00 2464.3628 # Fill backward. The missing value is 2464.3628 example.resample('6H').bfill() close trade_date 2019-01-02 00:00:00 2465.2910 2019-01-02 06:00:00 2464.3628 2019-01-02 12:00:00 2464.3628 2019-01-02 18:00:00 2464.3628 2019-01-03 00:00:00 2464.3628