Pandas data analysis - time related data types

Posted by ou812 on Sun, 16 Jan 2022 23:23:41 +0100

The two common data types are datetime and timedelta.

As the name suggests, datetime means both date and time, indicating a specific point in time (timestamp). 2021-12-23 (date) 12:00:00 (time)  ---> datetime

Timedelta indicates the difference between two time points. For example, the timedelta between 2020-01-01 and 2020-01-02 is one day.

Convert time column to time format

data.trade_date.head()


0    20190102
1    20190103
2    20190104
3    20190107
4    20190108
Name: trade_date, dtype: object


# Using PD to_ Datetime(), which can convert the corresponding column to datetime64 in Pandas for later processing
data["trade_date"] = pd.to_datetime(data.trade_date)
data.trade_date.head()


0   2019-01-02
1   2019-01-03
2   2019-01-04
3   2019-01-07
4   2019-01-08
Name: trade_date, dtype: datetime64[ns]

Index of time series

#The index in time series is similar to the ordinary index of Pandas, which is called most of the time loc[index,columns] for the corresponding index

data1 = data.set_index("trade_date")

# June 2019 data
data1.loc["2019-06"].head()


                close       open       high        low
trade_date
2019-06-03  2890.0809  2901.7424  2920.8292  2875.9019
2019-06-04  2862.2803  2887.6405  2888.3861  2851.9728
2019-06-05  2861.4181  2882.9369  2888.7676  2858.5719
2019-06-06  2827.7978  2862.3327  2862.3327  2822.1853
2019-06-10  2852.1302  2833.0145  2861.1310  2824.3554


# Data from June 2019 to August 2019
data1.loc["2019-06":"2019-08"].tail()

                close       open       high        low
trade_date
2019-08-26  2863.5673  2851.0158  2870.4939  2849.2381
2019-08-27  2902.1932  2879.5154  2919.6444  2879.4060
2019-08-28  2893.7564  2901.6267  2905.4354  2887.0115
2019-08-29  2890.9192  2895.9991  2898.6046  2878.5878
2019-08-30  2886.2365  2907.3825  2914.5767  2874.1028

Extract time / date attributes

In the process of time series data processing, the following requirements often need to be realized:

  • Find the number of weeks corresponding to a date (2019-06-05 is the week)
  • Judge the day of the week (2020-01-01 is the day of the week)
  • Judge which quarter a date is (which quarter 2019-07-08 belongs to)

When the time column in the data (trade_date column in this data) has been converted to datetime64 format, you only need to call dt interface, you can quickly get the desired results, which are listed in the table below Common properties provided by dt interface

PropertyDescription
yearparticular year
monthmonth
dayday
hourTime
minutebranch
secondsecond
datedate
timetime
dayofyearWhat day of the year
weekofyearWhat week of the year
weekdayThe day of the week, 0 corresponding to Monday
weekday_nameThe name of the day of the week, such as Monday
quarterIn what quarter
is_leap_yearDetermine whether it is a leap year
# What day of the year
data.trade_date.dt.dayofweek[0]

2

# Return corresponding date
data.trade_date.dt.date[0]

datetime.date(2019, 1, 2)

# Return weeks
data.trade_date.dt.weekofyear[0]

1

# Return day of week
data.trade_date.dt.weekday_name[0]

'Wednesday'

resample

resample translates to resampling, which is described in the official documents

resample() is a  time-based groupby

Time based groupby operation is the most important function in Pandas time series processing.

According to whether the sampling is from low frequency to high frequency or from high frequency to low frequency, it can be divided into up sampling and down sampling.

Downsampling

As an example, the data we use is the daily level data of Shanghai stock index in 2019. What should we do if we want to find the average closing price of each quarter now?

Calculating quarterly level data from daily level data is an aggregation operation from high frequency to low frequency. In fact, it is similar to group by operating quarterly. It is written in resample

data.resample('Q',on='trade_date')["close"].mean()


trade_date
2019-03-31    2792.941622
2019-06-30    3010.354672
2019-09-30    2923.136748
2019-12-31    2946.752270
Freq: Q-DEC, Name: close, dtype: float64

Where 'Q' is sampled quarterly, and on specifies the datetime column (if the index is Datetimeindex, on does not need to specify, and downsampling is performed according to the index by default). Among them, the frequency of segmentation can be any time frequency, quarterly Q, monthly M, week W, N day ND, or H and T. of course, if the frequency after segmentation is less than the original time frequency, it is the upsampling we will talk about below.  

Liter sampling

When the sampling frequency is lower than the original frequency, it is called ascending sampling. Upsampling is a more fine-grained division of the original time granularity, so missing values will be generated during upsampling. Next, take the data from 2019-01-02 to 2019-01-03 and demonstrate it according to the frequency of 6H:

example

                close
trade_date
2019-01-02  2465.2910
2019-01-03  2464.3628

example.resample('6H').asfreq()

                         close
trade_date
2019-01-02 00:00:00  2465.2910
2019-01-02 06:00:00        NaN
2019-01-02 12:00:00        NaN
2019-01-02 18:00:00        NaN
2019-01-03 00:00:00  2464.3628

Apply to the result after resample asfreq() will return the result at the new frequency. You can see that the missing value is generated after the upsampling. If you want to fill in missing values, you can use backward filling bfill() or fill forward Fill() method:

# Fill forward. The missing value is 2465.2910
example.resample('6H').ffill()

                         close
trade_date
2019-01-02 00:00:00  2465.2910
2019-01-02 06:00:00  2465.2910
2019-01-02 12:00:00  2465.2910
2019-01-02 18:00:00  2465.2910
2019-01-03 00:00:00  2464.3628

# Fill backward. The missing value is 2464.3628
example.resample('6H').bfill()

                         close
trade_date
2019-01-02 00:00:00  2465.2910
2019-01-02 06:00:00  2464.3628
2019-01-02 12:00:00  2464.3628
2019-01-02 18:00:00  2464.3628
2019-01-03 00:00:00  2464.3628

Topics: Python Data Analysis pandas