Time series stationarity test (ADF) and white noise test (Ljung-Box)

Posted by Gonwee on Wed, 09 Feb 2022 20:54:16 +0100

Before forecasting a time series, we need to run a few tests on the data, mainly for stationarity and for randomness (the white noise test). This post introduces the ADF test and the Ljung-Box test.

ADF test

The ADF (Augmented Dickey-Fuller) test is a unit root test: it checks whether the series contains a unit root. Its null hypothesis is that a unit root exists. A series with a unit root is non-stationary, and using such a series in regression analysis can lead to spurious regression.
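To make the idea of a unit root concrete, here is a minimal sketch using only the Python standard library. It assumes the simplest unit root process, a random walk y_t = y_(t-1) + e_t, which this post does not spell out; the variable names are illustrative only. The walk itself wanders without settling around a fixed mean, while its first difference is just the white noise it was built from.

```python
import random
from itertools import accumulate

random.seed(0)  # fixed seed so the run is repeatable

# White noise: independent draws with constant mean and variance (stationary).
noise = [random.gauss(0.0, 1.0) for _ in range(500)]

# Random walk y_t = y_(t-1) + e_t: the simplest process with a unit root.
walk = list(accumulate(noise))

# The first difference y_t - y_(t-1) gives back the noise term e_t.
diff = [b - a for a, b in zip(walk, walk[1:])]

# Largest gap between the differenced walk and the original noise
# (only floating point rounding error is expected here).
max_error = max(abs(d - e) for d, e in zip(diff, noise[1:]))
print(f"largest gap between diff(walk) and noise: {max_error:.2e}")
```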

The Python code for the ADF test is given below:

from statsmodels.tsa.stattools import adfuller
import pandas as pd
import numpy as np
# Annual series indexed by year, 1978 to 2010 (33 observations)
data = pd.Series([151.0, 188.46, 199.38, 219.75, 241.55, 262.58, 328.22, 396.26, 442.04, 517.77, 626.52, 717.08, 824.38, 913.38, 1088.39, 1325.83, 1700.92, 2109.38, 2499.77, 2856.47, 3114.02, 3229.29, 3545.39, 3880.53, 4212.82, 4757.45, 5633.24, 6590.19, 7617.47, 9333.4, 11328.92, 12961.1, 15967.61], index=np.arange(1978, 2011))
# Run the ADF test with default settings and print the result tuple
print(adfuller(data))

(-0.04391111656553118, 0.9547464774274733, 10, 22, {'1%': -3.769732625845229, '5%': -3.005425537190083, '10%': -2.6425009917355373}, 291.54354258641223)

The elements of the result are:
-0.04391111656553118 is the ADF test statistic, referred to as the t value for short.
0.9547464774274733 is the p-value corresponding to the test statistic.
10 is the number of lags used.
22 is the number of observations used in the ADF regression.
The fifth element holds the critical values of the test statistic at the 1%, 5% and 10% significance levels.
291.54354258641223 is the maximized information criterion from automatic lag selection (AIC by default).
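As a small illustration of the element order described above, the sketch below unpacks the result tuple into named variables. The tuple is copied from the output in this post rather than recomputed, and the variable names are my own choice, not names defined by statsmodels.

```python
# Result tuple copied from the output above (1978 to 2010 series).
result = (-0.04391111656553118, 0.9547464774274733, 10, 22,
          {'1%': -3.769732625845229, '5%': -3.005425537190083,
           '10%': -2.6425009917355373}, 291.54354258641223)

# Unpack in the order the values appear in the tuple.
adf_stat, p_value, used_lag, n_obs, critical_values, ic_best = result

print(f"test statistic        : {adf_stat:.4f}")
print(f"p-value               : {p_value:.4f}")
print(f"lags used             : {used_lag}")
print(f"observations used     : {n_obs}")
for level, value in critical_values.items():
    print(f"critical value at {level:>3} : {value:.4f}")
print(f"information criterion : {ic_best:.4f}")
```

Note that the lags used plus the observations used plus one (10 + 22 + 1) equals 33, the length of the input series.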

First, -0.04391111656553118 is greater than the critical values at all three significance levels, so the null hypothesis of a unit root cannot be rejected.

Second, to reject the unit root the p-value must be less than the chosen significance level (generally 0.05), and the closer to 0 the better. Here the p-value is 0.9547464774274733, well above 0.05, so again the unit root cannot be rejected.

To sum up, this sequence is not stationary.

For comparison, the result for a stationary series is given below:
(-4.924087490679005, 3.129856642757301e-05, 19, 636, {'1%': -3.4406737255613256, '5%': -2.866095119842903, '10%': -2.5691958123689727}, 14356.744057311003)

The t value is less than the critical values at all three significance levels, and the p-value is less than 0.05 (close to 0), so the unit root hypothesis is rejected and the sequence is stationary.
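The decision rule used for both results can be sketched as a small helper. This is only an illustration: the function name `has_unit_root` and its parameters are hypothetical, and requiring both the p-value check and the critical value check is one possible convention, not something statsmodels prescribes. Both tuples are copied from the outputs in this post.

```python
def has_unit_root(result, alpha=0.05, level="5%"):
    """Return True if the ADF null hypothesis (unit root) cannot be rejected."""
    adf_stat, p_value, _, _, critical_values, _ = result
    # Reject the unit root only if the p-value is below alpha and the
    # statistic is more negative than the chosen critical value.
    rejected = p_value < alpha and adf_stat < critical_values[level]
    return not rejected


# First result in this post (the 1978 to 2010 series).
non_stationary = (-0.04391111656553118, 0.9547464774274733, 10, 22,
                  {'1%': -3.769732625845229, '5%': -3.005425537190083,
                   '10%': -2.6425009917355373}, 291.54354258641223)

# Second result in this post (the stationary series).
stationary = (-4.924087490679005, 3.129856642757301e-05, 19, 636,
              {'1%': -3.4406737255613256, '5%': -2.866095119842903,
               '10%': -2.5691958123689727}, 14356.744057311003)

for name, res in [("first series", non_stationary), ("second series", stationary)]:
    verdict = "not stationary" if has_unit_root(res) else "stationary"
    print(f"{name}: {verdict}")
```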

Ljung-Box test

The Ljung-Box test (LB test), a randomness test, checks whether the autocorrelations of the sequence up to lag m are jointly significant, that is, whether or not the sequence is white noise. Under the null hypothesis of white noise, the Q statistic approximately follows a chi-square distribution with m degrees of freedom. If the data is white noise, there is no information left to extract and no need to continue the analysis.
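For readers who want to see where the Q statistic comes from, here is a rough pure Python sketch of the standard formula Q = n(n+2) * sum over k = 1..m of r_k^2 / (n - k), where r_k is the sample autocorrelation at lag k. The function names are hypothetical and this is not the statsmodels implementation; it is meant only to show the arithmetic.

```python
def autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    return num / denom


def ljung_box_q(x, m):
    """Ljung-Box Q statistic using lags 1..m."""
    n = len(x)
    return n * (n + 2) * sum(autocorr(x, k) ** 2 / (n - k)
                             for k in range(1, m + 1))


# Tiny made-up series, chosen only so the arithmetic is easy to follow by hand.
series = [1, 2, 3, 4, 5]
for m in (1, 2):
    print(f"Q({m}) = {ljung_box_q(series, m):.4f}")
```

The p-value is then the upper tail probability of a chi-square distribution with m degrees of freedom at Q, which could be obtained with, for example, `scipy.stats.chi2.sf(q, m)`.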

The Python code for the Ljung-Box test is given below:

from statsmodels.stats.diagnostic import acorr_ljungbox as lb_test
# data here is the blogger's own data
result = lb_test(data, lags=20)
print(result)
        lb_stat      lb_pvalue
1    471.099659  1.847036e-104
2    899.481638  4.786785e-196
3   1347.384204  7.695651e-292
4   1791.734228   0.000000e+00
5   2207.199800   0.000000e+00
6   2674.155719   0.000000e+00
7   3242.923906   0.000000e+00
8   3686.776794   0.000000e+00
9   4069.902008   0.000000e+00
10  4474.462678   0.000000e+00
11  4865.867510   0.000000e+00
12  5234.470249   0.000000e+00
13  5641.097308   0.000000e+00
14  6133.124076   0.000000e+00
15  6518.637784   0.000000e+00
16  6846.243758   0.000000e+00
17  7193.271970   0.000000e+00
18  7526.968985   0.000000e+00
19  7836.234889   0.000000e+00
20  8179.147428   0.000000e+00

The results are analyzed as follows

We mainly look at the p-values in the second column. lags is the number of lags to test; it is generally set to a fixed value such as 20 or chosen based on the sequence length. Every p-value is less than 0.05 (practically 0), so the white noise hypothesis is rejected: the data is not white noise, it contains information, and it can be analyzed further.

Conversely, if the p-values are greater than 0.05, the white noise hypothesis cannot be rejected, and the sequence is treated as a purely random (white noise) sequence.
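The reading of the p-value column can be sketched in a few lines. The helper name `rejects_white_noise` is hypothetical, and requiring every tested lag to fall below the threshold is an assumption made for this illustration; the p-values are the first three rows of the table in this post.

```python
def rejects_white_noise(p_values, alpha=0.05):
    """True if every tested lag rejects the white noise hypothesis at level alpha."""
    return all(p < alpha for p in p_values)


# First three lb_pvalue entries copied from the table above.
p_values = [1.847036e-104, 4.786785e-196, 7.695651e-292]

if rejects_white_noise(p_values):
    print("not white noise: the series is worth analyzing further")
else:
    print("cannot reject white noise at this significance level")
```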
