Seven Python toolkits for time series forecasting: there is always one for you

Posted by pelegk2 on Fri, 03 Dec 2021 22:55:19 +0100

Time series forecasting is one of the hardest problems in data science. Classical methods such as ARIMA and SARIMA work very well in many cases, but they struggle to produce satisfactory forecasts for nonlinear or non-stationary time series.

To get better forecasts and complete the task simply and efficiently, this article introduces seven Python toolkits for working with time series problems.

1. tsfresh

tsfresh is a great Python package. It automatically computes a large number of time series features, offering many feature extraction methods together with powerful feature selection algorithms.

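To make "time series features" concrete, here is a minimal plain-Python sketch that computes a handful of summary statistics for one series window. The dictionary keys echo a few of tsfresh's feature names, but `simple_features` is a hypothetical helper, not tsfresh's implementation; tsfresh computes hundreds of such values per series.

```python
from statistics import mean, pstdev


def simple_features(x):
    """Return a few summary features of one series window.

    Hypothetical helper for illustration only; x needs at least two values.
    """
    n = len(x)
    mu = mean(x)
    t_mean = (n - 1) / 2  # mean of the time index 0..n-1
    # Ordinary least-squares slope of the values against the time index
    slope = sum((t - t_mean) * (v - mu) for t, v in enumerate(x)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    return {
        "mean": mu,
        "standard_deviation": pstdev(x),  # population standard deviation
        "maximum": max(x),
        "linear_trend_slope": slope,
        "abs_energy": sum(v * v for v in x),  # sum of squared values
    }


print(simple_features([1, 2, 3, 4]))
```
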
Let's use the classic airline passengers dataset as an example to understand tsfresh:

# Importing libraries
import pandas as pd
from tsfresh import extract_features, extract_relevant_features, select_features
from tsfresh.utilities.dataframe_functions import impute, make_forecasting_frame
from tsfresh.feature_extraction import ComprehensiveFCParameters, settings

# Reading the data
data = pd.read_csv('../input/air-passengers/AirPassengers.csv')

# Some preprocessing for the time component (months look like "1949-01"):
data.columns = ['month', 'Passengers']
data['month'] = pd.to_datetime(data['month'], format='%Y-%m')
df_air = data.set_index('month')

# Use Forecasting frame from tsfresh for rolling forecast training
df_shift, y_air = make_forecasting_frame(df_air["Passengers"], kind="Passengers", max_timeshift=12, rolling_direction=1)
print(df_shift)

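Conceptually, a rolling forecasting frame turns one series into many short windows, each paired with the value that follows it. The sketch below is a simplified, hypothetical analogue of that idea (`make_rolling_windows` is not a tsfresh function), and tsfresh's exact column layout and alignment may differ.

```python
def make_rolling_windows(values, max_timeshift):
    """Split one series into (window, next value) training pairs.

    Hypothetical helper: for every position t >= 1, the window holds at
    most `max_timeshift` observations immediately before t, and the
    target is values[t], the value to be forecast.
    """
    windows, targets = [], []
    for t in range(1, len(values)):
        start = max(0, t - max_timeshift)
        windows.append(values[start:t])
        targets.append(values[t])
    return windows, targets


series = [112, 118, 132, 129, 121]  # toy monthly values
windows, targets = make_rolling_windows(series, max_timeshift=3)
for window, target in zip(windows, targets):
    print(window, "->", target)
```
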
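`make_forecasting_frame` returns the data in the long format that `extract_features` expects, with one row per observation and `id`, `time` and `value` columns. We can then extract the comprehensive feature set: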

# Getting Comprehensive Features
extraction_settings = ComprehensiveFCParameters()
X = extract_features(df_shift, column_id="id", column_sort="time", column_value="value", impute_function=impute,show_warnings=False,default_fc_parameters=extraction_settings)

`extract_features` creates about 800 features from this data. tsfresh also supports feature selection based on p-values (via `select_features`). For more details, see GitHub: https://github.com/blue-yonder/tsfresh
Official documentation: https://tsfresh.readthedocs.io/en/latest/index.html

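tsfresh's selection step runs one hypothesis test per feature and then controls the false discovery rate across all of them; its documentation describes the Benjamini-Yekutieli procedure for this. The sketch below implements that step-up procedure on a dictionary of p-values. The feature names and p-values are made up, and `benjamini_yekutieli` is a hypothetical helper rather than part of tsfresh's API.

```python
def benjamini_yekutieli(p_values, fdr_level=0.05):
    """Return the features judged relevant under Benjamini-Yekutieli FDR control.

    p_values maps a feature name to the p-value of its relevance test.
    The procedure stays valid under arbitrary dependence between the tests.
    """
    m = len(p_values)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic-sum correction
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    # Step-up rule: find the largest rank k with p_(k) <= k * q / (m * c(m))
    cutoff = 0
    for k, (_, p) in enumerate(ranked, start=1):
        if p <= k * fdr_level / (m * c_m):
            cutoff = k
    # Keep every feature ranked at or below that k as relevant
    return [name for name, _ in ranked[:cutoff]]


p_values = {"mean": 0.0001, "variance": 0.003, "autocorr_lag1": 0.02, "n_peaks": 0.4}
print(benjamini_yekutieli(p_values))
```

With four tests the thresholds are roughly 0.006, 0.012, 0.018 and 0.024, so only the two smallest p-values pass.
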
2. AutoTS

AutoTS is an automated time series forecasting library that can train many models with just a few lines of code. Its main features include:

  • Uses genetic programming optimization to search for the best forecasting models.
  • Provides lower and upper bounds of a prediction interval around the forecast.
  • Trains a variety of models: statistical, machine learning and deep learning.
  • Can automatically ensemble the best models.
  • Handles messy data by learning the best NaN imputation and outlier removal.
  • Works with both univariate and multivariate time series.

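The core loop behind tools like AutoTS is: run many candidate models, score them on held-out data, keep the best, and optionally average them. Below is a deliberately tiny, hypothetical analogue with three naive forecasters and a single validation split; AutoTS itself searches a far larger model and parameter space with genetic programming, so this only illustrates the idea.

```python
from statistics import mean


# Three hypothetical candidate forecasters: each maps (history, h) -> h values
def naive(history, h):
    return [history[-1]] * h  # repeat the last observation


def drift(history, h):
    slope = (history[-1] - history[0]) / (len(history) - 1)  # average change per step
    return [history[-1] + slope * (i + 1) for i in range(h)]


def historical_mean(history, h):
    return [mean(history)] * h


def mae(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def select_and_ensemble(series, h, models, top_k=2):
    """Score each model on the last h points, then re-run the top_k models
    on the full series and average their forecasts step by step."""
    train, valid = series[:-h], series[-h:]
    scores = {name: mae(valid, fn(train, h)) for name, fn in models.items()}
    best = sorted(scores, key=scores.get)[:top_k]
    forecasts = [models[name](series, h) for name in best]
    ensemble = [mean(step) for step in zip(*forecasts)]  # simple average per step
    return scores, best, ensemble


models = {"naive": naive, "drift": drift, "mean": historical_mean}
scores, best, ensemble = select_and_ensemble([10, 12, 14, 16, 18, 20, 22, 24], h=2, models=models)
print(scores, best, ensemble)
```
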
Let's use an Apple stock price dataset as an example:

# Loading the package
from autots import AutoTS
import matplotlib.pyplot as plt
import pandas as pd

# Reading the data
df = pd.read_csv('../input/apple-aapl-historical-stock-data/HistoricalQuotes.csv')

# Doing some preprocessing: strip the leading " $" from the prices and parse the dates
df[' Close/Last'] = df[' Close/Last'].str.strip().str.lstrip('$').astype(float)
df['Date'] = pd.to_datetime(df['Date'])

# Plot to see the data:
df = df[["Date", " Close/Last"]]
temp_df = df.set_index('Date')
temp_df[" Close/Last"].plot(figsize=(12, 8), title="Apple Stock Prices", fontsize=20, label="Close Price")
plt.legend()
plt.grid()
plt.show()

model = AutoTS(forecast_length=40, frequency='infer', ensemble='simple', drop_data_older_than_periods=100)
model = model.fit(df, date_col='Date', value_col=' Close/Last', id_col=None)

This runs hundreds of models, and you can follow their progress in the output pane. Let's see how the model predicts:

prediction = model.predict()
forecast = prediction.forecast
print("Stock Price Prediction of Apple")
print(forecast)

temp_df[' Close/Last'].plot(figsize=(15,8), title='AAPL Stock Price', fontsize=18, label='History')
forecast[' Close/Last'].plot(figsize=(15,8), title='AAPL Stock Price', fontsize=18, label='Forecast')
plt.legend()
plt.grid()
plt.show()

For more details, see GitHub: https://github.com/winedarksea/AutoTS
Official documentation: https://winedarksea.github.io/AutoTS/build/html/source/tutorial.html

3. Prophet

Prophet is a well-known time series package developed by Facebook's research team. First released in 2017, it is designed for data with strong seasonal effects and several seasons of historical data. It is user-friendly and customizable with minimal setup.

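Under the hood, Prophet fits (by default) an additive model in which the series is the sum of a trend $g(t)$, seasonal components $s(t)$ and holiday effects $h(t)$, plus noise. Seasonality with period $P$ (for example 365.25 for yearly or 7 for weekly seasonality with daily data) is represented by a truncated Fourier series of order $N$. The formulation below follows the Prophet paper by Taylor and Letham:

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t

s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)
```
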
Let's take a simple example:

# Loading the libraries
import pandas as pd
import matplotlib.pyplot as plt
from prophet import Prophet  # releases before 1.0 used: from fbprophet import Prophet


# Loading the data from the repo (columns: ds = date, y = value):
df = pd.read_csv("https://raw.githubusercontent.com/facebook/prophet/master/examples/example_wp_log_peyton_manning.csv")

# Fitting the model
model = Prophet()
model.fit(df)  # Prophet expects columns named 'ds' and 'y'

# Predict
future = model.make_future_dataframe(periods=730)  # extend by 730 days (~2 years)
forecast = model.predict(future)  # predict over the history plus the future dates

# Plot results
fig1 = model.plot(forecast)  # plot the fit to past data and the future forecast
fig2 = model.plot_components(forecast)  # plot the breakdown into components
plt.show()

# Display the predictions with their uncertainty intervals in table format
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())

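`plot_components` draws the trend and seasonal components, while `plot` shows the forecast together with its uncertainty intervals, which are also available as the `yhat_lower` and `yhat_upper` columns of `forecast`.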

For more details, see GitHub: https://github.com/facebook/prophet

Documentation: https://facebook.github.io/prophet/

4. Darts

Darts is another Python package that makes it easy to manipulate and forecast time series. Its syntax is "sklearn-friendly", built around fit() and predict() methods, and it includes a wide range of models, from ARIMA to neural networks.

The best part of the package is that it supports not only univariate but also multivariate time series and models. The library also makes it easy to backtest models, combine the predictions of several models, and incorporate external regressors. Let's take a simple example to see how it works:

# Loading the packages
import pandas as pd
import matplotlib.pyplot as plt
from darts import TimeSeries
from darts.models import ExponentialSmoothing

# Reading the data
data = pd.read_csv('../input/air-passengers/AirPassengers.csv')
series = TimeSeries.from_dataframe(data, 'Month', '#Passengers')
print(series)

# Splitting the series in train and validation set
train, val = series.split_before(pd.Timestamp('19580101'))

# Applying a simple Exponential Smoothing model
model = ExponentialSmoothing()
model.fit(train)

# Getting and plotting the predictions
prediction = model.predict(len(val))
series.plot(label='actual')
prediction.plot(label='forecast', lw=3)
plt.legend()
plt.show()

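Backtesting replays history: at each cutoff the model only sees data up to that point, forecasts what comes next, and the errors are averaged. The following plain-Python sketch shows an expanding-window backtest with a naive "repeat the last value" forecaster. Both functions are hypothetical illustrations; Darts provides built-in methods for this, such as `historical_forecasts` and `backtest`.

```python
def naive_forecaster(history, horizon):
    """Forecast by repeating the last observed value."""
    return [history[-1]] * horizon


def expanding_window_backtest(series, forecaster, start, horizon=1):
    """Mean absolute error of `forecaster` over an expanding-window backtest.

    At each cutoff t (from `start` onwards, start >= 1) the forecaster only
    sees series[:t] and predicts the next `horizon` values.
    """
    errors = []
    for t in range(start, len(series) - horizon + 1):
        prediction = forecaster(series[:t], horizon)
        actual = series[t:t + horizon]
        errors.extend(abs(a - p) for a, p in zip(actual, prediction))
    return sum(errors) / len(errors)


print(expanding_window_backtest([1, 2, 3, 5, 8], naive_forecaster, start=2))
```
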
For more details, see GitHub: https://github.com/unit8co/darts
Documentation: https://unit8co.github.io/darts/README.html

5. AtsPy

AtsPy stands for Automated Time Series Models in Python. The library focuses on forecasting univariate time series: you load the data and specify which models to run, as shown in the following example:

# Importing packages
import pandas as pd
from atspy import AutomatedModel

# Reading the data:
data = pd.read_csv('../input/air-passengers/AirPassengers.csv')

# Preprocessing data (months look like "1949-01")
data.columns = ['month', 'Passengers']
data['month'] = pd.to_datetime(data['month'], format='%Y-%m')
df_air = data.set_index('month')

# Select the models you want to run:
models = ['ARIMA','Prophet']
run_models = AutomatedModel(df = df_air, model_list=models, forecast_len=10)

The package provides a set of different, fully automated models; see the project README for the full list of available models.

GitHub: https://github.com/firmai/atspy

6. Kats

Kats is another library recently released by Facebook's research team for working with time series data. Its goal is to provide a one-stop solution for time series problems. With this library, we can:

  • Analyze time series
  • Detect patterns, including seasonality, outliers and trend changes
  • Generate 65 features with its feature engineering module
  • Build forecasting models for time series data, including Prophet, ARIMA, Holt-Winters, etc.

Kats has only just released its first version. Some tutorials can be found here:
https://github.com/facebookresearch/Kats/tree/master/tutorials

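Seasonality detection is one of the pattern-detection tasks Kats covers. One simple heuristic is to pick the lag at which the series' autocorrelation peaks. The sketch below is a toy version of that idea in plain Python; `autocorrelation` and `detect_period` are hypothetical helpers, not Kats's own detectors.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of sequence x at the given lag (x must not be constant)."""
    n = len(x)
    mu = sum(x) / n
    variance = sum((v - mu) ** 2 for v in x)
    covariance = sum((x[i] - mu) * (x[i + lag] - mu) for i in range(n - lag))
    return covariance / variance


def detect_period(x, max_lag):
    """Toy seasonality detector: return the lag in 2..max_lag with the
    highest autocorrelation."""
    return max(range(2, max_lag + 1), key=lambda lag: autocorrelation(x, lag))


signal = [0, 1, 0, -1] * 6  # repeats every 4 steps
print(detect_period(signal, max_lag=8))
```
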
7. sktime

sktime is a unified Python library for time series that is compatible with scikit-learn. It provides models for time series forecasting, regression and classification, and its main development goal is interoperability with scikit-learn.

Here is a forecasting example to show how sktime is used:

from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.theta import ThetaForecaster
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error

y = load_airline()
y_train, y_test = temporal_train_test_split(y)
fh = ForecastingHorizon(y_test.index, is_relative=False)
forecaster = ThetaForecaster(sp=12)  # monthly seasonal periodicity
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
print(mean_absolute_percentage_error(y_test, y_pred))
# 0.08661467738190656

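The two ingredients of the sktime example, a train/test split that preserves time order and the mean absolute percentage error, are simple enough to write by hand. The sketch below uses hypothetical helpers `temporal_split` and `mape` with made-up numbers and a naive last-value forecast. MAPE is computed as a fraction here; read that way, the 0.0866 reported above is about 8.7%.

```python
def temporal_split(y, test_size):
    """Split without shuffling: the last `test_size` points form the test set."""
    return y[:-test_size], y[-test_size:]


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.1 means 10%).

    Undefined when any true value is zero.
    """
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


y = [100, 110, 120, 130, 140, 150]  # made-up data
y_train, y_test = temporal_split(y, test_size=2)
y_pred = [y_train[-1]] * len(y_test)  # naive forecast: repeat the last training value
print(y_train, y_test, round(mape(y_test, y_pred), 4))
```
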