TCN (Temporal Convolutional Network) for advanced machine learning: concept, origin, principle, and code implementation

Posted by cheekychop on Fri, 07 Jan 2022 02:40:23 +0100

TCN: from "aba aba" (babbling) to "balabalabala" (talking fluently)

  • The concept of TCN (why does it exist? What problems does it solve?)
  • The origin of TCN (its "parents")
  • Introduction to the principle of TCN
  • Code!

1. What is TCN (Temporal Convolutional Network) and what can it do?

  • Main application directions:

Time series forecasting, probabilistic forecasting, time prediction, and traffic forecasting.

2. Origin of TCN

PS: before digging into TCN, you should already have some understanding of CNNs and RNNs.

  • Problems it handles:

TCN is a network structure designed to process time series data. Under certain conditions, it performs better than traditional architectures such as RNNs and CNNs. The design described in this post follows the generic TCN popularized by Bai, Kolter, and Koltun in "An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling" (2018).

3. Introduction to the principle of TCN

Network structure of TCN

1, The network structure of TCN is built from the residual block shown in the figure above (figure omitted in this copy). The figure splits into two parts, left and right. First, the left part:

Dilated Causal Conv ---> WeightNorm ---> ReLU ---> Dropout ---> Dilated Causal Conv ---> WeightNorm ---> ReLU ---> Dropout

Obviously, this can be condensed to

(Dilated Causal Conv ---> WeightNorm ---> ReLU ---> Dropout) * 2

OK, let's explain these four components one by one. If you already know them, feel free to skip ahead.

1, Dilated Causal Conv

Dilated causal convolution

Dilated causal convolution breaks down into three ideas: dilation, causality, and convolution.

Convolution is the same operation as in a CNN: a kernel sliding over the data;

Dilation means the convolution samples its input at intervals. It is reminiscent of stride in a convolutional neural network, but clearly different: stride moves the whole window in bigger steps, while dilation inserts gaps between the kernel's taps.

(Figure omitted: dilated convolution.)

Causality means that the value at time t in layer i depends only on the values at time t and earlier in layer (i-1). Causal convolution therefore never reads future data during training; it is a strictly time-constrained model.

(Figure omitted: causal convolution, dilation not included.)
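To make this concrete, here is a minimal sketch (mine, not from the original post) of a dilated causal convolution in Paddle, using the same trick as the full code below: pad both sides by (k - 1) * d, then trim the right side so the output at time t only sees inputs at times <= t.

import paddle
import paddle.nn as nn

# Hypothetical sizes: kernel k = 3, dilation d = 2.
k, d = 3, 2
pad = (k - 1) * d
conv = nn.Conv1D(in_channels=1, out_channels=1, kernel_size=k,
                 padding=pad, dilation=d)

x = paddle.randn([1, 1, 16])   # [batch, channels, time]
y = conv(x)[:, :, :-pad]       # trim the right side -> causal output
print(y.shape)                 # [1, 1, 16]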

2, WeightNorm

Weight normalization

Normalizes the weight values: WeightNorm reparameterizes each weight vector as w = g * v / ||v||, decoupling its magnitude g from its direction v. If you want to study the normalization process and formula carefully, follow the link in the original post.

Advantages:

1. Low time overhead, fast to compute!

2. Introduces less noise

3. WeightNorm speeds up deep-network training by reparameterizing the weights, without introducing any dependence on the minibatch
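As a minimal usage sketch (hypothetical channel sizes), this is how Paddle's weight_norm wrapper is applied; the full code below uses it in exactly this way:

import paddle.nn as nn
from paddle.nn.utils import weight_norm

# weight_norm reparameterizes the layer weight as w = g * v / ||v||,
# learning the magnitude g and the direction v separately.
conv = weight_norm(nn.Conv1D(in_channels=16, out_channels=32, kernel_size=3))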

3, ReLU()

An activation function

Advantages:

1. It makes the network train faster

2. It adds nonlinearity, improving the model's expressive power

3. It helps prevent vanishing gradients

4. It makes the network sparse, etc.

Formula: ReLU(x) = max(0, x)

(Figure omitted: the ReLU activation curve.)

4, Dropout()

Dropout temporarily drops neural-network units from the network with a certain probability during training.

Advantages: prevents overfitting and improves the model's running speed
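A minimal sketch (hypothetical input) showing Paddle's Dropout layer in action:

import paddle
import paddle.nn as nn

drop = nn.Dropout(p=0.2)   # each element is zeroed with probability 0.2
drop.train()               # dropout only acts in training mode
x = paddle.ones([1, 8])
print(drop(x))             # in Paddle's default mode, survivors are scaled by 1 / (1 - p)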

2, Finally, the right part: the residual connection

The right part is a 1 * 1 convolution block. It both lets the network pass information across layers and ensures the input and output shapes stay consistent so they can be added, as sketched below.
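A minimal sketch (hypothetical channel counts, mirroring the TemporalBlock code below) of that residual trick: the 1 * 1 convolution reshapes the input so it can be added to the branch output.

import paddle
import paddle.nn as nn
import paddle.nn.functional as F

n_in, n_out = 16, 32
downsample = nn.Conv1D(n_in, n_out, kernel_size=1)  # 1x1 conv: match channel counts

x = paddle.randn([1, n_in, 50])      # block input
out = paddle.randn([1, n_out, 50])   # stand-in for the left-branch output
y = F.relu(out + downsample(x))      # residual addition
print(y.shape)                       # [1, 32, 50]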

3, Advantages of TCN:

1. Parallelism

2. It largely avoids vanishing and exploding gradients

3. A larger receptive field, so more information is learned (see the sketch below)
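As a quick sanity check on advantage 3 (my own sketch, not from the post): in the standard stacking used below, block i runs two convolutions of kernel size k at dilation 2**i, and each convolution with dilation d widens the receptive field by (k - 1) * d, so the receptive field grows exponentially with depth:

def tcn_receptive_field(kernel_size, num_blocks):
    # 1 + sum over blocks of 2 * (kernel_size - 1) * 2**i
    return 1 + 2 * (kernel_size - 1) * (2 ** num_blocks - 1)

print(tcn_receptive_field(kernel_size=3, num_blocks=4))  # 61 time steps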

4, The code, from zero

import os
import sys
import paddle
import paddle.nn as nn
import numpy as np
import pandas as pd
import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
from matplotlib import rc
import paddle.nn.functional as F
from paddle.nn.utils import weight_norm
from sklearn.preprocessing import MinMaxScaler
from pandas.plotting import register_matplotlib_converters

class Chomp1d(nn.Layer):
    def __init__(self, chomp_size):
        super(Chomp1d, self).__init__()
        self.chomp_size = chomp_size

    def forward(self, x):
        # Trim the rightmost `chomp_size` time steps so the convolution stays causal.
        return x[:, :, :-self.chomp_size]


class TemporalBlock(nn.Layer):
    def __init__(self,
                 n_inputs,
                 n_outputs,
                 kernel_size,
                 stride,
                 dilation,
                 padding,
                 dropout=0.2):
        super(TemporalBlock, self).__init__()
        self.conv1 = weight_norm(
            nn.Conv1D(
                n_inputs,
                n_outputs,
                kernel_size,
                stride=stride,
                padding=padding,
                dilation=dilation))
        # Chomp1d is used to make sure the network is causal.
        # We pad by (k-1)*d on the two sides of the input for convolution,
        # and then use Chomp1d to remove the (k-1)*d output elements on the right.
        self.chomp1 = Chomp1d(padding)
        self.relu1 = nn.ReLU()
        self.dropout1 = nn.Dropout(dropout)

        self.conv2 = weight_norm(
            nn.Conv1D(
                n_outputs,
                n_outputs,
                kernel_size,
                stride=stride,
                padding=padding,
                dilation=dilation))
        self.chomp2 = Chomp1d(padding)
        self.relu2 = nn.ReLU()
        self.dropout2 = nn.Dropout(dropout)

        self.net = nn.Sequential(self.conv1, self.chomp1, self.relu1,
                                 self.dropout1, self.conv2, self.chomp2,
                                 self.relu2, self.dropout2)
        self.downsample = nn.Conv1D(n_inputs, n_outputs,
                                    1) if n_inputs != n_outputs else None
        self.relu = nn.ReLU()
        self.init_weights()

    def init_weights(self):
        self.conv1.weight.set_value(
            paddle.tensor.normal(0.0, 0.01, self.conv1.weight.shape))
        self.conv2.weight.set_value(
            paddle.tensor.normal(0.0, 0.01, self.conv2.weight.shape))
        if self.downsample is not None:
            self.downsample.weight.set_value(
                paddle.tensor.normal(0.0, 0.01, self.downsample.weight.shape))

    def forward(self, x):
        out = self.net(x)
        res = x if self.downsample is None else self.downsample(x)  # 1x1 conv so input and output channels match for the addition
        return self.relu(out + res)


class TCNEncoder(nn.Layer):
    def __init__(self, input_size, num_channels, kernel_size=2, dropout=0.2):
        # input_size: number of input features expected per time step
        # num_channels: number of output channels of each temporal block
        # kernel_size: convolution kernel size
        super(TCNEncoder, self).__init__()
        self._input_size = input_size
        self._output_dim = num_channels[-1]

        layers = nn.LayerList()
        num_levels = len(num_channels)
        for i in range(num_levels):
            dilation_size = 2 ** i
            in_channels = input_size if i == 0 else num_channels[i - 1]
            out_channels = num_channels[i]
            layers.append(
                TemporalBlock(
                    in_channels,
                    out_channels,
                    kernel_size,
                    stride=1,
                    dilation=dilation_size,
                    padding=(kernel_size - 1) * dilation_size,
                    dropout=dropout))

        self.network = nn.Sequential(*layers)

    def get_input_dim(self):
        return self._input_size



    def get_output_dim(self):
        return self._output_dim

    def forward(self, inputs):
        # inputs: [batch, time, features]; Conv1D expects [batch, channels, time]
        inputs_t = inputs.transpose([0, 2, 1])
        # Run the TCN, then keep only the last time step: [batch, channels]
        output = self.network(inputs_t).transpose([2, 0, 1])[-1]
        return output


class TimeSeriesNetwork(nn.Layer):

    def __init__(self, input_size, next_k=1, num_channels=[256]):
        super(TimeSeriesNetwork, self).__init__()

        self.last_num_channel = num_channels[-1]

        self.tcn = TCNEncoder(
            input_size=input_size,
            num_channels=num_channels,
            kernel_size=3,
            dropout=0.2
        )

        self.linear = nn.Linear(in_features=self.last_num_channel, out_features=next_k)

    def forward(self, x):
        tcn_out = self.tcn(x)
        y_pred = self.linear(tcn_out)
        return y_pred
'''
I try to portray myself as the hero in the tragedy,
Put all the blame on you,
Make you an evil witch,
become frenzied
 But I am a normal person,
Sad and happy,
Wrong and right,
At this point,
We all have a responsibility,
Until now, I don't think I've lost you
 Did you tell me I lost you?
'''
def config_mtp():
    sns.set(style='whitegrid', palette='muted', font_scale=1.2)
    HAPPY_COLORS_PALETTE = ["#01BEFE", "#FFDD00", "#FF7D00", "#FF006D", "#93D30C", "#8F00FF"]
    sns.set_palette(sns.color_palette(HAPPY_COLORS_PALETTE))
    rcParams['figure.figsize'] = 14, 10
    register_matplotlib_converters()

def read_data():
    df_all = pd.read_csv('./data/time_series_covid19_confirmed_global.csv')
    # print(df_all.head())

    # We will predict worldwide case counts, so the latitude/longitude of individual countries do not matter, only the global number of cases on each date.

    df = df_all.iloc[:, 4:]
    daily_cases = df.sum(axis=0)
    daily_cases.index = pd.to_datetime(daily_cases.index)
    # print(daily_cases.head())

    plt.figure(figsize=(5, 5))
    plt.plot(daily_cases)
    plt.title("Cumulative daily cases")
    # plt.show()

    # In order to improve the stationarity of the sample time series, the first-order difference is taken
    daily_cases = daily_cases.diff().fillna(daily_cases[0]).astype(np.int64)
    # print(daily_cases.head())

    plt.figure(figsize=(5, 5))
    plt.plot(daily_cases)
    plt.title("Daily cases")
    plt.xticks(rotation=60)
    plt.show()
    return daily_cases

def create_sequences(data, seq_length):
    xs = []
    ys = []
    for i in range(len(data) - seq_length + 1):
        x = data[i:i + seq_length - 1]
        y = data[i + seq_length - 1]
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)
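# Example (not in the original): what the windowing produces on a toy array.
#   xs, ys = create_sequences(np.arange(5), seq_length=3)
#   xs -> [[0, 1], [1, 2], [2, 3]]   (each window holds seq_length - 1 past values)
#   ys -> [2, 3, 4]                  (the value right after each window)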

def preprocess_data(daily_cases):
    TEST_DATA_SIZE, SEQ_LEN = 30, 10
    TEST_DATA_SIZE = int(TEST_DATA_SIZE / 100 * len(daily_cases))
    # TEST_DATA_SIZE starts as a percentage: the last 30% of the series becomes the test set
    train_data = daily_cases[:-TEST_DATA_SIZE]
    test_data = daily_cases[-TEST_DATA_SIZE:]
    print("The number of the samples in train set is : %i" % train_data.shape[0])
    print(train_data.shape, test_data.shape)

    # In order to improve the convergence speed and performance of the model, we use scikit learn for data normalization.
    scaler = MinMaxScaler()
    train_data = scaler.fit_transform(np.expand_dims(train_data, axis=1)).astype('float32')
    test_data = scaler.transform(np.expand_dims(test_data, axis=1)).astype('float32')

    # Build time series
    # The previous SEQ_LEN - 1 days predict the current day. So that every point in the test set gets a prediction, we prepend the last SEQ_LEN - 1 training points to the test data; they serve only as model input.
    x_train, y_train = create_sequences(train_data, SEQ_LEN)
    test_data = np.concatenate((train_data[-SEQ_LEN + 1:], test_data), axis=0)
    x_test, y_test = create_sequences(test_data, SEQ_LEN)

    # Try output
    '''
    print("The shape of x_train is: %s"%str(x_train.shape))
    print("The shape of y_train is: %s"%str(y_train.shape))
    print("The shape of x_test is: %s"%str(x_test.shape))
    print("The shape of y_test is: %s"%str(y_test.shape))
    '''
    return x_train,y_train,x_test,y_test,scaler

# After the data set is processed, the data set is encapsulated into CovidDataset for model training and prediction.
class CovidDataset(paddle.io.Dataset):
    def __init__(self, feature, label):
        self.feature = feature
        self.label = label
        super(CovidDataset, self).__init__()

    def __len__(self):
        return len(self.label)

    def __getitem__(self, index):
        return [self.feature[index], self.label[index]]


config_mtp()
data = read_data()
x_train,y_train,x_test,y_test,scaler = preprocess_data(data)
train_dataset = CovidDataset(x_train, y_train)
test_dataset = CovidDataset(x_test, y_test)
network = TimeSeriesNetwork(input_size=1)

# Parameter configuration
LR = 1e-2

model = paddle.Model(network)

optimizer = paddle.optimizer.Adam(learning_rate=LR, parameters=model.parameters()) # optimizer

loss = paddle.nn.MSELoss(reduction='sum')
model.prepare(optimizer, loss) # Configure the model before running

# train
USE_GPU = False
TRAIN_EPOCH = 100
LOG_FREQ = 20
SAVE_DIR = os.path.join(os.getcwd(),"save_dir")
SAVE_FREQ = 20

if USE_GPU:
    paddle.set_device("gpu")
else:
    paddle.set_device("cpu")

model.fit(train_dataset,
    batch_size=32,
    drop_last=True,
    epochs=TRAIN_EPOCH,
    log_freq=LOG_FREQ,
    save_dir=SAVE_DIR,
    save_freq=SAVE_FREQ,
    verbose=1 # The verbosity mode, should be 0, 1, or 2.   0 = silent, 1 = progress bar, 2 = one line per epoch. Default: 2.
    )




# forecast
preds = model.predict(
        test_data=test_dataset
        )

# Data post-processing, convert the normalized data into the original data, and draw the curve corresponding to the real value and the curve corresponding to the predicted value.
true_cases = scaler.inverse_transform(
    np.expand_dims(y_test.flatten(), axis=0)
).flatten()

predicted_cases = scaler.inverse_transform(
  np.expand_dims(np.array(preds).flatten(), axis=0)
).flatten()
print(true_cases.shape, predicted_cases.shape)
# Report the RMSE between the true and predicted daily cases
mse_loss = paddle.nn.MSELoss(reduction='mean')
print(paddle.sqrt(mse_loss(paddle.to_tensor(true_cases), paddle.to_tensor(predicted_cases))))

print(true_cases, predicted_cases)

If you need the data, please comment below; you can also get it by private message.

Don't forget to like, comment, and bookmark. It really matters to me~

Topics: network Algorithm Machine Learning AI NLP