Temporal Convolutional Network (TCN)

Posted by devinemke on Tue, 08 Mar 2022 00:13:50 +0100

Basic structure of TCN

The Temporal Convolutional Network (TCN) was proposed in 2018; see the paper "An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling" for details.

1. Causal Convolution

Causal convolution is shown in the figure above. The value at time t in an upper layer depends only on the value at time t and earlier values in the layer below. The difference from a traditional convolutional neural network is that causal convolution cannot see future data: it is a one-way structure, not a two-way one. In other words, the cause must come before the effect. Because of this strict temporal constraint, it is called causal convolution.
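
As a minimal sketch (not part of the original code), Keras expresses this with padding='causal' on Conv1D: the layer pads only on the left, so the output at time t depends only on inputs at time t and earlier. The input shape (28, 28) simply mirrors the MNIST-as-sequence shape used later in this post.

# Sketch: causal convolution in Keras; padding='causal' pads on the left only,
# so no output position can see inputs from the future.
from keras.models import Model
from keras.layers import Input, Conv1D

inputs = Input(shape=(28, 28))                     # 28 timesteps, 28 features
x = Conv1D(16, kernel_size=3, padding='causal')(inputs)
model = Model(inputs=inputs, outputs=x)
model.summary()                                    # timesteps stay at 28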

2. Dilated Convolution

Dilated convolution is also known as atrous ("hole") convolution.

Pure causal convolution still suffers from the usual limitation of convolutional neural networks: the length of history it can model is bounded by the kernel size, so capturing longer dependencies requires stacking many layers. To solve this problem, dilated convolution is used, as shown in the figure below.

Unlike traditional convolution, dilated convolution samples the input at intervals, and the spacing is controlled by the dilation rate d. In the bottom layer d = 1, so every input point is used; in the middle layer d = 2, so every second point is taken as input. In general, the higher the layer, the larger d becomes (typically d = 1, 2, 4, 8, ...). As a result, the effective window size grows exponentially with the number of layers, and the network can obtain a large receptive field with relatively few layers.
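
As a minimal sketch (not part of the original post), stacking dilated causal convolutions in Keras looks like the following; the layer width of 16 and the dilation schedule 1, 2, 4, 8 are illustrative assumptions.

# Sketch: a stack of dilated causal convolutions with d = 1, 2, 4, 8.
# Each layer samples every d-th output of the layer below, so the top
# layer sees an exponentially larger window of the original sequence.
from keras.models import Model
from keras.layers import Input, Conv1D

inputs = Input(shape=(None, 16))  # (timesteps, features); length can vary
x = inputs
for d in [1, 2, 4, 8]:
    x = Conv1D(16, kernel_size=3, padding='causal', dilation_rate=d)(x)
model = Model(inputs=inputs, outputs=x)
model.summary()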

3. Residual Connections

Residual connections have proven to be an effective way to train deep networks, since they allow information to be passed directly across layers.

In the paper, a residual block is constructed to replace the plain convolution layer. As shown in the figure above, a residual block contains two convolution layers with nonlinear activations, and WeightNorm and Dropout are added after each layer to regularize the network.
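
The ResBlock used in the example code later in this post drops WeightNorm and Dropout for simplicity and uses non-causal ('same') padding. Below is a sketch closer to the paper's block; it is a simplification under stated assumptions: Keras 2.2 has no built-in weight normalization, so it is omitted here, and SpatialDropout1D (dropping whole channels) stands in for the paper's spatial dropout.

# Sketch of a residual block closer to the paper: two dilated causal convolutions,
# each followed by ReLU and spatial dropout; a 1x1 convolution on the shortcut
# matches channel counts when needed. Weight normalization is omitted.
from keras.layers import add, Activation, Conv1D, SpatialDropout1D

def residual_block(x, filters, kernel_size, dilation_rate, dropout=0.2):
    r = Conv1D(filters, kernel_size, padding='causal',
               dilation_rate=dilation_rate, activation='relu')(x)
    r = SpatialDropout1D(dropout)(r)
    r = Conv1D(filters, kernel_size, padding='causal',
               dilation_rate=dilation_rate, activation='relu')(r)
    r = SpatialDropout1D(dropout)(r)
    if x.shape[-1] == filters:
        shortcut = x
    else:
        shortcut = Conv1D(filters, 1, padding='same')(x)  # 1x1 conv to match channels
    return Activation('relu')(add([r, shortcut]))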

TCN summary

Advantages

(1) Parallelism. Given a sequence (for example a sentence), TCN can process it in parallel instead of step by step as an RNN must.

(2) Flexible receptive field. The receptive field of a TCN is determined by the number of layers, the kernel size, and the dilation rate, so it can be tailored flexibly to different tasks and data characteristics (a small calculation sketch follows this list).

(3) Stable gradients. RNNs often suffer from vanishing and exploding gradients, largely because parameters are shared across time steps. Like a traditional convolutional neural network, TCN does not have these problems.

(4) Lower memory usage. An RNN must keep information about every step, which can consume a lot of memory. In a TCN the convolution kernels are shared within a layer, so memory usage is lower.
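
To make point (2) concrete, the receptive field of a stack of residual blocks (two convolutions per block, ignoring the optional convolution on the shortcut) can be computed directly. This is a small sketch, assuming the block structure used in the example code below.

# Receptive field (in timesteps) of stacked residual blocks, where each
# block applies two convolutions with kernel size k and dilation d:
#   R = 1 + 2 * (k - 1) * sum(dilations)
def receptive_field(kernel_size, dilations):
    return 1 + 2 * (kernel_size - 1) * sum(dilations)

print(receptive_field(3, [1, 2, 4]))  # 29, for the three blocks used in the code below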

Shortcomings

(1) TCN may not adapt as well in transfer learning. The amount of history a model needs for its predictions can differ between domains, so when a model is transferred from a problem that needs only a short memory to one that needs a much longer memory, a TCN may perform poorly because its receptive field is not large enough.

(2) The TCN described in the paper is still a unidirectional structure. Purely unidirectional (causal) structures are very useful in tasks such as speech recognition and speech synthesis, but most text models use a bidirectional structure. TCN can easily be extended to a bidirectional structure by replacing the causal convolutions with ordinary (non-causal) convolutions.

(3) TCN is, after all, a variant of the convolutional neural network. Although dilated convolution enlarges the receptive field, the receptive field is still finite; compared with the Transformer, which can attend to relevant information at any distance, TCN is weaker in this respect. Its usefulness for text remains to be tested.

TCN application

MNIST handwritten digit classification

Multiple features map to a single label, i.e. (xi1, xi2, xi3, ..., xin) → yi

Local environment:

Python 3.6
IDE: PyCharm

Library version:

keras 2.2.0
numpy  1.16.2
tensorflow  1.9.0

1. Download the dataset

MNIST dataset

2. Create tcn.py and enter the following code

Code reference: Keras-based TCN

# TCN for MNIST data
from tensorflow.examples.tutorials.mnist import input_data
from keras.models import Model
from keras.layers import add, Input, Conv1D, Activation, Flatten, Dense


# Load data
def read_data(path):
    mnist = input_data.read_data_sets(path, one_hot=True)
    train_x, train_y = mnist.train.images.reshape(-1, 28, 28), mnist.train.labels
    valid_x, valid_y = mnist.validation.images.reshape(-1, 28, 28), mnist.validation.labels
    test_x, test_y = mnist.test.images.reshape(-1, 28, 28), mnist.test.labels
    return train_x, train_y, valid_x, valid_y, test_x, test_y


# Residual block: two dilated convolutions plus a shortcut connection
def ResBlock(x, filters, kernel_size, dilation_rate):
    # First convolution (note: padding='same' is not causal; for a strictly
    # causal TCN use padding='causal')
    r = Conv1D(filters, kernel_size, padding='same', dilation_rate=dilation_rate, activation='relu')(x)
    # Second convolution
    r = Conv1D(filters, kernel_size, padding='same', dilation_rate=dilation_rate)(r)
    # Shortcut: identity if the channel count already matches, otherwise a
    # convolution to match the shapes
    if x.shape[-1] == filters:
        shortcut = x
    else:
        shortcut = Conv1D(filters, kernel_size, padding='same')(x)
    o = add([r, shortcut])
    # Activation function
    o = Activation('relu')(o)
    return o


# Sequence Model
def TCN(train_x, train_y, valid_x, valid_y, test_x, test_y, classes, epoch):
    inputs = Input(shape=(28, 28))
    x = ResBlock(inputs, filters=32, kernel_size=3, dilation_rate=1)
    x = ResBlock(x, filters=32, kernel_size=3, dilation_rate=2)
    x = ResBlock(x, filters=16, kernel_size=3, dilation_rate=4)
    x = Flatten()(x)
    x = Dense(classes, activation='softmax')(x)
    model = Model(inputs=inputs, outputs=x)
    # View network structure
    model.summary()
    # Compile model
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    # Train model
    model.fit(train_x, train_y, batch_size=500, epochs=epoch, verbose=2, validation_data=(valid_x, valid_y))
    # Evaluate model
    pre = model.evaluate(test_x, test_y, batch_size=500, verbose=2)
    print('test_loss:', pre[0], '- test_acc:', pre[1])

# MNIST has 10 digits (0-9), i.e. 10 classes
classes = 10
epoch = 30
train_x, train_y, valid_x, valid_y, test_x, test_y = read_data('MNIST_data')
#print(train_x, train_y)

TCN(train_x, train_y, valid_x, valid_y, test_x, test_y, classes, epoch)

3. Results

test_loss: 0.05342669463425409 - test_acc: 0.987100002169609

Multiple labels

Multiple features map to multiple labels, e.g. (xi1, xi2, xi3, ..., xin) → (yi1, yi2)

Simply modify the code above: rebuild the training and test data and set the corresponding input and output dimensions and other parameters (a toy shape illustration appears just before the full script below).

Local environment:

Python 3.6
IDE: PyCharm

Library version:

keras 2.2.0
numpy  1.16.2
pandas 0.24.1
sklearn 0.20.1
tensorflow  1.9.0
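
Before the full script, here is a toy illustration (hypothetical numbers, purely to show shapes) of what the create_dataset function below produces: each sample is a window of look_back consecutive rows, and the target is the last two columns of the row that follows the window.

# Toy illustration of the sliding-window construction used by create_dataset:
# 10 rows x 4 columns, look_back = 3
import numpy as np

data = np.arange(40, dtype='float32').reshape(10, 4)
look_back = 3
X, y = [], []
for i in range(len(data) - look_back - 1):
    X.append(data[i:i + look_back, :])   # window of look_back rows, all columns
    y.append(data[i + look_back, -2:])   # last two columns of the next row
X, y = np.array(X), np.array(y)
print(X.shape, y.shape)                  # (6, 3, 4) (6, 2)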

Specific code:

# TCN for indoor location (regression on the last two coordinate columns)
import numpy as np
import pandas
from keras.models import Model
from keras.layers import add, Input, Conv1D, Activation, Flatten, Dense
from sklearn.preprocessing import MinMaxScaler

# Build sliding-window samples: each input is look_back consecutive rows,
# the target is the last two columns of the row that follows the window
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), :]
        dataX.append(a)
        dataY.append(dataset[i + look_back, -2:])
    return np.array(dataX), np.array(dataY)

# Residual block: two dilated convolutions plus a shortcut connection
def ResBlock(x, filters, kernel_size, dilation_rate):
    # First convolution (note: padding='same' is not causal; for a strictly
    # causal TCN use padding='causal')
    r = Conv1D(filters, kernel_size, padding='same', dilation_rate=dilation_rate, activation='relu')(x)
    # Second convolution
    r = Conv1D(filters, kernel_size, padding='same', dilation_rate=dilation_rate)(r)
    # Shortcut: identity if the channel count already matches, otherwise a
    # convolution to match the shapes
    if x.shape[-1] == filters:
        shortcut = x
    else:
        shortcut = Conv1D(filters, kernel_size, padding='same')(x)
    o = add([r, shortcut])
    # Activation function
    o = Activation('relu')(o)
    return o

# Sequence model: stacked residual blocks with a dense regression head
def TCN(train_x, train_y, test_x, test_y, look_back, n_features, n_output, epoch):
    inputs = Input(shape=(look_back, n_features))
    x = ResBlock(inputs, filters=32, kernel_size=3, dilation_rate=1)
    x = ResBlock(x, filters=32, kernel_size=3, dilation_rate=2)
    x = ResBlock(x, filters=16, kernel_size=3, dilation_rate=4)
    x = Flatten()(x)
    # Regression output: linear activation (softmax + cross-entropy is for
    # classification, not for predicting two continuous coordinates)
    x = Dense(n_output)(x)
    model = Model(inputs=inputs, outputs=x)
    # View network structure
    model.summary()
    # Compile model with MSE loss and MAE metric for coordinate regression
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    # Train model
    model.fit(train_x, train_y, batch_size=500, epochs=epoch, verbose=2)
    # Evaluate model
    pre = model.evaluate(test_x, test_y, batch_size=500, verbose=2)
    print('test_loss:', pre[0], '- test_mae:', pre[1])
 

# Common parameters
np.random.seed(7)
features = 24
output = 2
EPOCH = 30
look_back = 5

trainPath = '../data/train.csv'
testPath  = '../data/test.csv'

trainData = pandas.read_csv(trainPath, engine='python')
testData = pandas.read_csv(testPath, engine='python')

dataset = trainData.values
dataset = dataset.astype('float32')

datatestset = testData.values
datatestset = datatestset.astype('float32')
# print(dataset)

# normalize the dataset: fit the scaler on the training data only and
# reuse it to transform the test data (avoids information leakage)
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
datatestset = scaler.transform(datatestset)

trainX, trainY = create_dataset(dataset, look_back)
testX, testY = create_dataset(datatestset, look_back)
# print(trainX)
print(len(trainX), len(testX))
print(testX.shape)
# reshape input to be [samples, time steps, features]; create_dataset already
# returns this shape, so the reshape is a no-op when the CSV has `features` columns
trainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1], features))
testX = np.reshape(testX, (testX.shape[0], testX.shape[1], features))

print(trainX, trainY)

TCN(trainX, trainY,  testX, testY, look_back, features, output, EPOCH)

To be continued

Reference material

An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

TCN - Temporal Convolutional Network

Keras-based time domain convolutional network (TCN)

Keras-TCN

[Tensorflow] Implementing Temporal Convolutional Networks

Topics: Deep Learning