Python notes: tslearn (data part)

Posted by erth on Tue, 25 Jan 2022 17:53:26 +0100

tslearn is a Python package that provides machine learning tools for analyzing time series. It builds on (and therefore depends on) the scikit-learn, numpy, and scipy libraries.

1 time series data format

Use the to_time_series function to convert a sequence of values into tslearn's time series format:

from tslearn.utils import to_time_series

time_series_lst=[1,3,5,7,9]

time_series_tslearn=to_time_series(time_series_lst)

print(time_series_tslearn,'\n',type(time_series_tslearn),'\n',time_series_tslearn.shape)
'''
[[1.]
 [3.]
 [5.]
 [7.]
 [9.]] 

 <class 'numpy.ndarray'>
 
 (5, 1)
'''

As the output shows, a time series in tslearn is simply a two-dimensional numpy array: the first dimension is the time axis and the second is the feature dimension (1 in the example above).
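The same layout applies to multivariate series. The snippet below is a minimal sketch of that shape convention using numpy only; it does not call tslearn, and the helper name as_time_series is hypothetical.

```python
import numpy as np

def as_time_series(values):
    """Hypothetical helper: arrange values as a (sz, d) float array."""
    arr = np.asarray(values, dtype=float)
    if arr.ndim == 1:
        # A flat list is a univariate series: one feature per time step
        arr = arr.reshape(-1, 1)
    return arr

univariate = as_time_series([1, 3, 5, 7, 9])
# Each inner list holds the d=2 feature values observed at one time step
multivariate = as_time_series([[1, 10], [3, 30], [5, 50]])

print(univariate.shape)    # (5, 1)
print(multivariate.shape)  # (3, 2)
```

In both cases the first axis is time and the second axis is the feature dimension.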

1.1 multiple time series

To work with a set of time series, use to_time_series_dataset, which converts them into a three-dimensional array.

If the time series in the set differ in length, the shorter ones are padded with NaN values, and the resulting array has shape (n_ts, max_sz, d), where max_sz is the length of the longest time series in the set.

from tslearn.utils import to_time_series_dataset

time_series_lst_1=[1,3,5,7,9]
time_series_lst_2=[2,4,6,8]

time_series_tslearn=to_time_series_dataset([time_series_lst_1,
                                            time_series_lst_2])

time_series_tslearn,time_series_tslearn.shape,type(time_series_tslearn)
'''
(array([[[ 1.],
         [ 3.],
         [ 5.],
         [ 7.],
         [ 9.]],
 
        [[ 2.],
         [ 4.],
         [ 6.],
         [ 8.],
         [nan]]]),

 (2, 5, 1),

 numpy.ndarray)
'''
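To make the padding rule concrete, here is a small numpy-only sketch that builds an array of the same shape and then recovers the original lengths. The helper pad_dataset is hypothetical and is not how tslearn implements the conversion; the length count assumes NaN values appear only as trailing padding.

```python
import numpy as np

def pad_dataset(series_list):
    """Hypothetical helper: stack univariate series into (n_ts, max_sz, 1), padding with NaN."""
    max_sz = max(len(s) for s in series_list)
    out = np.full((len(series_list), max_sz, 1), np.nan)
    for i, s in enumerate(series_list):
        out[i, :len(s), 0] = s
    return out

dataset = pad_dataset([[1, 3, 5, 7, 9], [2, 4, 6, 8]])
# A time step counts as real data if at least one feature is not NaN
lengths = (~np.isnan(dataset).all(axis=2)).sum(axis=1)

print(dataset.shape)  # (2, 5, 1)
print(lengths)        # [5 4]
```

Keeping track of the true lengths this way is useful when a later step cannot handle NaN values.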

2 standard time series datasets (supplementary)

2.1 UCR_UEA data

Time series classification datasets, as published on the Time Series Classification Website, can be loaded through:

class tslearn.datasets.UCR_UEA_datasets()

tslearn.datasets.UCR_UEA_datasets — tslearn 0.5.2 documentation

2.2 CachedDatasets

tslearn.datasets.CachedDatasets — tslearn 0.5.2 documentation

3 import time series data from text file

If you are importing other time series from a text file, the expected format is:

  • Each line represents one time series (the time series in a data set are not required to have the same length);
  • Within each line, modalities (dimensions) are separated by the '|' character (it is not needed if your data has only one modality);
  • Within each modality, observations are separated by a space character.

Here is an example of a file that stores two time series of dimension 2 (the first time series has length 3 and the second has length 2):

1.0 0.0 2.5|3.0 2.0 1.0
1.0 2.0|4.333 2.12

This file contains two time series, one per line.

For the first time series, the univariate series along the first dimension is '1.0 0.0 2.5' and the one along the second dimension is '3.0 2.0 1.0'.

For the second time series, the univariate series along the first dimension is '1.0 2.0' and the one along the second dimension is '4.333 2.12'.

Loading the file (saved here as ts_time_series.txt) gives:

from tslearn.utils import save_time_series_txt, load_time_series_txt

time_series_read=load_time_series_txt('ts_time_series.txt')
time_series_read
'''
array([[[1.   , 3.   ],
        [0.   , 2.   ],
        [2.5  , 1.   ]],

       [[1.   , 4.333],
        [2.   , 2.12 ],
        [  nan,   nan]]])
'''
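The format rules above can be sketched in a few lines of plain Python and numpy. The helper parse_line is hypothetical and is only meant to show how one line maps to a (sz, d) array; in practice load_time_series_txt does this for the whole file and also pads shorter series with NaN.

```python
import numpy as np

def parse_line(line):
    """Hypothetical helper: turn one line of the text format into a (sz, d) array."""
    # '|' separates the dimensions, spaces separate the observations
    dims = [[float(v) for v in part.split()] for part in line.strip().split("|")]
    # dims is laid out as (d, sz); transpose so that time is the first axis
    return np.array(dims).T

first = parse_line("1.0 0.0 2.5|3.0 2.0 1.0")
second = parse_line("1.0 2.0|4.333 2.12")

print(first.shape, second.shape)  # (3, 2) (2, 2)
print(first.tolist())             # [[1.0, 3.0], [0.0, 2.0], [2.5, 1.0]]
```

The rows of first match the first block of the loaded array shown above.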

4 using data for training

After loading the data and formatting it according to tslearn's conventions, the next step is to feed it to a machine learning model.

Most tslearn models inherit from scikit-learn base classes, so working with them is very similar to working with scikit-learn models.

from tslearn.clustering import TimeSeriesKMeans
# k-means clustering with 3 clusters, using DTW as the distance function
km = TimeSeriesKMeans(n_clusters=3, metric="dtw")
# fit expects an array of shape (n_ts, sz, d) with at least n_clusters series
# (more than the two-series example file above)
km.fit(time_series_read)
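For some intuition about what the "dtw" metric measures, here is a minimal dynamic-programming sketch of dynamic time warping for two univariate series. It is an illustration only, not tslearn's implementation: the function name dtw_distance is hypothetical, and it assumes a squared-difference local cost with a square root taken at the end.

```python
import numpy as np

def dtw_distance(a, b):
    """Sketch of DTW between two univariate series (squared-difference local cost)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # acc[i, j] = cost of the best alignment of a[:i] with b[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three allowed predecessor alignments
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return np.sqrt(acc[n, m])

# The second series repeats one value, which warping absorbs at no cost
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
print(dtw_distance([0, 0], [1, 1]))
```

Because the alignment may stretch or compress the time axis, series of different lengths can be compared directly, which is why DTW suits data sets like the one loaded above.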
