def y_func(y): return y.astype('float').mean(1)
Unwindowed datasets
This functionality lets you create a dataset that applies sliding windows to the input data on the fly. Because only the original unwindowed data needs to be stored, the input data files are much smaller.
I’d like to thank both Thomas Capelle (https://github.com/tcapelle) and Xander Dunn (https://github.com/xanderdunn) for the contributions that made this code possible.
TSUnwindowedDatasets
TSUnwindowedDatasets (dataset, splits)
Base class for lists with subsets
TSUnwindowedDataset
TSUnwindowedDataset (X=None, y=None, y_func=None, window_size=1, stride=1, drop_start=0, drop_end=0, seq_first=True, **kwargs)
Initialize self. See help(type(self)) for accurate signature.
This approach works with both univariate and multivariate data.
- Univariate: we’ll use two arrays of 20 values, one with seq_len as the first dimension (X0) and the other with seq_len as the second (X1).
- Multivariate: we’ll use two arrays with 3 variables each, one with seq_len first (X2) and the other with seq_len second (X3). No sliding window has been applied to any of them yet.
# Univariate
X0 = np.arange(20).astype(float)
X1 = np.arange(20).reshape(1, -1).astype(float)
X0.shape, X0, X1.shape, X1
((20,),
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.,
13., 14., 15., 16., 17., 18., 19.]),
(1, 20),
array([[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.,
13., 14., 15., 16., 17., 18., 19.]]))
# Multivariate
X2 = np.arange(20).reshape(-1,1)*np.array([1, 10, 100]).reshape(1,-1).astype(float)
X3 = np.arange(20).reshape(1,-1)*np.array([1, 10, 100]).reshape(-1,1).astype(float)
X2.shape, X3.shape, X2, X3
((20, 3),
(3, 20),
array([[0.0e+00, 0.0e+00, 0.0e+00],
[1.0e+00, 1.0e+01, 1.0e+02],
[2.0e+00, 2.0e+01, 2.0e+02],
[3.0e+00, 3.0e+01, 3.0e+02],
[4.0e+00, 4.0e+01, 4.0e+02],
[5.0e+00, 5.0e+01, 5.0e+02],
[6.0e+00, 6.0e+01, 6.0e+02],
[7.0e+00, 7.0e+01, 7.0e+02],
[8.0e+00, 8.0e+01, 8.0e+02],
[9.0e+00, 9.0e+01, 9.0e+02],
[1.0e+01, 1.0e+02, 1.0e+03],
[1.1e+01, 1.1e+02, 1.1e+03],
[1.2e+01, 1.2e+02, 1.2e+03],
[1.3e+01, 1.3e+02, 1.3e+03],
[1.4e+01, 1.4e+02, 1.4e+03],
[1.5e+01, 1.5e+02, 1.5e+03],
[1.6e+01, 1.6e+02, 1.6e+03],
[1.7e+01, 1.7e+02, 1.7e+03],
[1.8e+01, 1.8e+02, 1.8e+03],
[1.9e+01, 1.9e+02, 1.9e+03]]),
array([[0.0e+00, 1.0e+00, 2.0e+00, 3.0e+00, 4.0e+00, 5.0e+00, 6.0e+00,
7.0e+00, 8.0e+00, 9.0e+00, 1.0e+01, 1.1e+01, 1.2e+01, 1.3e+01,
1.4e+01, 1.5e+01, 1.6e+01, 1.7e+01, 1.8e+01, 1.9e+01],
[0.0e+00, 1.0e+01, 2.0e+01, 3.0e+01, 4.0e+01, 5.0e+01, 6.0e+01,
7.0e+01, 8.0e+01, 9.0e+01, 1.0e+02, 1.1e+02, 1.2e+02, 1.3e+02,
1.4e+02, 1.5e+02, 1.6e+02, 1.7e+02, 1.8e+02, 1.9e+02],
[0.0e+00, 1.0e+02, 2.0e+02, 3.0e+02, 4.0e+02, 5.0e+02, 6.0e+02,
7.0e+02, 8.0e+02, 9.0e+02, 1.0e+03, 1.1e+03, 1.2e+03, 1.3e+03,
1.4e+03, 1.5e+03, 1.6e+03, 1.7e+03, 1.8e+03, 1.9e+03]]))
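The two multivariate layouts hold the same values and differ only in axis order, which a quick check can confirm. The snippet below rebuilds both arrays so that it runs on its own; the `scales` name is introduced here purely for readability.

```python
import numpy as np

scales = np.array([1, 10, 100])
# Sequence-first layout: (seq_len, n_vars)
X2 = (np.arange(20).reshape(-1, 1) * scales.reshape(1, -1)).astype(float)
# Variables-first layout: (n_vars, seq_len)
X3 = (np.arange(20).reshape(1, -1) * scales.reshape(-1, 1)).astype(float)

# Transposing the sequence-first array gives the variables-first array
print(np.array_equal(X2.T, X3))  # True
```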
Now, instead of applying SlidingWindow to create and save the windowed time series that a model consumes, we can use a dataset that creates the windows on the fly. This avoids creating and saving large files. It is also useful when you want to test different sliding window sizes, since otherwise you would need to create a separate set of files for every size you test. The dataset produces samples that are correctly formatted and ready to be passed to a time series architecture.
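To make the idea concrete, here is a minimal NumPy-only sketch of on-the-fly windowing. It is not tsai's implementation: `LazyWindows` is a hypothetical class written for illustration, and it assumes that `seq_first=True` means the time axis comes first. Each window is sliced only when it is indexed, so only the original array is kept in memory.

```python
import numpy as np

class LazyWindows:
    """Hypothetical illustration: slices windows from the raw array on access."""

    def __init__(self, X, window_size=1, stride=1, seq_first=True):
        X = np.asarray(X, dtype=float)
        if X.ndim == 1:
            # Treat a 1D array as a single variable
            X = X.reshape(-1, 1) if seq_first else X.reshape(1, -1)
        # Store as (n_vars, seq_len) regardless of the input layout
        self.X = X.T if seq_first else X
        self.window_size, self.stride = window_size, stride

    def __len__(self):
        seq_len = self.X.shape[1]
        # Number of complete windows that fit in the sequence
        return max((seq_len - self.window_size) // self.stride + 1, 0)

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        start = idx * self.stride
        # Slice one window of shape (n_vars, window_size)
        return self.X[:, start:start + self.window_size]


ds = LazyWindows(np.arange(20), window_size=5, stride=2, seq_first=True)
windows = np.stack([ds[i] for i in range(len(ds))])
print(windows.shape)  # (8, 1, 5)
```

With 20 time steps, a window of 5 and a stride of 2, the count is (20 - 5) // 2 + 1 = 8 windows, and the last one starts at time step 14.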
wds0 = TSUnwindowedDataset(X0, window_size=5, stride=2, seq_first=True)[:][0]
wds1 = TSUnwindowedDataset(X1, window_size=5, stride=2, seq_first=False)[:][0]
test_eq(wds0, wds1)
wds0, wds0.data, wds1, wds1.data
(TSTensor(samples:8, vars:1, len:5, device=cpu),
tensor([[[ 0., 1., 2., 3., 4.]],
[[ 2., 3., 4., 5., 6.]],
[[ 4., 5., 6., 7., 8.]],
[[ 6., 7., 8., 9., 10.]],
[[ 8., 9., 10., 11., 12.]],
[[10., 11., 12., 13., 14.]],
[[12., 13., 14., 15., 16.]],
[[14., 15., 16., 17., 18.]]]),
TSTensor(samples:8, vars:1, len:5, device=cpu),
tensor([[[ 0., 1., 2., 3., 4.]],
[[ 2., 3., 4., 5., 6.]],
[[ 4., 5., 6., 7., 8.]],
[[ 6., 7., 8., 9., 10.]],
[[ 8., 9., 10., 11., 12.]],
[[10., 11., 12., 13., 14.]],
[[12., 13., 14., 15., 16.]],
[[14., 15., 16., 17., 18.]]]))
wds2 = TSUnwindowedDataset(X2, window_size=5, stride=2, seq_first=True)[:][0]
wds3 = TSUnwindowedDataset(X3, window_size=5, stride=2, seq_first=False)[:][0]
test_eq(wds2, wds3)
wds2, wds3, wds2.data, wds3.data
(TSTensor(samples:8, vars:3, len:5, device=cpu),
TSTensor(samples:8, vars:3, len:5, device=cpu),
tensor([[[0.0000e+00, 1.0000e+00, 2.0000e+00, 3.0000e+00, 4.0000e+00],
[0.0000e+00, 1.0000e+01, 2.0000e+01, 3.0000e+01, 4.0000e+01],
[0.0000e+00, 1.0000e+02, 2.0000e+02, 3.0000e+02, 4.0000e+02]],
[[2.0000e+00, 3.0000e+00, 4.0000e+00, 5.0000e+00, 6.0000e+00],
[2.0000e+01, 3.0000e+01, 4.0000e+01, 5.0000e+01, 6.0000e+01],
[2.0000e+02, 3.0000e+02, 4.0000e+02, 5.0000e+02, 6.0000e+02]],
[[4.0000e+00, 5.0000e+00, 6.0000e+00, 7.0000e+00, 8.0000e+00],
[4.0000e+01, 5.0000e+01, 6.0000e+01, 7.0000e+01, 8.0000e+01],
[4.0000e+02, 5.0000e+02, 6.0000e+02, 7.0000e+02, 8.0000e+02]],
[[6.0000e+00, 7.0000e+00, 8.0000e+00, 9.0000e+00, 1.0000e+01],
[6.0000e+01, 7.0000e+01, 8.0000e+01, 9.0000e+01, 1.0000e+02],
[6.0000e+02, 7.0000e+02, 8.0000e+02, 9.0000e+02, 1.0000e+03]],
[[8.0000e+00, 9.0000e+00, 1.0000e+01, 1.1000e+01, 1.2000e+01],
[8.0000e+01, 9.0000e+01, 1.0000e+02, 1.1000e+02, 1.2000e+02],
[8.0000e+02, 9.0000e+02, 1.0000e+03, 1.1000e+03, 1.2000e+03]],
[[1.0000e+01, 1.1000e+01, 1.2000e+01, 1.3000e+01, 1.4000e+01],
[1.0000e+02, 1.1000e+02, 1.2000e+02, 1.3000e+02, 1.4000e+02],
[1.0000e+03, 1.1000e+03, 1.2000e+03, 1.3000e+03, 1.4000e+03]],
[[1.2000e+01, 1.3000e+01, 1.4000e+01, 1.5000e+01, 1.6000e+01],
[1.2000e+02, 1.3000e+02, 1.4000e+02, 1.5000e+02, 1.6000e+02],
[1.2000e+03, 1.3000e+03, 1.4000e+03, 1.5000e+03, 1.6000e+03]],
[[1.4000e+01, 1.5000e+01, 1.6000e+01, 1.7000e+01, 1.8000e+01],
[1.4000e+02, 1.5000e+02, 1.6000e+02, 1.7000e+02, 1.8000e+02],
[1.4000e+03, 1.5000e+03, 1.6000e+03, 1.7000e+03, 1.8000e+03]]]),
tensor([[[0.0000e+00, 1.0000e+00, 2.0000e+00, 3.0000e+00, 4.0000e+00],
[0.0000e+00, 1.0000e+01, 2.0000e+01, 3.0000e+01, 4.0000e+01],
[0.0000e+00, 1.0000e+02, 2.0000e+02, 3.0000e+02, 4.0000e+02]],
[[2.0000e+00, 3.0000e+00, 4.0000e+00, 5.0000e+00, 6.0000e+00],
[2.0000e+01, 3.0000e+01, 4.0000e+01, 5.0000e+01, 6.0000e+01],
[2.0000e+02, 3.0000e+02, 4.0000e+02, 5.0000e+02, 6.0000e+02]],
[[4.0000e+00, 5.0000e+00, 6.0000e+00, 7.0000e+00, 8.0000e+00],
[4.0000e+01, 5.0000e+01, 6.0000e+01, 7.0000e+01, 8.0000e+01],
[4.0000e+02, 5.0000e+02, 6.0000e+02, 7.0000e+02, 8.0000e+02]],
[[6.0000e+00, 7.0000e+00, 8.0000e+00, 9.0000e+00, 1.0000e+01],
[6.0000e+01, 7.0000e+01, 8.0000e+01, 9.0000e+01, 1.0000e+02],
[6.0000e+02, 7.0000e+02, 8.0000e+02, 9.0000e+02, 1.0000e+03]],
[[8.0000e+00, 9.0000e+00, 1.0000e+01, 1.1000e+01, 1.2000e+01],
[8.0000e+01, 9.0000e+01, 1.0000e+02, 1.1000e+02, 1.2000e+02],
[8.0000e+02, 9.0000e+02, 1.0000e+03, 1.1000e+03, 1.2000e+03]],
[[1.0000e+01, 1.1000e+01, 1.2000e+01, 1.3000e+01, 1.4000e+01],
[1.0000e+02, 1.1000e+02, 1.2000e+02, 1.3000e+02, 1.4000e+02],
[1.0000e+03, 1.1000e+03, 1.2000e+03, 1.3000e+03, 1.4000e+03]],
[[1.2000e+01, 1.3000e+01, 1.4000e+01, 1.5000e+01, 1.6000e+01],
[1.2000e+02, 1.3000e+02, 1.4000e+02, 1.5000e+02, 1.6000e+02],
[1.2000e+03, 1.3000e+03, 1.4000e+03, 1.5000e+03, 1.6000e+03]],
[[1.4000e+01, 1.5000e+01, 1.6000e+01, 1.7000e+01, 1.8000e+01],
[1.4000e+02, 1.5000e+02, 1.6000e+02, 1.7000e+02, 1.8000e+02],
[1.4000e+03, 1.5000e+03, 1.6000e+03, 1.7000e+03, 1.8000e+03]]]))
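As a cross-check, the multivariate windows shown above can be reproduced eagerly with NumPy's `sliding_window_view` (available in NumPy 1.20 and later). This is only a sketch for comparison, not a description of how `TSUnwindowedDataset` works internally; the variable names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Variables-first array, rebuilt here so the snippet is self-contained
X3 = (np.arange(20).reshape(1, -1) * np.array([1, 10, 100]).reshape(-1, 1)).astype(float)

window_size, stride = 5, 2
# All windows along the time axis: (n_vars, seq_len - window_size + 1, window_size)
all_windows = sliding_window_view(X3, window_size, axis=1)
# Keep every `stride`-th window, then move the sample axis first: (samples, vars, len)
windows = all_windows[:, ::stride].transpose(1, 0, 2)

print(windows.shape)  # (8, 3, 5)
print(windows[-1])    # last window, starting at time step 14
```

Unlike the on-the-fly dataset, this approach builds the full set of windows up front, although `sliding_window_view` returns a view rather than a copy, so the extra memory cost stays small until the result is copied or saved.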