Understanding Keras LSTMs: Role of Batch-size and Statefulness


Question


Sources

There are several sources out there explaining stateful / stateless LSTMs and the role of batch_size which I've read already. I'll refer to them later in my post:

[1] https://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/

[2] https://machinelearningmastery.com/stateful-stateless-lstm-time-series-forecasting-python/

[3] http://philipperemy.github.io/keras-stateful-lstm/

[4] https://machinelearningmastery.com/use-different-batch-sizes-training-predicting-python-keras/

There are also other SO threads like Understanding Keras LSTMs and Keras - stateful vs stateless LSTMs which, however, didn't fully explain what I'm looking for.


My Problem

I am still not sure what the correct approach is for my task regarding statefulness and determining batch_size.

I have about 1000 independent time series (samples) that each have a length of about 600 days (timesteps) (actually variable length, but I thought about trimming the data to a constant timeframe), with 8 features (or input_dim) for each timestep (some of the features are identical for every sample, some are individual per sample).

Input shape = (1000, 600, 8)

One of the features is the one I want to predict, while the others are (supposed to be) supportive for the prediction of this one "master feature". I will do that for each of the 1000 time series. What would be the best strategy to model this problem?

Output shape = (1000, 600, 1)
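
To make those shapes concrete, a minimal sketch with random placeholder arrays (just stand-ins for my real data) could look like:

import numpy as np

n_series, n_timesteps, n_features = 1000, 600, 8   # samples, timesteps, input_dim

X = np.random.rand(n_series, n_timesteps, n_features)  # input:  (1000, 600, 8)
y = np.random.rand(n_series, n_timesteps, 1)            # target: (1000, 600, 1), the "master feature"

print(X.shape, y.shape)  # (1000, 600, 8) (1000, 600, 1)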


What is a Batch?

From [4]:

Keras uses fast symbolic mathematical libraries as a backend, such as TensorFlow and Theano.

A downside of using these libraries is that the shape and size of your data must be defined once up front and held constant regardless of whether you are training your network or making predictions.

[…]

This does become a problem when you wish to make fewer predictions than the batch size. For example, you may get the best results with a large batch size, but are required to make predictions for one observation at a time on something like a time series or sequence problem.

This sounds to me like a "batch" would be splitting the data along the timesteps-dimension.

However, [3] states that:

Said differently, whenever you train or test your LSTM, you first have to build your input matrix X of shape nb_samples, timesteps, input_dim where your batch size divides nb_samples. For instance, if nb_samples=1024 and batch_size=64, it means that your model will receive blocks of 64 samples, compute each output (whatever the number of timesteps is for every sample), average the gradients and propagate it to update the parameters vector.

When looking deeper into the examples of [1] and [4], Jason is always splitting his time series into several samples that only contain 1 timestep (the predecessor, which in his example fully determines the next element in the sequence). So I think the batches are really split along the samples-axis. (However, his approach of splitting the time series doesn't make sense to me for a long-term dependency problem.)
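
To illustrate the quote from [3], here is a small sketch of the slicing it implies (not how Keras implements batching internally, just the idea): each batch is a block of batch_size samples taken along the first axis, with all timesteps of each sample kept together.

import numpy as np

nb_samples, timesteps, input_dim = 1024, 10, 8
batch_size = 64

X = np.random.rand(nb_samples, timesteps, input_dim)

# split along the samples-axis into blocks of batch_size samples each
batches = [X[i:i + batch_size] for i in range(0, nb_samples, batch_size)]
print(len(batches), batches[0].shape)  # 16 (64, 10, 8)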

Conclusion

So let's say I pick batch_size=10. That means during one epoch the weights are updated 1000 / 10 = 100 times with 10 randomly picked, complete time series containing 600 x 8 values each, and when I later want to make predictions with the model, I'll always have to feed it batches of 10 complete time series (or use solution 3 from [4] and copy the weights to a new model with a different batch_size).
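
A hedged sketch of that weight-copying idea (it only matters when the batch size is baked into the model via batch_input_shape, e.g. for stateful nets; the layer sizes here are placeholders, not a recommendation):

from keras.models import Sequential
from keras.layers import LSTM, Dense

def build_model(batch_size):
    m = Sequential()
    m.add(LSTM(32, batch_input_shape=(batch_size, 600, 8), stateful=True))
    m.add(Dense(1, activation='linear'))
    m.compile(loss='mean_squared_error', optimizer='adam')
    return m

train_model = build_model(batch_size=10)    # trained with batches of 10 series
# ... train_model.fit(...) ...

predict_model = build_model(batch_size=1)   # identical architecture, batch size 1
predict_model.set_weights(train_model.get_weights())  # copy the learned weights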

So the principle of batch_size is understood. However, I still don't know what would be a good value for batch_size and how to determine it.


Statefulness

The Keras documentation tells us:

You can set RNN layers to be 'stateful', which means that the states computed for the samples in one batch will be reused as initial states for the samples in the next batch.
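
In code, that behaviour corresponds roughly to the following hedged sketch (random placeholder data and shapes, not my actual dataset): the batch size is fixed in the model, shuffling is disabled, and the states are reset manually.

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

X = np.random.rand(100, 60, 8)   # placeholder: 100 sequence chunks, 60 timesteps, 8 features
y = np.random.rand(100, 1)

model = Sequential()
model.add(LSTM(32, batch_input_shape=(10, 60, 8), stateful=True))  # batch size fixed to 10
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')

# sample i of batch n is treated as the continuation of sample i of batch n-1,
# so states are only reset between epochs (or wherever the sequences really end)
for epoch in range(5):
    model.fit(X, y, epochs=1, batch_size=10, shuffle=False, verbose=0)
    model.reset_states()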

If I’m splitting my time series into several samples (like in the examples of [1] and [4]) so that the dependencies I’d like to model span across several batches, or the batch-spanning samples are otherwise correlated with each other, I may need a stateful net, otherwise not. Is that a correct and complete conclusion?

So for my problem I suppose I won't need a stateful net. I'd build my training data as a 3D array of the shape (samples, timesteps, features) and then call model.fit with a batch_size yet to be determined. Sample code could look like:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(600, 8)))  # (timesteps, features)
model.add(LSTM(32, return_sequences=True))  # stacked LSTMs need return_sequences=True
model.add(LSTM(32, return_sequences=True))  # on every layer except the last one
model.add(LSTM(32))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, y, epochs=500, batch_size=batch_size, verbose=2)  # batch_size still to be determined

Solution

Let me explain it via an example:

So let's say you have the following series: 1,2,3,4,5,6,...,100. You first have to decide how many timesteps your LSTM will learn from, and reshape your data accordingly, like below:

If you decide on time_steps = 5, you have to reshape your time series into a matrix of samples in this way:

1,2,3,4,5 -> sample1

2,3,4,5,6 -> sample2

3,4,5,6,7 -> sample3

etc...

By doing so, you will end up with a matrix of shape (96 samples x 5 timesteps).

This matrix should be reshaped to (96 x 5 x 1), indicating to Keras that you have just 1 time series. If you have more time series in parallel (as in your case), you do the same operation on each time series, so you will end up with n matrices (one for each time series), each of shape (96 samples x 5 timesteps).
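
A small sketch of this windowing step (assuming the series 1..100 and time_steps = 5 from above):

    import numpy as np

    series = np.arange(1, 101)            # 1, 2, ..., 100
    time_steps = 5

    # sliding windows of 5 consecutive values -> one sample per window
    samples = np.array([series[i:i + time_steps]
                        for i in range(len(series) - time_steps + 1)])
    print(samples.shape)                  # (96, 5)

    samples = samples.reshape(96, 5, 1)   # tell Keras there is just 1 series/feature
    print(samples.shape)                  # (96, 5, 1)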

For the sake of argument, let's say you have 3 time series. You should concatenate all three matrices into one single tensor of shape (96 samples x 5 timesteps x 3 time series). The first layer of your LSTM for this example would be:

    model = Sequential()
    model.add(LSTM(32, input_shape=(5, 3)))

The 32 as first parameter is totally up to you. It means that at each point in time, your 3 time series will be turned into 32 different variables in the output space. It is easier to think of each time step as a fully connected layer with 3 inputs and 32 outputs, but computed differently from an FC layer.

If you are going to stack multiple LSTM layers, use the return_sequences=True parameter on every layer except the last, so each of those layers will output the whole predicted sequence rather than just the last value.

Your target should be the next value in the series you want to predict.

Putting it all together, let's say you have the following time series:

Time series 1 (master): 1,2,3,4,5,6,..., 100

Time series 2 (support): 2,4,6,8,10,12,..., 200

Time series 3 (support): 3,6,9,12,15,18,..., 300

Create the input and target tensor

x     -> y

1,2,3,4,5 -> 6

2,3,4,5,6 -> 7

3,4,5,6,7 -> 8

Reformat the rest of the time series in the same way, but forget about their targets, since you don't want to predict those series (see the sketch below).
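
A sketch of this preprocessing in numpy (note that with a next-value target, only 95 of the 96 windows keep a target):

    import numpy as np

    master   = np.arange(1, 101)       # 1, 2, ..., 100
    support1 = 2 * np.arange(1, 101)   # 2, 4, ..., 200
    support2 = 3 * np.arange(1, 101)   # 3, 6, ..., 300

    all_series = np.stack([master, support1, support2], axis=-1)  # (100, 3)

    time_steps = 5
    x = np.array([all_series[i:i + time_steps]
                  for i in range(len(master) - time_steps)])      # (95, 5, 3)
    y = master[time_steps:].reshape(-1, 1)                        # (95, 1) next master value

    print(x.shape, y.shape)  # (95, 5, 3) (95, 1)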

Create your model

    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    model = Sequential()
    model.add(LSTM(32, input_shape=(5, 3), return_sequences=True))  # input is shape (5 timesteps x 3 time series), output is shape (5 timesteps x 32 variables) because return_sequences=True
    model.add(LSTM(8))   # output is shape (8 variables) for the last timestep only, because return_sequences=False
    model.add(Dense(1, activation='linear'))  # output is (1 output unit on the dense layer); it is compared to the target variable

Compile it and train. A good batch size is 32. The batch size is the size into which your sample matrix is split for faster computation. Just don't use stateful.
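
Continuing from the model and the x / y tensors sketched above (the number of epochs is just a placeholder):

    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(x, y, epochs=100, batch_size=32, verbose=2)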
