Understanding Keras LSTMs: Role of Batch-size and Statefulness


Question

There are several sources out there explaining stateful / stateless LSTMs and the role of batch_size which I've read already. I'll refer to them later in my post:

[1] https://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/

[2] [3] [4] https://machinelearningmastery.com/use-different-batch-sizes-training-predicting-python-keras/

And also other SO threads like Understanding Keras LSTMs and Keras - stateful vs stateless LSTMs, which however didn't fully explain what I'm looking for.

I am still not sure what the correct approach is for my task regarding statefulness and determining batch_size.

I have about 1000 independent time series (samples) that have a length of about 600 days (timesteps) each (actually variable length, but I thought about trimming the data to a constant timeframe), with 8 features (or input_dim) for each timestep (some of the features are identical for every sample, some are individual per sample).

Input shape = (1000, 600, 8)

One of the features is the one I want to predict, while the others are (supposed to be) supportive for the prediction of this one "master feature". I will do that for each of the 1000 time series. What would be the best strategy to model this problem?

Output shape = (1000, 600, 1)
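To make those shapes concrete, here is a minimal sketch of how I would assemble the arrays; the trimming helper and variable names are just placeholders, not my actual pipeline:

import numpy as np

def trim_to_fixed_length(series_list, n_timesteps=600):
    # series_list: hypothetical list of ~1000 arrays of shape (length_i, 8), variable length
    # keep only the last n_timesteps days of each series, dropping shorter ones
    trimmed = [s[-n_timesteps:] for s in series_list if len(s) >= n_timesteps]
    return np.stack(trimmed)                 # -> (samples, 600, 8)

# X = trim_to_fixed_length(series_list)      # (1000, 600, 8)
# y = X[:, :, 0:1]                           # assuming the master feature is column 0 -> (1000, 600, 1)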

From [4]:

Keras uses fast symbolic mathematical libraries as a backend, such as TensorFlow and Theano.

A downside of using these libraries is that the shape and size of your data must be defined once up front and held constant regardless of whether you are training your network or making predictions.

[…]

This does become a problem when you wish to make fewer predictions than the batch size. For example, you may get the best results with a large batch size, but are required to make predictions for one observation at a time on something like a time series or sequence problem.

This sounds to me like a "batch" would be splitting the data along the timesteps-dimension.

However, [3] states:

Said differently, whenever you train or test your LSTM, you first have to build your input matrix X of shape (nb_samples, timesteps, input_dim) where your batch size divides nb_samples. For instance, if nb_samples=1024 and batch_size=64, it means that your model will receive blocks of 64 samples, compute each output (whatever the number of timesteps is for every sample), average the gradients and propagate it to update the parameters vector.

When looking deeper into the examples of [1] and [4], Jason is always splitting his time series into several samples that only contain 1 timestep (the predecessor, which in his example fully determines the next element in the sequence). So I think the batches are really split along the samples-axis. (However, his approach of time series splitting doesn't make sense to me for a long-term dependency problem.)

Conclusion

So let's say I pick batch_size=10. That means during one epoch the weights are updated 1000 / 10 = 100 times with 10 randomly picked, complete time series (each containing 600 x 8 values), and when I later want to make predictions with the model, I'll always have to feed it batches of 10 complete time series (or use solution 3 from [4], copying the weights to a new model with a different batch_size).
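To make sure I understand that workaround, here is a minimal sketch of solution 3 from [4] (copying trained weights into an identically structured model built with a different batch size); the single-layer architecture is just for illustration:

from keras.models import Sequential
from keras.layers import LSTM, Dense

def build_model(batch_size):
    model = Sequential()
    # batch_input_shape fixes the batch dimension, which is what forces the workaround
    model.add(LSTM(32, batch_input_shape=(batch_size, 600, 8)))
    model.add(Dense(1, activation='linear'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

train_model = build_model(batch_size=10)
# train_model.fit(X, y, epochs=500, batch_size=10, verbose=2)

predict_model = build_model(batch_size=1)               # same architecture, batch size 1
predict_model.set_weights(train_model.get_weights())    # copy the trained weights over
# predict_model.predict(x_single)                       # x_single has shape (1, 600, 8)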

Principles of batch_size understood – however, I still don't know what would be a good value for batch_size and how to determine it.

The Keras documentation tells us:

You can set RNN layers to be 'stateful', which means that the states computed for the samples in one batch will be reused as initial states for the samples in the next batch.

If I'm splitting my time series into several samples (like in the examples of [1] and [4]) so that the dependencies I'd like to model span across several batches, or the batch-spanning samples are otherwise correlated with each other, I may need a stateful net, otherwise not. Is that a correct and complete conclusion?
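For reference, my understanding is that a stateful setup would look roughly like the sketch below (not what I plan to use): batch_input_shape pins the batch size, shuffling must be off so corresponding samples stay aligned across batches, and the state is reset manually between epochs.

from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size = 10
model = Sequential()
model.add(LSTM(32, batch_input_shape=(batch_size, 600, 8), stateful=True))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')

for epoch in range(500):
    # shuffle=False keeps sample i of one batch aligned with sample i of the next batch
    model.fit(X, y, epochs=1, batch_size=batch_size, shuffle=False, verbose=2)
    model.reset_states()  # drop the carried-over state between epochs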

So for my problem I suppose I won't need a stateful net. I'd build my training data as a 3D array of shape (samples, timesteps, features) and then call model.fit with a batch_size yet to be determined. Sample code could look like:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# all but the last LSTM need return_sequences=True so the next LSTM receives a 3D sequence
model.add(LSTM(32, return_sequences=True, input_shape=(600, 8)))   # (timesteps, features)
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(32))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, y, epochs=500, batch_size=batch_size, verbose=2)

Answer

Let me explain it via an example:

So let's say you have the following series: 1,2,3,4,5,6,...,100. You have to decide how many timesteps your LSTM will learn, and reshape your data accordingly. Like below:

If you decide time_steps = 5, you have to reshape your time series as a matrix of samples in this way:

1,2,3,4,5 -> sample1

2,3,4,5,6 -> sample2

3,4,5,6,7 -> sample3

etc...

By doing so, you will end up with a matrix of shape (96 samples x 5 timesteps).

This matrix should be reshaped to (96 x 5 x 1), indicating to Keras that you have just 1 time series. If you have more time series in parallel (as in your case), you do the same operation on each time series, so you will end up with n matrices (one for each time series), each of shape (96 samples x 5 timesteps).
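A small numpy sketch of that windowing step for a single series (the helper name is just illustrative):

    import numpy as np

    def make_windows(series, time_steps=5):
        # one row per window: [1..5], [2..6], ..., [96..100]
        return np.array([series[i:i + time_steps]
                         for i in range(len(series) - time_steps + 1)])

    series = np.arange(1, 101)            # 1, 2, 3, ..., 100
    windows = make_windows(series)        # shape (96, 5)
    samples = windows.reshape(-1, 5, 1)   # shape (96, 5, 1): just 1 time series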

For the sake of argument, let's say you have 3 time series. You should concatenate all three matrices into one single tensor of shape (96 samples x 5 timeSteps x 3 timeSeries). The first layer of your LSTM for this example would be:

    model = Sequential()
    model.add(LSTM(32, input_shape=(5, 3)))

The 32 as the first parameter is totally up to you. It means that at each point in time, your 3 time series will become 32 different variables in the output space. It is easier to think of each time step as a fully connected layer with 3 inputs and 32 outputs, but with a different computation than FC layers.
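One quick way to see that mapping from 3 inputs to 32 variables per timestep (using return_sequences=True here so every timestep is kept):

    from keras.models import Sequential
    from keras.layers import LSTM

    m = Sequential()
    m.add(LSTM(32, input_shape=(5, 3), return_sequences=True))
    print(m.output_shape)   # (None, 5, 32): 32 variables at each of the 5 timesteps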

If you are going to stack multiple LSTM layers, use the return_sequences=True parameter, so the layer will output the whole predicted sequence rather than just the last value.

Your target should be the next value in the series you want to predict.

Putting it all together, let's say you have the following time series:

Time series 1 (master): 1, 2, 3, 4, 5, 6, ..., 100

Time series 2 (support): 2, 4, 6, 8, 10, 12, ..., 200

Time series 3 (support): 3, 6, 9, 12, 15, 18, ..., 300

Create the input and target tensors:

x     -> y

1,2,3,4,5 -> 6

2,3,4,5,6 -> 7

3,4,5,6,7 -> 8

Reformat the rest of the time series in the same way, but forget about the target, since you don't want to predict those series.
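As a sketch of that whole preparation step, reusing the make_windows helper from above (note that only windows with an existing next value get a target, so there are 95 usable samples here rather than 96):

    import numpy as np

    s1 = np.arange(1, 101)        # master series:  1, 2, 3, ..., 100
    s2 = np.arange(2, 201, 2)     # support series: 2, 4, 6, ..., 200
    s3 = np.arange(3, 301, 3)     # support series: 3, 6, 9, ..., 300

    time_steps = 5
    # stack the windowed series along the last axis: (samples, timesteps, timeseries)
    X = np.stack([make_windows(s1)[:-1],
                  make_windows(s2)[:-1],
                  make_windows(s3)[:-1]], axis=-1)   # (95, 5, 3)
    y = s1[time_steps:].reshape(-1, 1)               # next value of the master series, (95, 1)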

Create the model:

    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    model = Sequential()
    model.add(LSTM(32, input_shape=(5, 3), return_sequences=True))  # input shape is (5 timesteps x 3 timeseries); output shape is (5 timesteps x 32 variables) because return_sequences=True
    model.add(LSTM(8))  # output shape is (1 timestep x 8 variables) because return_sequences=False
    model.add(Dense(1, activation='linear'))  # output is (1 timestep x 1 output unit); it is compared to the target variable

Compile it and train. A good batch size is 32. The batch size is the size of the chunks your sample matrix is split into for faster computation. Just don't use stateful.
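Continuing the sketch with the X and y built above (the epoch count here is arbitrary):

    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(X, y, epochs=100, batch_size=32, verbose=2)   # X: (95, 5, 3), y: (95, 1)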
