Understanding Keras LSTMs


Question


I am trying to reconcile my understanding of LSTMs, as laid out in this post by Christopher Olah, with the LSTM implemented in Keras. I am following the blog written by Jason Brownlee for the Keras tutorial. What I am mainly confused about is:

  1. The reshaping of the data series into [samples, time steps, features] and,
  2. The stateful LSTMs

Let's concentrate on the above two questions with reference to the code pasted below:

import numpy
from keras.models import Sequential
from keras.layers import Dense, LSTM

# train/test are 2-D arrays holding the series, prepared earlier in the tutorial
# reshape into X=t and Y=t+1
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], look_back, 1))
testX = numpy.reshape(testX, (testX.shape[0], look_back, 1))

##########################
# The IMPORTANT BIT
##########################
# create and fit the LSTM network
batch_size = 1
model = Sequential()
model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(100):
    # nb_epoch was renamed to epochs in Keras 2
    model.fit(trainX, trainY, nb_epoch=1, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()  # carry state within an epoch, reset it between epochs

Note: create_dataset takes a sequence of length N and returns an array of length N - look_back, each element of which is a sequence of length look_back.
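For reference, here is a minimal sketch of what create_dataset looks like in the Brownlee tutorial this question follows (reconstructed, since the post itself omits it):

def create_dataset(dataset, look_back=1):
    # dataset is a 2-D array of shape (N, 1); each sample is a window of
    # look_back consecutive values, and the target is the value that follows
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        dataX.append(dataset[i:(i + look_back), 0])
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)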

What are Time Steps and Features?

As can be seen, trainX is a 3-D array, with Time_steps and Feature being the last two dimensions respectively (3 and 1 in this particular code). With respect to the image below, does this mean that we are considering the many-to-one case, where the number of pink boxes is 3? Or does it literally mean the chain length is 3 (i.e. only 3 green boxes are considered)?

Does the features argument become relevant when we consider multivariate series, e.g. modelling two financial stocks simultaneously?

Stateful LSTMs

Do stateful LSTMs mean that we save the cell memory values between runs of batches? If this is the case, batch_size is one, and the memory is reset between the training runs, so what was the point of saying that it was stateful? I'm guessing this is related to the fact that the training data is not shuffled, but I'm not sure how.

Any thoughts? Image reference: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Edit 1:

A bit confused about @van's comment about the red and green boxes being equal. So just to confirm, do the following API calls correspond to the unrolled diagrams? Especially noting the second diagram (batch_size was arbitrarily chosen):

Edit 2:

For people who have done Udacity's deep learning course and are still confused about the time_step argument, look at the following discussion: https://discussions.udacity.com/t/rnn-lstm-use-implementation/163169

Update:

It turns out model.add(TimeDistributed(Dense(vocab_len))) was what I was looking for. Here is an example: https://github.com/sachinruk/ShakespeareBot

Update 2:

I have summarised most of my understanding of LSTMs here: https://www.youtube.com/watch?v=ywinX5wgdEU

Solution

First of all, you chose great tutorials (1, 2) to start with.

What time-step means: Time-steps==3 in X.shape (describing the data shape) means there are three pink boxes. Since in Keras each step requires an input, the number of green boxes should usually equal the number of red boxes, unless you hack the structure.

Many to many vs. many to one: In Keras, there is a return_sequences parameter when you initialize LSTM, GRU, or SimpleRNN. When return_sequences is False (the default), it is many to one, as shown in the picture; the return shape is (batch_size, hidden_unit_length), which represents the last state. When return_sequences is True, it is many to many; the return shape is (batch_size, time_step, hidden_unit_length).
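A quick way to see these shapes, as a sketch assuming a toy configuration of 3 time steps and 1 feature:

from keras.models import Sequential
from keras.layers import LSTM

# many to one: only the final state comes back
m1 = Sequential()
m1.add(LSTM(4, input_shape=(3, 1), return_sequences=False))
print(m1.output_shape)  # (None, 4) == (batch_size, hidden_unit_length)

# many to many: one output per time step
m2 = Sequential()
m2.add(LSTM(4, input_shape=(3, 1), return_sequences=True))
print(m2.output_shape)  # (None, 3, 4) == (batch_size, time_step, hidden_unit_length)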

Does the features argument become relevant: The features argument means "how big is your red box", i.e. what the input dimension is at each step. If you want to predict from, say, 8 kinds of market information, then you can generate your data with features==8.
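For example, a hypothetical dataset of 8 market indicators, windowed into 3-step samples, naturally takes the [samples, time steps, features] shape:

import numpy

raw = numpy.random.rand(1000, 8)  # hypothetical: 8 indicators over 1000 steps
look_back = 3
X = numpy.array([raw[i:i + look_back] for i in range(len(raw) - look_back)])
print(X.shape)  # (997, 3, 8): each step's "red box" holds 8 features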

Stateful: You can look up the source code. When initializing the state, if stateful==True, the state from the last training batch is used as the initial state; otherwise a new state is generated. I haven't turned stateful on yet. However, I disagree that batch_size can only be 1 when stateful==True.
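As a sketch of that point, a stateful layer only needs a fixed batch size declared up front, not necessarily 1:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(4, batch_input_shape=(32, 3, 1), stateful=True))  # 32 parallel sequences
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
# each of the 32 sequences keeps its own cell state across batches
# until the states are explicitly cleared:
model.reset_states()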

Currently, you generate your data from data collected in advance. Imagine your stock information is coming in as a stream: rather than waiting for a day to collect all the sequences, you would like to generate input data online while training/predicting with the network. If you have 400 stocks sharing the same network, then you can set batch_size==400.
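A hedged sketch of that online setup, with hypothetical streaming data for 400 stocks fed through train_on_batch as each tick arrives (the stateful pairing here is one possible reading of the answer, not something it spells out):

import numpy
from keras.models import Sequential
from keras.layers import LSTM, Dense

n_stocks, look_back, n_features = 400, 3, 1
model = Sequential()
model.add(LSTM(4, batch_input_shape=(n_stocks, look_back, n_features), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

for tick in range(10):  # stand-in for the live stream
    x = numpy.random.rand(n_stocks, look_back, n_features)  # latest window per stock
    y = numpy.random.rand(n_stocks, 1)                      # next value per stock
    model.train_on_batch(x, y)  # update online; state carries over between ticks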
