What's the input of each LSTM layer in a stacked LSTM network?


Problem description

I'm having some difficulty understanding the input-output flow between layers in a stacked LSTM network. Let's say I have created a stacked LSTM network like the one below:

from keras.models import Sequential
from keras.layers import LSTM

# parameters
time_steps = 10
features = 2
input_shape = [time_steps, features]
batch_size = 32

# model
model = Sequential()
model.add(LSTM(64, input_shape=input_shape, return_sequences=True))
model.add(LSTM(32, input_shape=input_shape))

Our stacked LSTM network consists of 2 LSTM layers with 64 and 32 hidden units respectively. In this scenario, I expect that at each time step the 1st LSTM layer, LSTM(64), will pass as input to the 2nd LSTM layer, LSTM(32), a tensor of shape [batch_size, time_steps, hidden_unit_length], which would represent the hidden state of the 1st LSTM layer at the current time step. What confuses me is:

  1. Does the 2nd LSTM layer, LSTM(32), receive as its input X(t) the hidden state of the 1st layer, LSTM(64), with shape [batch_size, time_steps, hidden_unit_length], and pass it through its own hidden network, in this case consisting of 32 nodes?
  2. If the first is true, why is the input_shape of the 1st layer, LSTM(64), and the 2nd, LSTM(32), the same, when the 2nd only processes the hidden state of the 1st layer? Shouldn't input_shape in our case be set to [32, 10, 64]?

I found the LSTM visualization below very helpful (found here), but it doesn't expand to stacked LSTM networks:

Any help would be highly appreciated. Thanks!

Answer

The input_shape is only required for the first layer. The subsequent layers take the output of the previous layer as their input (so their input_shape argument value is ignored).

The model below

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model.add(LSTM(32))

represents the following architecture, which you can verify with model.summary():

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_26 (LSTM)               (None, 5, 64)             17152     
_________________________________________________________________
lstm_27 (LSTM)               (None, 32)                12416     
=================================================================
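
As a sanity check on the Param # column: an LSTM layer has 4 * (input_dim + units + 1) * units weights (four gates, each with an input kernel, a recurrent kernel, and a bias). The short sketch below is not part of the original answer, just a way to reproduce the two numbers:

def lstm_param_count(input_dim, units):
    # 4 gates, each with an input kernel (input_dim x units),
    # a recurrent kernel (units x units), and a bias (units)
    return 4 * (input_dim + units + 1) * units

print(lstm_param_count(2, 64))   # 17152 -> lstm_26, input has 2 features
print(lstm_param_count(64, 32))  # 12416 -> lstm_27, input is the 64-unit sequence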

Replacing the line

model.add(LSTM(32))

with

model.add(LSTM(32, input_shape=(1000000, 200000)))

will still give you the same architecture (verify using model.summary()), because the input_shape is ignored: the layer takes the tensor output of the previous layer as its input.
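
If you want to verify this yourself, here is a minimal sketch (assuming a standard Keras install; the layer names in the summaries will differ between runs) that builds both variants and checks they are identical in size:

from keras.models import Sequential
from keras.layers import LSTM

# Variant with the bogus input_shape on the second layer
model_a = Sequential()
model_a.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model_a.add(LSTM(32, input_shape=(1000000, 200000)))  # ignored

# Variant without it
model_b = Sequential()
model_b.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model_b.add(LSTM(32))

# Both should report identical output shapes and parameter counts
model_a.summary()
model_b.summary()
assert model_a.count_params() == model_b.count_params()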

And if you need a sequence-to-sequence architecture like the one below, you should use the code:

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model.add(LSTM(32, return_sequences=True))

which should return a model like:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_32 (LSTM)               (None, 5, 64)             17152     
_________________________________________________________________
lstm_33 (LSTM)               (None, 5, 32)             12416     
=================================================================
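
If you then want an actual per-time-step prediction out of that 32-unit sequence, one common pattern (a hypothetical extension, not part of the original answer) is to append a TimeDistributed Dense head:

from keras.layers import Dense, TimeDistributed

# Hypothetical output head: maps each time step's 32-unit hidden state
# to a single value, so the model's output shape becomes (None, 5, 1)
model.add(TimeDistributed(Dense(1)))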
