What's the input of each LSTM layer in a stacked LSTM network?
Problem Description
I'm having some difficulty understanding the input-output flow of layers in stacked LSTM networks. Let's say I have created a stacked LSTM network like the one below:
# parameters
time_steps = 10
features = 2
input_shape = [time_steps, features]
batch_size = 32
# model
model = Sequential()
model.add(LSTM(64, input_shape=input_shape, return_sequences=True))
model.add(LSTM(32, input_shape=input_shape))
where our stacked LSTM network consists of 2 LSTM layers with 64 and 32 hidden units respectively. In this scenario, we expect that at each time step the 1st LSTM layer, LSTM(64), will pass as input to the 2nd LSTM layer, LSTM(32), a vector of size [batch_size, time-step, hidden_unit_length], which would represent the hidden state of the 1st LSTM layer at the current time step. What confuses me is:
- Does the 2nd LSTM layer, LSTM(32), receive as X(t) (as input) the hidden state of the 1st layer, LSTM(64), which has size [batch_size, time-step, hidden_unit_length], and pass it through its own hidden network, in this case consisting of 32 nodes?
- If the first is true, why is the input_shape of the 1st, LSTM(64), and the 2nd, LSTM(32), the same, when the 2nd only processes the hidden state of the 1st layer? Shouldn't input_shape in our case be set to [32, 10, 64]?
I found the LSTM visualization below very helpful (found here), but it doesn't expand to stacked LSTM networks:
Any help would be highly appreciated. Thanks!
Recommended Answer
The input_shape is only required for the first layer. The subsequent layers take the output of the previous layer as their input (and so their input_shape argument value is ignored).
The model below
model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model.add(LSTM(32))
represents the following architecture, which you can verify from model.summary():
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_26 (LSTM)               (None, 5, 64)             17152
_________________________________________________________________
lstm_27 (LSTM)               (None, 32)                12416
=================================================================
Replacing the line
model.add(LSTM(32))
with
model.add(LSTM(32, input_shape=(1000000, 200000)))
will still give you the same architecture (verify using model.summary()), because the input_shape is ignored: the layer takes as input the tensor output of the previous layer.
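One way to see that the 2nd layer's real input dimension is 64 (the 1st layer's output size) rather than anything declared via input_shape is to reproduce the parameter counts from model.summary() by hand. The sketch below assumes Keras's standard LSTM parameterization (4 gates, each with a kernel, a recurrent kernel, and a bias):

```python
def lstm_param_count(units, input_dim):
    # Each of the 4 gates has: a kernel (input_dim x units),
    # a recurrent kernel (units x units), and a bias (units).
    return 4 * (units * (input_dim + units) + units)

# 1st layer: input_dim = 2, the features of input_shape=(5, 2)
print(lstm_param_count(64, 2))    # 17152, matching lstm_26 above

# 2nd layer: input_dim = 64, the 1st layer's output size -- not
# anything passed via input_shape, which is why it is ignored
print(lstm_param_count(32, 64))   # 12416, matching lstm_27 above
```

If the 2nd layer really used a declared input_shape instead of the 1st layer's output, its parameter count would change, and the summary would not show 12416.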
And if you need a sequence-to-sequence architecture like the one below,
you should use the code:
model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model.add(LSTM(32, return_sequences=True))
which should return a model like:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_32 (LSTM)               (None, 5, 64)             17152
_________________________________________________________________
lstm_33 (LSTM)               (None, 5, 32)             12416
=================================================================
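To make the shape flow between stacked layers concrete without pulling in Keras, here is a toy pure-Python sketch. The gate math is deliberately simplified to a single tanh update (a real LSTM has 4 gates and a cell state), so only the shapes, not the values, mirror a real layer:

```python
import math

def toy_lstm_layer(inputs, units, return_sequences):
    # inputs: nested lists of shape [batch][time][features]
    # NOTE: simplified update -- this only demonstrates how shapes
    # flow between stacked layers, not real LSTM gate arithmetic.
    outputs = []
    for sample in inputs:                  # one sample: [time][features]
        h = [0.0] * units                  # hidden state: one value per unit
        seq = []
        for x_t in sample:                 # one time step: [features]
            s = sum(x_t)
            h = [math.tanh(s + h_i) for h_i in h]
            seq.append(h)
        outputs.append(seq if return_sequences else seq[-1])
    return outputs

batch_size, time_steps, features = 32, 5, 2
x = [[[0.1] * features for _ in range(time_steps)]
     for _ in range(batch_size)]

h1 = toy_lstm_layer(x, 64, return_sequences=True)   # (32, 5, 64)
h2 = toy_lstm_layer(h1, 32, return_sequences=True)  # (32, 5, 32)
```

At every time step the 2nd layer sees 64 features, the 1st layer's hidden size, regardless of any input_shape argument, which is why a shape like [32, 10, 64] never needs to be declared on the 2nd layer.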