Difference between 1 LSTM with num_layers = 2 and 2 LSTMs in pytorch


Question

I am new to deep learning and currently working on using LSTMs for language modeling. I was looking at the PyTorch documentation and was confused by it.

If I create a

nn.LSTM(input_size, hidden_size, num_layers) 

where hidden_size = 4 and num_layers = 2, I think I will have an architecture something like:

op0    op1 ....
LSTM -> LSTM -> h3
LSTM -> LSTM -> h2
LSTM -> LSTM -> h1
LSTM -> LSTM -> h0
x0     x1 .....

If I do something like

nn.LSTM(input_size, hidden_size, 1)
nn.LSTM(input_size, hidden_size, 1)

I think the network architecture will look exactly like the one above. Am I wrong? And if so, what is the difference between these two?

Answer

The multi-layer LSTM is better known as a stacked LSTM, where multiple LSTM layers are stacked on top of each other.

Your understanding is correct. The following two definitions of a stacked LSTM are the same.

nn.LSTM(input_size, hidden_size, 2)

from collections import OrderedDict

nn.Sequential(OrderedDict([
    ('LSTM1', nn.LSTM(input_size, hidden_size, 1)),
    ('LSTM2', nn.LSTM(hidden_size, hidden_size, 1))
]))
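
As a quick sanity check (a minimal sketch with hypothetical sizes, not part of the original answer), you can compare the parameter shapes of the two definitions; layer 0 of the stacked LSTM lines up with the first single-layer LSTM and layer 1 with the second:

import torch.nn as nn

input_size, hidden_size = 10, 4  # hypothetical sizes

stacked = nn.LSTM(input_size, hidden_size, 2)
lstm1 = nn.LSTM(input_size, hidden_size, 1)
lstm2 = nn.LSTM(hidden_size, hidden_size, 1)

# layer 0 of the stacked LSTM has the same weight shapes as lstm1,
# layer 1 has the same weight shapes as lstm2
print(stacked.weight_ih_l0.shape, lstm1.weight_ih_l0.shape)  # (16, 10) and (16, 10)
print(stacked.weight_ih_l1.shape, lstm2.weight_ih_l0.shape)  # (16, 4) and (16, 4)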

Here, the input is fed into the lowest layer of the LSTM, and the output of the lowest layer is then forwarded to the next layer, and so on. Please note that the output size of the lowest LSTM layer, and the input size of all the remaining LSTM layers, is hidden_size.

However, you may have seen people define a stacked LSTM in the following way:

rnns = nn.ModuleList()
for i in range(nlayers):
    # the first layer consumes the raw input; every later layer consumes the previous layer's hidden_size-dimensional output
    input_size = input_size if i == 0 else hidden_size
    rnns.append(nn.LSTM(input_size, hidden_size, 1))

The reason people sometimes use the above approach is that if you create a stacked LSTM using the first two approaches, you can't get the hidden states of each individual layer. Check out what LSTM returns in PyTorch.
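
For reference, here is a minimal sketch (with hypothetical sizes) of what nn.LSTM returns: output holds the top layer's hidden state at every time step, while h_n and c_n hold every layer's state at the final time step only.

import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 4, 2  # hypothetical sizes

lstm = nn.LSTM(input_size, hidden_size, num_layers)
x = torch.randn(seq_len, batch, input_size)

output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([5, 3, 4]) -- top layer only, every time step
print(h_n.shape)     # torch.Size([2, 3, 4]) -- every layer, last time step only

So the per-time-step hidden states of the intermediate layers are not exposed.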

So, if you want the intermediate layers' hidden states, you have to declare each individual LSTM layer as a single LSTM and run a loop to mimic the multi-layer LSTM operation. For example:

outputs = []
for i in range(nlayers):
    # apply dropout to the input of every layer except the first,
    # mimicking the dropout option of a multi-layer nn.LSTM
    if i != 0:
        sent_variable = F.dropout(sent_variable, p=0.2, training=True)
    output, hidden = rnns[i](sent_variable)  # output: layer i's hidden states at every time step
    outputs.append(output)
    sent_variable = output  # feed layer i's output to layer i + 1

In the end, outputs will contain all the hidden states of each individual LSTM layer.
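
Putting the two snippets together, here is a self-contained sketch (hypothetical sizes; sent_variable is just a random tensor standing in for real input) that collects each layer's hidden states:

import torch
import torch.nn as nn
import torch.nn.functional as F

seq_len, batch, input_size, hidden_size, nlayers = 5, 3, 10, 4, 2  # hypothetical sizes

# one single-layer LSTM per stacked layer
rnns = nn.ModuleList()
for i in range(nlayers):
    in_size = input_size if i == 0 else hidden_size
    rnns.append(nn.LSTM(in_size, hidden_size, 1))

sent_variable = torch.randn(seq_len, batch, input_size)

outputs = []
for i in range(nlayers):
    if i != 0:
        sent_variable = F.dropout(sent_variable, p=0.2, training=True)
    output, hidden = rnns[i](sent_variable)
    outputs.append(output)
    sent_variable = output

print([o.shape for o in outputs])  # two tensors of shape (5, 3, 4): one per layer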
