Taking the last state from BiLSTM (BiGRU) in PyTorch


Question

After reading several articles, I am still quite confused about the correctness of my implementation for getting the last hidden states from a BiLSTM.

  1. Understanding Bidirectional RNN in PyTorch (TowardsDataScience)
  2. PackedSequence for seq2seq model (PyTorch forums)
  3. What's the difference between "hidden" and "output" in PyTorch LSTM? (StackOverflow)
  4. Select tensor in a batch of sequences (PyTorch forums)

The approach from the last source (4) seems the cleanest to me, but I am still uncertain whether I understood the thread correctly. Am I using the right final hidden states from the LSTM and the reversed LSTM? This is my implementation:

# pos contains indices of words in the embedding matrix
# seqlengths contains the sequence lengths,
# so for instance, if batch_size is 2 and pos=[4,6,9,3,1] and
# seqlengths contains [3,2], we have a batch with samples
# of variable length [4,6,9] and [3,1]
# (pack_sequence comes from torch.nn.utils.rnn)

all_in_embs = self.in_embeddings(pos)
in_emb_seqs = pack_sequence(torch.split(all_in_embs, seqlengths, dim=0))
output, lasthidden = self.rnn(in_emb_seqs)
if not self.data_processor.use_gru:
    # nn.LSTM returns (h_n, c_n); keep only the hidden state h_n
    lasthidden = lasthidden[0]
# u_emb_batch has shape batch_size x embedding_dimension
# sum the last state from the forward and backward directions
u_emb_batch = lasthidden[-1, :, :] + lasthidden[-2, :, :]

Is it correct?

Answer

In the general case, if you want to create your own BiLSTM network, you need to create two regular LSTMs, feed one with the regular input sequence and the other with the inverted input sequence. After you finish feeding both sequences, you just take the last states from both nets and tie them together somehow (sum or concatenate), as sketched below.
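
A minimal sketch of that manual approach, under assumed dimensions and module names (nothing here is code from the question): two independent nn.LSTM instances, one fed the sequence as-is and one fed a time-reversed copy, with the two final hidden states summed.

import torch
import torch.nn as nn

class ManualBiLSTM(nn.Module):
    # Sketch of a "manual" BiLSTM: two one-directional LSTMs,
    # one reading the sequence forward and one reading it reversed.
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.fwd_lstm = nn.LSTM(input_dim, hidden_dim)
        self.bwd_lstm = nn.LSTM(input_dim, hidden_dim)

    def forward(self, x):                                        # x: (seq_len, batch, input_dim)
        _, (h_fwd, _) = self.fwd_lstm(x)
        _, (h_bwd, _) = self.bwd_lstm(torch.flip(x, dims=[0]))   # reversed along the time axis
        # h_fwd, h_bwd: (1, batch, hidden_dim); tie the last states together
        return h_fwd[-1] + h_bwd[-1]   # or torch.cat([h_fwd[-1], h_bwd[-1]], dim=-1)

u_emb = ManualBiLSTM(8, 16)(torch.randn(5, 2, 8))   # -> (batch=2, hidden_dim=16)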

As I understand it, you are using the built-in BiLSTM as in this example (setting bidirectional=True in the nn.LSTM constructor). Then you get the concatenated output after feeding the batch, as PyTorch handles all the hassle for you.
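
For reference, a small self-contained sketch of that built-in variant (the dimensions are made up for illustration), showing the shapes you get back:

import torch
import torch.nn as nn

# Built-in bidirectional LSTM: PyTorch runs both directions for you.
rnn = nn.LSTM(input_size=8, hidden_size=16, num_layers=1, bidirectional=True)
x = torch.randn(5, 2, 8)                   # (seq_len, batch, input_size)
output, (h_n, c_n) = rnn(x)

print(output.shape)  # torch.Size([5, 2, 32]): both directions concatenated per time step
print(h_n.shape)     # torch.Size([2, 2, 16]): (num_layers * num_directions, batch, hidden_size)

# With one layer, index 0 is the forward final state and index 1 the backward one:
u_emb_batch = h_n[0, :, :] + h_n[1, :, :]  # (batch, hidden_size)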

If that is the case, and you want to sum the hidden states, then you have to use

u_emb_batch = (lasthidden[0, :, :] + lasthidden[1, :, :])

assuming you have only one layer. If you have more layers, your variant seems better.

This is because the result is structured (see the documentation):

h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len
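
A short sketch of how that layout can be pulled apart in the multi-layer case (the variable names are assumptions for illustration): viewing h_n as (num_layers, num_directions, batch, hidden_size) lets you pick the last layer's forward and backward states explicitly.

import torch
import torch.nn as nn

num_layers, num_directions, hidden_size = 2, 2, 16
rnn = nn.LSTM(input_size=8, hidden_size=hidden_size, num_layers=num_layers, bidirectional=True)
output, (h_n, _) = rnn(torch.randn(5, 2, 8))

# h_n: (num_layers * num_directions, batch, hidden_size)
h_n = h_n.view(num_layers, num_directions, -1, hidden_size)
last_fwd = h_n[-1, 0]        # last layer, forward direction:  (batch, hidden_size)
last_bwd = h_n[-1, 1]        # last layer, backward direction: (batch, hidden_size)
u_emb_batch = last_fwd + last_bwd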

By the way,

u_emb_batch_2 = output[-1, :, :HIDDEN_DIM] + output[-1, :, HIDDEN_DIM:]

should provide the same result.
