Taking the last state from BiLSTM (BiGRU) in PyTorch


Problem description

After reading several articles, I am still quite confused about the correctness of my implementation for getting the last hidden states from a BiLSTM.

  1. Understanding Bidirectional RNN in PyTorch (TowardsDataScience)
  2. PackedSequence for seq2seq model (PyTorch forums)
  3. What's the difference between "hidden" and "output" in PyTorch LSTM? (StackOverflow)
  4. Select tensor in a batch of sequences (PyTorch forums)

The approach from the last source (4) seems the cleanest to me, but I am still not sure whether I understood the thread correctly. Am I using the correct final hidden states from the forward and the reversed LSTM? This is my implementation:

# pos contains indices of words in embedding matrix
# seqlengths contains info about sequence lengths
# so for instance, if batch_size is 2 and pos=[4,6,9,3,1] and 
# seqlengths contains [3,2], we have batch with samples
# of variable length [4,6,9] and [3,1]

all_in_embs = self.in_embeddings(pos)
in_emb_seqs = pack_sequence(torch.split(all_in_embs, seqlengths, dim=0))
output, lasthidden = self.rnn(in_emb_seqs)
if not self.data_processor.use_gru:
    lasthidden = lasthidden[0]
# u_emb_batch has shape batch_size x embedding_dimension
# sum the last state from the forward and backward directions
u_emb_batch = lasthidden[-1, :, :] + lasthidden[-2, :, :]

Is this correct?

Recommended answer

In the general case, if you want to create your own BiLSTM network, you need to create two regular LSTMs: feed one the regular input sequence and the other the inverted input sequence. After feeding both sequences, take the last states from both nets and tie them together somehow (sum or concatenate).
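A minimal sketch of that "two regular LSTMs" idea, assuming placeholder names and dimensions (input_dim, hidden_dim, x are not from the question):

import torch
import torch.nn as nn

input_dim, hidden_dim = 8, 16
fwd_lstm = nn.LSTM(input_dim, hidden_dim)   # reads the sequence left-to-right
bwd_lstm = nn.LSTM(input_dim, hidden_dim)   # reads the reversed sequence

x = torch.randn(5, 2, input_dim)            # (seq_len, batch, input_dim)

_, (h_fwd, _) = fwd_lstm(x)                          # h_fwd: (1, batch, hidden_dim)
_, (h_bwd, _) = bwd_lstm(torch.flip(x, dims=[0]))    # feed the inverted sequence

# tie the two final states together, e.g. by summing or concatenating
summed = h_fwd[-1] + h_bwd[-1]                       # (batch, hidden_dim)
concat = torch.cat([h_fwd[-1], h_bwd[-1]], dim=-1)   # (batch, 2 * hidden_dim)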

As I understand it, you are using the built-in bidirectional LSTM, as in this example (setting bidirectional=True in the nn.LSTM constructor). You then get the concatenated output after feeding the batch, as PyTorch handles all the hassle for you.
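A short sketch of this built-in variant (the sizes below are placeholder values, not taken from the question):

import torch
import torch.nn as nn

# built-in bidirectional LSTM: PyTorch runs both directions internally
rnn = nn.LSTM(input_size=8, hidden_size=16, num_layers=1, bidirectional=True)

x = torch.randn(5, 2, 8)            # (seq_len, batch, input_size)
output, (h_n, c_n) = rnn(x)

print(output.shape)  # torch.Size([5, 2, 32]) -> forward and backward outputs concatenated
print(h_n.shape)     # torch.Size([2, 2, 16]) -> (num_layers * num_directions, batch, hidden_size)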

If that is the case, and you want to sum the hidden states, then you have to do

u_emb_batch = (lasthidden[0, :, :] + lasthidden[1, :, :])

assuming you have only one layer. If you have more layers, your variant seems better.

This is because the result is structured (see the documentation):

h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len
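The layer and direction axes of h_n can also be separated with a view, which makes the indexing explicit. A sketch, reusing the placeholder h_n and sizes from the snippet above:

# separate (num_layers * num_directions) so the last layer's states can be picked explicitly
num_layers, num_directions, hidden_size = 1, 2, 16
h = h_n.view(num_layers, num_directions, -1, hidden_size)  # (layers, directions, batch, hidden)

last_fwd = h[-1, 0]                  # final hidden state of the forward direction, last layer
last_bwd = h[-1, 1]                  # final hidden state of the backward direction, last layer
u_emb_batch = last_fwd + last_bwd    # same as lasthidden[0] + lasthidden[1] for a single layer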

By the way,

u_emb_batch_2 = output[-1, :, :HIDDEN_DIM] + output[-1, :, HIDDEN_DIM:]

should give the same result.
