How to Feed in the output of one LSTM along with Text into another LSTM in Tensorflow?


Problem description

I am trying to feed the output of one LSTM layer into another LSTM layer, along with the text provided to that second layer. The text given to the two LSTMs is different, and my goal is for the second LSTM to improve its understanding of its text based on what the first LSTM understood.

I can try to implement it in Tensorflow like this:

# text inputs to the two LSTM's
rnn_inputs = tf.nn.embedding_lookup(embeddings, text_data)
rnn_inputs_2 = tf.nn.embedding_lookup(embeddings, text_data)
# first LSTM
lstm1Output, lstm1State = tf.nn.dynamic_rnn(cell=lstm1, 
        inputs=rnn_inputs, 
        sequence_length=input_lengths, 
        dtype=tf.float32, 
        time_major=False)
# second LSTM
lstm2Output, lstm2State = tf.nn.dynamic_rnn(cell=lstm2, 
        # use the input of the second LSTM and the first LSTM here
        inputs=rnn_inputs_2 + lstm1State, 
        sequence_length=input_lengths_2, 
        dtype=tf.float32, 
        time_major=False)

This has an issue, since rnn_inputs_2 has shape (batch_size, _, hidden_layer_size), while lstm1State has shape (batch_size, hidden_layer_size). Does anyone have an idea of how I can change the shapes to make this work, or if there is some better way?

Thanks

Recommended answer

You're interpreting the hidden state of LSTM1 as a sentence embedding (rightfully so). And you now want to pass that sentence embedding into LSTM2 as prior knowledge it can base its decisions on.

If I described that correctly then you seem to be describing an encoder/decoder model, with the addition of new inputs to LSTM2. If that's accurate, then my first approach would be to pass the hidden state of LSTM1 in as the initial state of LSTM2. That would be far more logical than adding it to the input of each LSTM2 time step.
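Here is a minimal sketch of that idea, reusing the variable names from the question (lstm1, lstm2, rnn_inputs, rnn_inputs_2, input_lengths, input_lengths_2 are assumed to be defined as above, and both cells are assumed to have the same num_units so the state tuple can be reused directly):

# first LSTM encodes its text as before
lstm1Output, lstm1State = tf.nn.dynamic_rnn(cell=lstm1,
        inputs=rnn_inputs,
        sequence_length=input_lengths,
        dtype=tf.float32,
        time_major=False)
# second LSTM starts from LSTM1's final state instead of zeros;
# lstm1State is an LSTMStateTuple, which dynamic_rnn accepts directly
# via initial_state when lstm2 has the same state size as lstm1
lstm2Output, lstm2State = tf.nn.dynamic_rnn(cell=lstm2,
        inputs=rnn_inputs_2,
        sequence_length=input_lengths_2,
        initial_state=lstm1State,
        time_major=False)

Note that once initial_state is supplied, dynamic_rnn no longer needs the dtype argument for the second call.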

You would have the further benefit of an extra gradient path passing from LSTM2 through the state of LSTM1 back into LSTM1, so you would be training LSTM1 not only on its own loss function, but also on its ability to provide something that LSTM2 can use to improve its loss function (assuming you train LSTM1 and LSTM2 in the same sess.run iteration).

Regarding this follow-up question:

Another question: what if I wanted to introduce an LSTM3 whose output should also affect LSTM2? In this case, would I just sum the LSTM3 and LSTM1 hidden states and set that as the initial state for LSTM2?

Summing sounds bad; concatenating sounds good. You control the hidden state size of LSTM2, so it should simply have a larger hidden state size.
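A hedged sketch of the concatenation approach follows; lstm3State is a hypothetical name for the final state of the extra encoder, and lstm2 is assumed to have been built with num_units equal to the sum of the other two cells' sizes:

# concatenate the cell (c) and hidden (h) parts of the two encoder states;
# lstm2's num_units must equal lstm1_units + lstm3_units for this to line up
combined_state = tf.nn.rnn_cell.LSTMStateTuple(
        c=tf.concat([lstm1State.c, lstm3State.c], axis=1),
        h=tf.concat([lstm1State.h, lstm3State.h], axis=1))
lstm2Output, lstm2State = tf.nn.dynamic_rnn(cell=lstm2,
        inputs=rnn_inputs_2,
        sequence_length=input_lengths_2,
        initial_state=combined_state,
        time_major=False)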

And regarding this follow-up:

One of the things I didn't mention earlier was that sometimes LSTM1 will receive no input, and obviously, since its input is a sentence, LSTM1 will receive different input every time. Would this impact the error updates for LSTM1 and LSTM2? Also, would this mean that I can't use an encoder-decoder system? Otherwise, what you are saying makes sense; I am running it now and will see if it helps my performance.

In this case, if LSTM1 has no input (and thus no output state), I think the logical solution is to initialize LSTM2 with a standard hidden state vector of all zeros. This is what dynamic_rnn is doing under the hood if you don't give it an initial hidden state, so it's equivalent if you explicitly pass it a vector of 0's.
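For completeness, a small sketch of that fallback; batch_size here is a placeholder for however the batch dimension is obtained in the actual graph:

# explicit all-zeros initial state; equivalent to omitting initial_state
# and passing dtype=tf.float32 to dynamic_rnn instead
zero_state = lstm2.zero_state(batch_size, dtype=tf.float32)
lstm2Output, lstm2State = tf.nn.dynamic_rnn(cell=lstm2,
        inputs=rnn_inputs_2,
        sequence_length=input_lengths_2,
        initial_state=zero_state,
        time_major=False)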
