seq2seq-Attention peeping into the encoder-states bypasses last encoder-hidden-state

Question

In the seq2seq model, I want to use the hidden state at the end of encoding to read out further info from the input sequence.

So I return the hidden state and build a new sub-net on top of it. That works decently well. However, I have a doubt: this is supposed to become more complex, so I am effectively relying on having ALL the necessary information for the additional task encoded in that hidden state.
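
For concreteness, here is a minimal TF 1.x-style sketch of what I mean by returning the hidden state and building a new sub-net on top of it (all names, shapes, and layer sizes below are placeholders, not my actual code):

import tensorflow as tf  # TF 1.x-style API assumed

# Encoder: a plain RNN whose final hidden state feeds a separate read-out sub-net.
encoder_inputs = tf.placeholder(tf.float32, [None, None, 128])  # [batch, time, features]
seq_len = tf.placeholder(tf.int32, [None])

encoder_cell = tf.nn.rnn_cell.GRUCell(num_units=256)
encoder_outputs, final_state = tf.nn.dynamic_rnn(
    encoder_cell, encoder_inputs, sequence_length=seq_len, dtype=tf.float32)

# New sub-net that relies ONLY on the final encoder hidden state.
hidden = tf.layers.dense(final_state, 128, activation=tf.nn.relu)
aux_logits = tf.layers.dense(hidden, 10)  # hypothetical auxiliary prediction head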

If, however, the seq2seq decoder uses the attention mechanism, it basically peeps into the encoder side, effectively bypassing the hidden state at the end of encoding. Thus NOT ALL the info the seq2seq network relies on is encoded in the hidden state at the end of encoding.

Does that, in theory, mean that I have to avoid the attention mechanism and go with plain-vanilla seq2seq in order to get the maximum out of the hidden state at the end of encoding? This would obviously sacrifice a big part of the effectiveness on the seq2seq task.

Just trying to get a doubt I am having confirmed. Basically: normally the last encoder hidden state in the seq2seq model would contain ALL relevant info for decoding. But with attention this is no longer the case, right?

And on a more speculative note, do you agree with these possible solutions:
- Create an additional attention mechanism for the new sub-net?
- Or, alternatively, use a convolution over all the hidden states of the encoder side as additional input to the new sub-net? (A rough sketch of this follows below.)
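
To make the convolution idea a bit more concrete, here is a rough TF 1.x-style sketch (my own illustration; the shapes, filter size, and the max-pool are all assumptions): convolve over the time axis of the encoder hidden states, pool to a fixed-size vector, and feed that alongside the final state into the new sub-net:

import tensorflow as tf  # TF 1.x-style API assumed

# Stand-ins for the encoder outputs and its final hidden state.
encoder_outputs = tf.placeholder(tf.float32, [None, None, 256])  # [batch, time, units]
final_state = tf.placeholder(tf.float32, [None, 256])

# 1-D convolution over the time axis of the encoder hidden states.
conv = tf.layers.conv1d(encoder_outputs, filters=64, kernel_size=3,
                        padding="same", activation=tf.nn.relu)
pooled = tf.reduce_max(conv, axis=1)  # max-pool over time -> [batch, 64]

# The new sub-net sees both the final state and the pooled convolution features.
sub_net_input = tf.concat([final_state, pooled], axis=1)
aux_logits = tf.layers.dense(sub_net_input, 10)  # hypothetical auxiliary head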

Any thoughts? Easier fixes?

Thanks

Recommended answer

Bottom line: you should try different approaches and see which model works best for your data. Without knowing anything about your data or running some tests, it is impossible to speculate on whether an attention mechanism, a CNN, etc. provides any benefit.

However, if you are using the tensorflow seq2seq models available in tensorflow/tensorflow/python/ops/seq2seq.py, let me share some observations about the attention mechanism as implemented in embedding_attention_seq2seq() and attention_decoder() that relate to your question(s):

  1. The decoder's hidden state is initialized with the final state of the encoder... so attention does not "effectively bypass the hidden state at end of encoding", IMHO

The following code in embedding_attention_seq2seq() passes in the last-time-step encoder_state as the initial_state in the 2nd argument:

  return embedding_attention_decoder(
      decoder_inputs, encoder_state, attention_states, cell,
      num_decoder_symbols, embedding_size, num_heads=num_heads,
      output_size=output_size, output_projection=output_projection,
      feed_previous=feed_previous,
      initial_state_attention=initial_state_attention)

And you can see that initial_state is used directly in attention_decoder() without going through any kind of attention states:

state = initial_state

...

for i, inp in enumerate(decoder_inputs):
  if i > 0:
    variable_scope.get_variable_scope().reuse_variables()
  # If loop_function is set, we use it instead of decoder_inputs.
  if loop_function is not None and prev is not None:
    with variable_scope.variable_scope("loop_function", reuse=True):
      inp = loop_function(prev, i)
  # Merge input and previous attentions into one vector of the right size.
  input_size = inp.get_shape().with_rank(2)[1]
  if input_size.value is None:
    raise ValueError("Could not infer input size from input: %s" % inp.name)
  x = linear([inp] + attns, input_size, True)
  # Run the RNN.
  cell_output, state = cell(x, state)
  ....

  2. Attention states are combined with decoder inputs via learned linear combinations

x = linear([inp] + attns, input_size, True)
# Run the RNN.
cell_output, state = cell(x, state)

...the linear() does the W, b matrix operations to down-rank the combined input + attn to the decoder input_size. The model will learn the values of W and b.
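
In other words (my paraphrase, with made-up shapes), that call is just a concatenation followed by one learned affine projection back down to the decoder input size:

import tensorflow as tf  # TF 1.x-style API assumed

inp = tf.placeholder(tf.float32, [None, 64])    # decoder input at step i
attn = tf.placeholder(tf.float32, [None, 256])  # attention read-out vector
# Roughly what linear([inp] + attns, input_size, True) computes:
# x = W . [inp; attn] + b, projected back to input_size (64 here).
x = tf.layers.dense(tf.concat([inp, attn], axis=1), 64)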

Summary: the attention states are combined with the inputs into the decoder, but the last hidden state of the encoder is fed in as the initial hidden state of the decoder without attention.

Finally, the attention mechanism still has the last encoding state at its disposal and would only "bypass" it if it learned that that was the best thing to do during training.
