How to use previous output and hidden states from LSTM for the attention mechanism?

Question

I am currently trying to code the attention mechanism from this paper: "Effective Approaches to Attention-based Neural Machine Translation", Luong, Pham, Manning (2015). (I use global attention with the dot score).

However, I am unsure how to handle the hidden and output states from the LSTM decoder. The issue is that the input of the LSTM decoder at time t depends on quantities that I need to compute using the output and hidden states from t-1.
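
For reference, these are the global attention (dot score) quantities from Luong et al. (2015) that the loop below computes, with h_t the decoder state and \bar{h}_s the encoder outputs:

score(h_t, \bar{h}_s) = h_t^\top \bar{h}_s                  (dot score)
a_t(s) = softmax_s( score(h_t, \bar{h}_s) )                 (alignment weights)
c_t = \sum_s a_t(s) \bar{h}_s                               (context vector)
\tilde{h}_t = \tanh( W_c [c_t ; h_t] )                      (attentional hidden state)

With input feeding, \tilde{h}_t is then fed to the decoder at step t+1, which is exactly the t-1 dependency described above.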

Here is the relevant part of the code:

with tf.variable_scope('data'):
    prob = tf.placeholder_with_default(1.0, shape=())
    X_or = tf.placeholder(shape = [batch_size, timesteps_1, num_input], dtype = tf.float32, name = "input")
    X = tf.unstack(X_or, timesteps_1, 1)
    y = tf.placeholder(shape = [window_size,1], dtype = tf.float32, name = "label_annotation")
    logits = tf.zeros((1,1), tf.float32)

with tf.variable_scope('lstm_cell_encoder'):
    rnn_layers = [tf.nn.rnn_cell.LSTMCell(size) for size in [hidden_size, hidden_size]]
    multi_rnn_cell = tf.nn.rnn_cell.MultiRNNCell(rnn_layers)
    lstm_outputs, lstm_state =  tf.contrib.rnn.static_rnn(cell=multi_rnn_cell,inputs=X,dtype=tf.float32)
    concat_lstm_outputs = tf.stack(tf.squeeze(lstm_outputs))
    last_encoder_state = lstm_state[-1]

with tf.variable_scope('lstm_cell_decoder'):

    initial_input = tf.unstack(tf.zeros(shape=(1,1,hidden_size2)))
    rnn_decoder_cell = tf.nn.rnn_cell.LSTMCell(hidden_size, state_is_tuple = True)
    # Compute the hidden and output of h_1

    for index in range(window_size):

        output_decoder, state_decoder = tf.nn.static_rnn(rnn_decoder_cell, initial_input, initial_state=last_encoder_state, dtype=tf.float32)

        # Compute the score for source output vector
        scores = tf.matmul(concat_lstm_outputs, tf.reshape(output_decoder[-1],(hidden_size,1)))
        attention_coef = tf.nn.softmax(scores)
        context_vector = tf.reduce_sum(tf.multiply(concat_lstm_outputs, tf.reshape(attention_coef, (window_size, 1))),0)
        context_vector = tf.reshape(context_vector, (1,hidden_size))

        # compute the tilda hidden state \tilde{h}_t=tanh(W[c_t, h_t]+b_t)
        concat_context = tf.concat([context_vector, output_decoder[-1]], axis = 1)
        W_tilde = tf.Variable(tf.random_normal(shape = [hidden_size*2, hidden_size2], stddev = 0.1), name = "weights_tilde", trainable = True)
        b_tilde = tf.Variable(tf.zeros([1, hidden_size2]), name="bias_tilde", trainable = True)
        hidden_tilde = tf.nn.tanh(tf.matmul(concat_context, W_tilde)+b_tilde) # hidden_tilde is [1*64]

        # update for next time step
        initial_input = tf.unstack(tf.reshape(hidden_tilde, (1,1,hidden_size2)))
        last_encoder_state = state_decoder

        # predict the target

        W_target = tf.Variable(tf.random_normal(shape = [hidden_size2, 1], stddev = 0.1), name = "weights_target", trainable = True)
        logit = tf.matmul(hidden_tilde, W_target)
        logits = tf.concat([logits, logit], axis = 0)

    logits = logits[1:]

The part inside the loop is what I am unsure of. Does tensorflow remember the computational graph when I overwrite the variables "initial_input" and "last_encoder_state"?

Answer

I think your model will be much simpler if you use tf.contrib.seq2seq.AttentionWrapper with one of its attention implementations: BahdanauAttention or LuongAttention.

This way it's possible to wire the attention vector in at the cell level, so that the cell output already has attention applied. Example from the seq2seq tutorial:

# encoder_outputs is the attention memory, shaped [batch_size, max_time, num_units]
cell = tf.nn.rnn_cell.LSTMCell(512)
attention_mechanism = tf.contrib.seq2seq.LuongAttention(512, encoder_outputs)
attn_cell = tf.contrib.seq2seq.AttentionWrapper(cell, attention_mechanism, attention_layer_size=256)

Note that this way you won't need a loop over window_size, because tf.nn.static_rnn or tf.nn.dynamic_rnn will unroll the attention-wrapped cell for you - see the sketch below.
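
A minimal sketch of that wiring, assuming a decoder input placeholder decoder_inputs and reusing attn_cell and last_encoder_state from above (these names and shapes are assumptions, not part of the tutorial code):

# decoder_inputs: assumed placeholder, [batch_size, decoder_timesteps, num_input]
decoder_inputs = tf.placeholder(tf.float32, [batch_size, None, num_input])

# Start the wrapped cell from the encoder's final state instead of zeros
# (assumes the encoder and decoder cells have matching state sizes).
decoder_initial_state = attn_cell.zero_state(batch_size, tf.float32).clone(
    cell_state=last_encoder_state)

decoder_outputs, decoder_state = tf.nn.dynamic_rnn(
    attn_cell, decoder_inputs,
    initial_state=decoder_initial_state,
    dtype=tf.float32)

decoder_outputs already has attention applied at every step, so the score/softmax/context computation from the loop in the question happens inside the cell.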

Regarding your question: you should distinguish between python variables and tensorflow graph nodes: you can assign last_encoder_state to a different tensor, and the original graph node won't change because of it. This is flexible, but it can also be misleading in the resulting network - you might think you are connecting an LSTM to one tensor when it's actually another. In general, you shouldn't do that.
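
A toy illustration of that distinction, with throwaway names a and b (not from the code above):

a = tf.constant(1.0)
b = a + 1.0            # this op reads from the node `a` currently points to
a = tf.constant(5.0)   # rebinding the python name `a` does not rewire that op
with tf.Session() as sess:
    print(sess.run(b))  # prints 2.0, not 6.0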
