RNN model: Infer on sentences longer than max sequence length used during training


Problem Description


I am training an RNN model (using the rnn.dynamic_rnn method), and my data matrix has the shape num_examples x max_sequence_length x num_features. During training I do not want to increase max_sequence_length beyond 50 or 100, since that increases training time and memory use. All the sentences in my training set are shorter than 50 tokens. However, during testing I want the model to be able to run inference on sequences of up to 500 tokens. Is that possible, and how do I do it?
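
For concreteness, here is a minimal sketch of the kind of setup described above. It assumes a TensorFlow 1.x-style graph; all names and sizes are illustrative, not taken from the question. The key point is that if the time dimension of the input placeholder is left as None, the graph itself does not hard-code a maximum length, so the same trained weights can later be run on longer sequences:

    import numpy as np
    import tensorflow as tf  # assumes TensorFlow 1.x

    num_features = 300  # illustrative sizes
    num_units = 128

    # both the batch and the time dimension are left as None, so the graph
    # itself does not fix max_sequence_length
    inputs = tf.placeholder(tf.float32, [None, None, num_features], name="inputs")
    seq_len = tf.placeholder(tf.int32, [None], name="seq_len")

    cell = tf.nn.rnn_cell.GRUCell(num_units)
    outputs, final_state = tf.nn.dynamic_rnn(cell, inputs,
                                             sequence_length=seq_len,
                                             dtype=tf.float32)

    # at test time the same graph can be fed a single, much longer example
    test_sentence = np.random.randn(1, 500, num_features).astype(np.float32)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())  # in practice, restore the trained weights
        last_state = sess.run(final_state, {inputs: test_sentence, seq_len: [500]})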

Solution

@sonal - Yes, it is possible. In testing we are usually interested in passing a single example rather than a whole batch of data, so what you need is to pass the array for a single example, say

test_index = [10, 23, 42, 12, 24, 50]

to a dynamic_rnn. The prediction then has to be made from the final hidden state. Inside dynamic_rnn, I think you can pass sentences longer than the max_length used in training. If not, you can write a custom decoder function that computes the GRU or LSTM states using the weights you obtained during training. The idea is that you keep generating output until you reach a maximum length for the test case, or until the model generates an 'EOS' special token. I would prefer that you use a decoder after obtaining the final hidden state from the encoder; this usually gives better results as well.

    import tensorflow as tf
    from tensorflow.python.ops import tensor_array_ops

    # The following names are assumed to exist already (taken from the trained
    # model): max_sequence_length, initial_state, input_ta (a TensorArray holding
    # the input embeddings), embeddings, W_out, b_out, the GRU weights
    # W_z_x, W_z_h, b_z, W_r_x, W_r_h, b_r, W_c_x, W_c_h, b_c, and swap.

    # condition for the while-loop, used for early stopping
    def decoder_cond(time, state, output_ta_t):
        return tf.less(time, max_sequence_length)

    # the body_builder is just a wrapper to parse feedback
    def decoder_body_builder(feedback=False):
        # the decoder body, this is where the RNN magic happens!
        def decoder_body(time, old_state, output_ta_t):
            # when generating we need the previous prediction; handled via feedback
            if feedback:
                def from_previous():
                    prev_1 = tf.matmul(old_state, W_out) + b_out
                    a_max = tf.argmax(prev_1, 1)
                    # NOTE: to stop on an 'EOS' token, compare a_max with the EOS
                    # token index here (a sketch of this is given further below)
                    return tf.gather(embeddings, a_max)
                # at time 0 read the first input token; afterwards feed back
                # the embedding of the previous prediction
                x_t = tf.cond(tf.greater(time, 0), from_previous, lambda: input_ta.read(0))
            else:
                # otherwise we just read the next timestep from the input
                x_t = input_ta.read(time)

            # calculate the GRU
            z = tf.sigmoid(tf.matmul(x_t, W_z_x) + tf.matmul(old_state, W_z_h) + b_z)  # update gate
            r = tf.sigmoid(tf.matmul(x_t, W_r_x) + tf.matmul(old_state, W_r_h) + b_r)  # reset gate
            c = tf.tanh(tf.matmul(x_t, W_c_x) + tf.matmul(r*old_state, W_c_h) + b_c)   # proposed new state
            new_state = (1-z)*c + z*old_state  # new state

            # write the new state to the output TensorArray
            output_ta_t = output_ta_t.write(time, new_state)

            # return in "input-to-next-step" style
            return (time + 1, new_state, output_ta_t)
        return decoder_body

    # set up variables to loop with
    output_ta = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True, infer_shape=False)
    time = tf.constant(0)
    loop_vars = [time, initial_state, output_ta]

    # run the while-loop; with feedback=True the decoder feeds its own
    # predictions back in, which is what you want at inference time
    _, state, output_ta = tf.while_loop(decoder_cond,
                                        decoder_body_builder(feedback=True),
                                        loop_vars,
                                        swap_memory=swap)
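
Once the loop has finished, the states collected in the TensorArray can be turned into token predictions. This is not part of the original snippet, just a minimal sketch under the same assumptions (W_out and b_out are the trained output-projection weights mapping a state to vocabulary logits):

    # stack the TensorArray into a [time, batch, num_units] tensor
    states = output_ta.stack()

    # project every state to vocabulary logits with the trained output weights
    # and take the argmax as the predicted token id at each step
    logits = tf.tensordot(states, W_out, axes=1) + b_out  # [time, batch, vocab]
    predictions = tf.argmax(logits, axis=-1)               # [time, batch]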

These are only code snippets; adapt them to your own graph. More details can be found at https://github.com/alrojo/tensorflow-tutorial
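
One more note: the while-loop above only stops when max_sequence_length is reached; the 'EOS'-based stopping mentioned in the answer is only hinted at in a comment. A hedged sketch of one way to add it (eos_token, max_decode_length and batch_size are illustrative assumptions, not names from the original code) is to carry a per-example "finished" flag as an extra loop variable:

    # extra loop variable: one boolean per example, all False at the start
    finished_init = tf.zeros([batch_size], dtype=tf.bool)

    def decoder_cond(time, state, output_ta_t, finished):
        # keep looping while below the hard length cap and at least one
        # sequence has not yet produced the EOS token
        return tf.logical_and(tf.less(time, max_decode_length),
                              tf.logical_not(tf.reduce_all(finished)))

    # inside decoder_body, right after `a_max = tf.argmax(prev_1, 1)`, update the flag:
    #     finished = tf.logical_or(finished, tf.equal(a_max, eos_token))
    # and thread `finished` through loop_vars and the tuple returned by decoder_body.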
