How to handle padding when using sequence_length parameter in TensorFlow dynamic_rnn

Question

I'm trying to use the dynamic_rnn function in Tensorflow to speed up training. After doing some reading, my understanding is that one way to speed up training is to explicitly pass a value to the sequence_length parameter in this function. After a bit more reading, and finding this SO explanation, it seems like what I need to pass is a vector (maybe defined by a tf.placeholder) that contains the length of each sequence within a batch.

Here's where I'm confused: in order to take advantage of this, should I pad each of my batches to the longest-length sequence within the batch instead of the longest-length sequence in the training set? How does Tensorflow handle the remaining zeros/pad-tokens in any of the shorter sequences? Also, is the main advantage here really speed, or just extra assurance that we're masking pad-tokens during training? Any help/context would be appreciated.
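As an aside, when pad steps are all-zero vectors, a common trick is to compute that length vector from the padded batch itself rather than feeding it separately. A minimal sketch, assuming zero padding (sequence_lengths is just an illustrative name, not a TensorFlow API):

import tensorflow as tf

def sequence_lengths(x):
  # x: [batch, time, features]; pad steps are assumed to be all-zero vectors.
  used = tf.sign(tf.reduce_max(tf.abs(x), axis=2))        # [batch, time], 1 at real steps
  return tf.cast(tf.reduce_sum(used, axis=1), tf.int32)   # actual length per sequence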

Answer

should I pad each of my batches to the longest-length sequence within the batch instead of the longest-length sequence in the training set?

The sequences within a batch must be aligned, i.e., they must have the same length. So the general answer to your question is "yes". But different batches don't have to be the same length, so you can stratify your input sequences into groups of roughly the same size and pad each batch accordingly. This technique is called bucketing, and you can read about it in this tutorial.
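For illustration, here is a minimal sketch of the data-side bucketing described above, assuming each sequence is a list of fixed-size feature vectors (bucket_batches is a hypothetical helper, not part of TensorFlow):

import numpy as np

def bucket_batches(sequences, batch_size):
  # Sort indices by length so each batch groups sequences of similar size.
  order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
  for start in range(0, len(order), batch_size):
    idx = order[start:start + batch_size]
    max_len = max(len(sequences[i]) for i in idx)
    n_inputs = len(sequences[idx[0]][0])
    # Zero-pad only up to this batch's longest sequence, not the global max.
    batch = np.zeros((len(idx), max_len, n_inputs), dtype=np.float32)
    lengths = np.array([len(sequences[i]) for i in idx], dtype=np.int32)
    for row, i in enumerate(idx):
      batch[row, :lengths[row]] = sequences[i]
    yield batch, lengths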

How does Tensorflow handle the remaining zeros/pad-tokens in any of the shorter sequences?

It's pretty intuitive. tf.nn.dynamic_rnn returns two tensors: outputs and states. Suppose the actual sequence length is t and the padded sequence length is T.

Then outputs will contain zeros for all time steps i >= t (i.e., after the actual sequence ends), and states will hold the state of the t-th cell, ignoring the states of the trailing padded steps.

Here's an example:

import numpy as np
import tensorflow as tf

n_steps = 2
n_inputs = 3
n_neurons = 5

# X holds padded input batches: [batch_size, n_steps, n_inputs].
X = tf.placeholder(dtype=tf.float32, shape=[None, n_steps, n_inputs])
# seq_length holds the actual (unpadded) length of each sequence in the batch.
seq_length = tf.placeholder(tf.int32, [None])

basic_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, 
                                    sequence_length=seq_length, dtype=tf.float32)

X_batch = np.array([
  # t = 0      t = 1
  [[0, 1, 2], [9, 8, 7]], # instance 0
  [[3, 4, 5], [0, 0, 0]], # instance 1 (actual length 1, padded at t = 1)
  [[6, 7, 8], [6, 5, 4]], # instance 2
])
seq_length_batch = np.array([2, 1, 2])

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  outputs_val, states_val = sess.run([outputs, states], feed_dict={
    X: X_batch, 
    seq_length: seq_length_batch
  })
  print(outputs_val)
  print()
  print(states_val)

Note that instance 1 is padded, so outputs_val[1,1] is a zero vector and states_val[1] == outputs_val[1,0]:

[[[ 0.76686853  0.8707901  -0.79509073  0.7430128   0.63775384]
  [ 1.          0.7427926  -0.9452815  -0.93113345 -0.94975543]]

 [[ 0.9998851   0.98436266 -0.9620067   0.61259484  0.43135557]
  [ 0.          0.          0.          0.          0.        ]]

 [[ 0.99999994  0.9982034  -0.9934515   0.43735617  0.1671598 ]
  [ 0.99999785 -0.5612586  -0.57177305 -0.9255771  -0.83750355]]]

[[ 1.          0.7427926  -0.9452815  -0.93113345 -0.94975543]
 [ 0.9998851   0.98436266 -0.9620067   0.61259484  0.43135557]
 [ 0.99999785 -0.5612586  -0.57177305 -0.9255771  -0.83750355]]
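You can verify both claims numerically, continuing the script above and reusing outputs_val and states_val from the session run:

# The padded step of instance 1 was zeroed out by dynamic_rnn ...
print(np.allclose(outputs_val[1, 1], 0.0))            # True
# ... and its final state equals its last valid output (step 0).
print(np.allclose(states_val[1], outputs_val[1, 0]))  # True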

Also, is the main advantage here really speed, or just extra assurance that we're masking pad-tokens during training?

Of course, batch processing is more efficient than feeding the sequences one by one. But the main advantage of specifying the length is that you get a meaningful final state out of the RNN, i.e., the padded steps don't affect the result tensors. You would get exactly the same result (and the same speed) if you didn't set the length but instead selected the right states manually, as sketched below.
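For comparison, here is a sketch of that manual selection: run dynamic_rnn without sequence_length and gather each sequence's last valid output yourself (last_relevant is an illustrative helper, not a TensorFlow API):

import tensorflow as tf

def last_relevant(outputs, lengths):
  # outputs: [batch, time, units]; lengths: [batch] int32 actual lengths.
  batch_size = tf.shape(outputs)[0]
  # Pair each batch row with the index of its last valid time step.
  indices = tf.stack([tf.range(batch_size), lengths - 1], axis=1)
  return tf.gather_nd(outputs, indices)  # [batch, units]

Since a BasicRNNCell's output at each step equals its state, gathering the output at step length - 1 also recovers the final state.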
