TensorFlow Seq2Seq Training Time per Minibatch Monotonically Increases

Problem description

I'm training a tensorflow.contrib.seq2seq encoder-decoder model, and the training time per minibatch is monotonically increasing.

Step number: 10 Elapsed time: 52.89215302467346 Loss: 1.0420862436294556 Metrics: {'accuracy': 0.22499999}
Step number: 20 Elapsed time: 60.28505992889404 Loss: 0.8007364869117737 Metrics: {'accuracy': 0.28}
Step number: 30 Elapsed time: 73.98479580879211 Loss: 0.7292348742485046 Metrics: {'accuracy': 0.34}
Step number: 40 Elapsed time: 82.99069213867188 Loss: 0.6843382120132446 Metrics: {'accuracy': 0.345}
Step number: 50 Elapsed time: 86.97363901138306 Loss: 0.6808319687843323 Metrics: {'accuracy': 0.38999999}
Step number: 60 Elapsed time: 106.96697807312012 Loss: 0.601255476474762 Metrics: {'accuracy': 0.44}
Step number: 70 Elapsed time: 124.17725801467896 Loss: 0.5971778035163879 Metrics: {'accuracy': 0.405}
Step number: 80 Elapsed time: 137.91252613067627 Loss: 0.596596896648407 Metrics: {'accuracy': 0.43000001}
Step number: 90 Elapsed time: 146.6834409236908 Loss: 0.5921837687492371 Metrics: {'accuracy': 0.42500001}

All my data are artificially generated and are sampled randomly, meaning that (in general) there should be no difference between minibatches early in training and minibatches later in training. Additionally, all my data have the same input sequence length and the same output sequence length. Why might my model take longer to train later minibatches?

I found this relevant post, but I'm not changing my computational graph during my training loop.
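
One way to check that assumption is to count the operations in the default graph before and after a single training step; if the count keeps growing, the graph is being modified. A minimal sketch, assuming a TF 1.x graph-mode setup (run_one_training_step is a hypothetical placeholder for one iteration of the training loop):

import tensorflow as tf

ops_before = len(tf.get_default_graph().get_operations())
run_one_training_step()  # hypothetical placeholder for one sess.run(...) training step
ops_after = len(tf.get_default_graph().get_operations())
print('ops added during this step:', ops_after - ops_before)  # should stay 0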

To show some code, let's start in main:

def main(_):
    x_minibatch, y_minibatch, y_lengths_minibatch = construct_data_pipeline()

    model = import_model()

    train(model=model, x_minibatch=x_minibatch, y_minibatch=y_minibatch, y_lengths_minibatch=y_lengths_minibatch)


My data is stored as SequenceExamples, one per TFRecord file. My construct_data_pipeline() function is defined as follows:

def construct_data_pipeline():
    # extract TFRecord filenames located in data directory
    tfrecord_filenames = []
    for dirpath, dirnames, filenames in os.walk(tf.app.flags.FLAGS.data_dir):
        for filename in filenames:
            if filename.endswith('.tfrecord'):
                tfrecord_filenames.append(os.path.join(dirpath, filename))

    # read and parse data from TFRecords into tensors
    x, y, x_len, y_len = construct_examples_queue(tfrecord_filenames)

    # group tensors into minibatches
    x_minibatch, y_minibatch, y_lengths_minibatch = construct_minibatches(
        x=x, y=y, x_len=x_len, y_len=y_len)

    return x_minibatch, y_minibatch, y_lengths_minibatch
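
The SequenceExample files themselves can't be shown; purely for illustration, here is a sketch of how one such single-example .tfrecord file could be written (the feature names x, y, x_len, y_len and the int64 dtypes are hypothetical placeholders, not the actual schema):

import tensorflow as tf

def write_sequence_example(filename, x_ids, y_ids):
    # Hypothetical schema: scalar lengths in the context, one token id per
    # timestep in the feature lists.
    example = tf.train.SequenceExample()
    example.context.feature['x_len'].int64_list.value.append(len(x_ids))
    example.context.feature['y_len'].int64_list.value.append(len(y_ids))
    x_list = example.feature_lists.feature_list['x']
    y_list = example.feature_lists.feature_list['y']
    for token in x_ids:
        x_list.feature.add().int64_list.value.append(token)
    for token in y_ids:
        y_list.feature.add().int64_list.value.append(token)
    with tf.python_io.TFRecordWriter(filename) as writer:
        writer.write(example.SerializeToString())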

Stepping into construct_examples_queue():

def construct_examples_queue(tfrecords_filenames):
    number_of_readers = tf.flags.FLAGS.number_of_readers

    with tf.name_scope('examples_queue'):
        key, example_serialized = tf.contrib.slim.parallel_reader.parallel_read(
            tfrecords_filenames, tf.TFRecordReader, num_readers=number_of_readers)

        x, y, x_len, y_len = parse_example(example_serialized)

        return x, y, x_len, y_len

I don't think I can show parse_example, since the data isn't my own. The main parts are that I specify what I expect the SequenceExample to contain and then call

    context_parsed, sequence_parsed = tf.parse_single_sequence_example(
        example_serialized,
        context_features=context_features,
        sequence_features=sequence_features)
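
For context, context_features and sequence_features are dictionaries mapping feature names to parsing specs. A minimal sketch of what they might look like (the names and dtypes below are hypothetical placeholders matching the writer sketch above, not the real schema):

# Hypothetical parsing specs -- the real feature names and shapes are private.
context_features = {
    'x_len': tf.FixedLenFeature([], dtype=tf.int64),
    'y_len': tf.FixedLenFeature([], dtype=tf.int64),
}
sequence_features = {
    'x': tf.FixedLenSequenceFeature([], dtype=tf.int64),
    'y': tf.FixedLenSequenceFeature([], dtype=tf.int64),
}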

Skipping ahead to how I construct minibatches, I use

def construct_minibatches(x, y, y_len, x_len,
                          bucket_boundaries=list(range(400, tf.app.flags.FLAGS.max_x_len, 100))):

    batch_size = tf.app.flags.FLAGS.batch_size

    with tf.name_scope('batch_examples_using_buckets'):
        _, outputs = tf.contrib.training.bucket_by_sequence_length(
            input_length=x_len,
            tensors=[x, y, y_len],
            batch_size=batch_size,
            bucket_boundaries=bucket_boundaries,
            dynamic_pad=True,
            capacity=2 * batch_size,
            allow_smaller_final_batch=True)

        x_minibatch = outputs[0]
        y_minibatch = outputs[1]
        y_lengths_minibatch = outputs[2]
        return x_minibatch, y_minibatch, y_lengths_minibatch

Note: I had to change some variable names for privacy issues. Hopefully I didn't make any mistakes.

Recommended answer

Credit to faddy-w for solving two of my problems simultaneously!

It turns out I was changing my computational graph without knowing it.

I was calling

sess.run([model.optimizer.minimize(model.loss), model.y_predicted_logits],
         feed_dict={model.x: x_values,
                    model.y_actual: y_values,
                    model.y_actual_lengths: y_lengths_values})

from within a loop, where

model.loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    labels=model.y_actual, logits=model.y_predicted_logits))

model.optimizer = tf.train.GradientDescentOptimizer(learning_rate=initial_learning_rate)

without knowing that optimizer.minimize() adds additional operations to my graph.
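
The fix is to build the training op once, before the loop, and run that same op on every iteration. A minimal sketch of the corrected pattern (num_steps and the per-minibatch values x_values, y_values, y_lengths_values are assumed from the surrounding code; finalize() is an optional safeguard):

# Build the graph once, outside the training loop.
train_op = model.optimizer.minimize(model.loss)
sess.run(tf.global_variables_initializer())
tf.get_default_graph().finalize()  # optional: raises an error if anything tries to add ops later

for step in range(num_steps):
    # x_values, y_values, y_lengths_values are fetched per minibatch, as before.
    _, predictions = sess.run(
        [train_op, model.y_predicted_logits],
        feed_dict={model.x: x_values,
                   model.y_actual: y_values,
                   model.y_actual_lengths: y_lengths_values})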
