resetting a Tensorflow graph after OutOfRangeError when using Dataset

Question

I am trying to use the from_generator interface for the Dataset API to inject multiple "rounds" of input into a graph.

On my first attempt, I used the repeat() function to cause the generator to be run 3 consecutive times. However, when the batch_join call uses a batch size that does not evenly divide the number of iterations per round (10 iterations with a batch size of 3), data from different "rounds"/"epochs" ends up in the same batch (depending on the order in which the tensors are processed; there is some parallelism in the graph).
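
For illustration, here is a minimal sketch of that boundary-mixing effect, using a plain range dataset in place of the generator (assuming TF 1.3-era tf.contrib.data; even without batch_join's parallelism, repeat() alone makes the fourth batch straddle the round boundary):

import tensorflow as tf

# 10 elements per "round", repeated 3 times, batched in threes.
dataset = (tf.contrib.data.Dataset.range(10)
           .repeat(3)
           .batch(3))
iterator = dataset.make_initializable_iterator()
next_batch = iterator.get_next()

with tf.Session() as sess:
    sess.run(iterator.initializer)
    for _ in range(4):
        print(sess.run(next_batch))
# The fourth batch printed is [9 0 1]: the last element of round 1
# followed by the first two elements of round 2, i.e. two rounds
# overlap inside one batch.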

On my second attempt, I tried to re-run the iterator after each epoch was done. However, as soon as tf.errors.OutOfRangeError is thrown, all subsequent calls to sess.run() on the output of the batch call throw OutOfRangeError again, even after rerunning the iterator's initializer.
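
For reference, the per-epoch re-run pattern being attempted looks roughly like the sketch below (reusing the iterator and batch names from the code fragments further down, with queue runners assumed started; with the queue-based batch_join downstream, the second pass never recovers):

with tf.Session() as sess:
    for epoch in range(3):
        sess.run(iterator.initializer)
        while True:
            try:
                sess.run(batch)
            except tf.errors.OutOfRangeError:
                # With batch_join in the pipeline, this keeps firing on
                # every epoch after the first, despite re-initialization.
                break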

I would like to inject multiple rounds of input in succession into a graph and not have them overlap as in the first example (e.g. by using allow_smaller_final_batch on the batching options). Some of the kernels I instantiate in my custom Tensorflow fork are very expensive to restart, e.g. mmapping a file of O(10GB), so I'd like to somehow get the best of both worlds.

Answer

I think the problem stems from using tf.contrib.data.Dataset (which supports reinitialization) with tf.train.batch_join() (which uses TensorFlow queues and queue-runners, and hence does not support reinitialization).

I'm not completely clear what your code is doing, but I think you can implement the entire pipeline as a Dataset. Replace the following fragment of code:

my_iterator = MyIterator(iterations=iterations)
dataset = ds.Dataset.from_generator(my_iterator,
                                    output_types=my_iterator.output_types,
                                    output_shapes=my_iterator.output_shapes)
# dataset = dataset.repeat(count=repetitions)
iterator = dataset.make_initializable_iterator()
next_elem = iterator.get_next()

# Change constant to 1 or 2 or something to see that the batching is more predictable.
ripple_adds = [(tf.stack((next_elem[0], next_elem[1] + constant)),)
               for constant in ripple_add_coefficients]
batch = tf.train.batch_join(ripple_adds, batch_size=batch_size,
                            enqueue_many=False, name="sink_queue")

...with something like the following:

my_iterator = MyIterator(iterations=iterations)
dataset = tf.contrib.data.Dataset.from_generator(my_iterator,
                                                 output_types=my_iterator.output_types,
                                                 output_shapes=my_iterator.output_shapes)

def ripple_add_map_func(x, y):
  # num_ripples plays the role of len(ripple_add_coefficients) above.
  return (tf.contrib.data.Dataset.range(num_ripples)
          .map(lambda r: tf.stack([x, y + r])))

dataset = dataset.flat_map(ripple_add_map_func).batch(batch_size)

iterator = dataset.make_initializable_iterator()
batch = iterator.get_next()
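
Because the whole pipeline is now a single Dataset with no queues downstream, the initializable iterator can be re-run after each round without rebuilding the graph or restarting the expensive kernels. A minimal usage sketch, assuming the tensors defined above:

with tf.Session() as sess:
    for round_index in range(3):
        sess.run(iterator.initializer)
        while True:
            try:
                result = sess.run(batch)
            except tf.errors.OutOfRangeError:
                break  # End of this round; re-initialize for the next one.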
