TensorFlow Dataset order undefined?


Question

If I use multiple elements from a tf.data.Dataset to build the graph and then evaluate the graph later, the order in which the elements come out of the Dataset seems to be undefined. As an example, the following code snippet

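# TensorFlow 1.x graph-mode API (tf.Session, Dataset.make_one_shot_iterator).
import tensorflow as tf

dataset = tf.data.Dataset.range(5)
iterator = dataset.make_one_shot_iterator()


print('build graph and then eval')
# Build five separate get_next() ops, then evaluate them in a single run call.
keep = []
for i in range(5):
  keep.append(iterator.get_next())

with tf.Session() as sess:
  keep_eval = sess.run(keep)
  print(keep_eval)


print('eval each element')
# Evaluate one get_next() op per run call.
with tf.Session() as sess:
  for i in range(5):
    print(sess.run(iterator.get_next()), end=' ')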

will result in output like the following:

build graph and then eval
[3 0 1 4 2]
eval each element
0 1 2 3 4

Also, each run yields a different result for "build graph and then eval". I would expect "build graph and then eval" to be ordered as well, like "eval each element". Can anyone explain why this happens?

Recommended answer

The order of a tf.data.Dataset is defined and deterministic (unless you add a non-deterministic Dataset.shuffle()).

However, your two loops build different graphs, which accounts for the difference:

  • The "build graph and then eval" part creates a list of five iterator.get_next() operations and runs all five in parallel in a single sess.run() call. Because the operations run in parallel, they may produce their results in a different order on each run.

  • The "eval each element" part also creates five iterator.get_next() operations, but it runs them sequentially, one per sess.run() call, so you always get the results in the expected order.
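The effect described in the two points above can be imitated without TensorFlow. The following is only an analogy in plain Python, not how TensorFlow schedules operations internally: the function name pull_in_parallel, the threads, and the lock are all assumptions made for illustration. An ordered source hands out its items strictly in order, but when several consumers ask at the same time, which consumer receives which item depends on scheduling.

```python
import threading


def pull_in_parallel(n=5):
    """Let n threads each take one item from a shared, ordered source."""
    source = iter(range(n))   # ordered source, standing in for Dataset.range(n)
    lock = threading.Lock()   # the source hands out one item at a time
    slots = [None] * n        # slots[i] plays the role of keep[i]

    def worker(i):
        # Whichever thread acquires the lock first receives the next item,
        # so slot i is not guaranteed to hold item i.
        with lock:
            slots[i] = next(source)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return slots


slots = pull_in_parallel()
print(slots)          # some permutation of 0..4; the order is not guaranteed
print(sorted(slots))  # every item is handed out exactly once
```

The source itself stays ordered and no element is lost or repeated; only the mapping from consumer to element varies, which matches the behaviour reported in the question.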

Note that we do not recommend calling iterator.get_next() in a loop, because each call creates a new operation that is added to the graph and consumes memory. Instead, when you loop over a Dataset, try to use the following pattern:

import tensorflow as tf

dataset = tf.data.Dataset.range(5)
iterator = dataset.make_one_shot_iterator()

# Call `iterator.get_next()` once and use the result in each iteration.
next_element = iterator.get_next()

with tf.Session() as sess:
  for i in range(5):
    print(sess.run(next_element))
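If several elements are needed from a single run call, one option that is not part of the answer above is to let the dataset group them with Dataset.batch(), so that one get_next() operation returns them together in dataset order. A minimal sketch, assuming a TensorFlow version that provides the tf.compat.v1 module; the explicit tf.Graph() is there only so the sketch also runs under TensorFlow 2.x:

```python
import tensorflow as tf

# Build inside an explicit graph so graph-mode code also works under TF 2.x.
with tf.Graph().as_default():
    # batch(5) groups the five elements into one tensor, in dataset order.
    dataset = tf.data.Dataset.range(5).batch(5)
    iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)

    # A single get_next() op now yields all five elements at once.
    next_batch = iterator.get_next()

    with tf.compat.v1.Session() as sess:
        batch = sess.run(next_batch)

print(batch.tolist())  # [0, 1, 2, 3, 4]
```

Because only one operation touches the iterator, there is no parallel race between get_next() operations, and the order inside the batch follows the order of the dataset.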
