tensorflow Dataset order undefined?
Question
If I use multiple elements from a tf.data.Dataset to build the graph, and then evaluate the graph later, the order in which elements are drawn from the Dataset appears to be undefined. As an example, the following code snippet
import tensorflow as tf

dataset = tf.data.Dataset.range(5)
iterator = dataset.make_one_shot_iterator()

print('build graph and then eval')
keep = []
for i in range(5):
    keep.append(iterator.get_next())
with tf.Session() as sess:
    keep_eval = sess.run(keep)
    print(keep_eval)

print('eval each element')
with tf.Session() as sess:
    for i in range(5):
        print(sess.run(iterator.get_next()), end=' ')
will produce output like the following:
build graph and then eval
[3 0 1 4 2]
[3 0 1 4 2]
eval each element
0 1 2 3 4
Also, each run yields a different order for "build graph and then eval". I would expect "build graph and then eval" to be ordered as well, like "eval each element". Can anyone explain why this happens?
Answer
The order of a tf.data.Dataset is defined and deterministic (unless you add a non-deterministic Dataset.shuffle()).
However, your two loops build different graphs, which accounts for the difference:
The "build graph and then eval" part creates a list of five iterator.get_next() operations and runs those five operations in parallel. Because these operations run in parallel, they may produce results in a different order.
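The parallel evaluation can be sketched with a plain-Python analogy (an illustration of the scheduling effect only, not of TensorFlow internals): several threads concurrently pulling elements from a shared source, where each element is delivered exactly once but the arrival order depends on which thread runs first:

```python
import queue
import threading

# Hypothetical stand-in for the dataset: a thread-safe FIFO holding 0..4.
source = queue.Queue()
for i in range(5):
    source.put(i)

results = []
results_lock = threading.Lock()

def take_one():
    # Analogous to one iterator.get_next() op: consume the next element.
    item = source.get()
    with results_lock:
        results.append(item)

# Five "ops" running in parallel, like the five fetches in one sess.run call.
threads = [threading.Thread(target=take_one) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every element appears exactly once, but `results` may be in any order,
# just like the [3 0 1 4 2] output above.
print(sorted(results))  # → [0, 1, 2, 3, 4]
```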
The "eval each element" part also creates five iterator.get_next() operations, but it runs them sequentially, so you always get the results in the expected order.
Note that we do not recommend calling iterator.get_next() in a loop, because each call creates a new operation, which is added to the graph and consumes memory. Instead, when you loop over a Dataset, try to use the following pattern:
dataset = tf.data.Dataset.range(5)
iterator = dataset.make_one_shot_iterator()

# Call `iterator.get_next()` once and use the result in each iteration.
next_element = iterator.get_next()

with tf.Session() as sess:
    for i in range(5):
        print(sess.run(next_element))
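As a plain-Python sketch of why this pattern works (an analogy, not TensorFlow internals): a one-shot iterator behaves like a generator whose state advances on each evaluation, so defining next_element once and evaluating it repeatedly yields successive elements in order:

```python
# Hypothetical analogy: the iterator holds mutable state, and each
# sess.run(next_element) corresponds to one next() call here.
def make_one_shot(n):
    for i in range(n):
        yield i

it = make_one_shot(5)               # analogous to dataset.make_one_shot_iterator()
out = [next(it) for _ in range(5)]  # one "evaluation" per loop iteration
print(out)  # → [0, 1, 2, 3, 4]
```

A single reused operation, like a single generator, advances by exactly one element per evaluation, so no new graph nodes are created inside the loop.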