Running a queue in the background in TensorFlow causes strange exceptions
Problem description
I am implementing the following graph in TensorFlow: there is a queue Q into which a background thread enqueues tensors. In the main thread, I sequentially dequeue elements from Q.
My code can be simplified as follows:
```python
import time
import threading
import tensorflow as tf

sess = tf.InteractiveSession()
coord = tf.train.Coordinator()
q = tf.FIFOQueue(32, dtypes=tf.int32)

def loop(g):
    with g.as_default():
        enqueue_op = q.enqueue(1, name="example_enqueue")
        for i in range(20):
            if coord.should_stop():
                return
            try:
                sess.run(enqueue_op)
            except tf.errors.CancelledError:
                print("enqueue cancelled")

threads = [
    threading.Thread(target=loop, args=(tf.get_default_graph(),))
]

sess.run(tf.initialize_all_variables())
for t in threads:
    t.start()
# If I sleep for 1 second, it works fine!
# time.sleep(1)
print(sess.run(q.dequeue()))
coord.request_stop()
coord.join(threads)
sess.close()
```
As the comment notes, if I sleep for 1 second before running the dequeue operation, everything works. However, if I run it immediately, the following exception is raised:
```
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 715, in _do_call
    return fn(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 697, in _run_fn
    status, run_metadata)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/framework/errors.py", line 450, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/hanxu/Downloads/BrainSeg/playgrounds/7.py", line 32, in <module>
    print(sess.run(q.dequeue()))
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 372, in run
    run_metadata_ptr)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 636, in _run
    feed_dict_string, options, run_metadata)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 708, in _do_run
    target_list, options, run_metadata)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 728, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found
```
Could anyone help? Thanks very much!!
I am using Tensorflow 9.0rc0.
My real situation is a little more complicated: the enqueued tensor is actually different each time, for example:
```python
def loop(g):
    with g.as_default():
        for i in range(20):
            if coord.should_stop():
                return
            # Look here!
            enqueue_op = q.enqueue(i, name="example_enqueue")
            try:
                sess.run(enqueue_op)
            except tf.errors.CancelledError:
                print("enqueue cancelled")
```
So it is not trivial to move the enqueue operation to the main thread, and I don't know how to do it. Please help! :)
Recommended answer
This was an issue with old (pre-0.9) versions of TensorFlow, which was fixed in version 0.9. The issue is that adding nodes to the graph (i.e. in your calls to `q.dequeue()` and `q.enqueue()`) was not thread-safe when other threads (i.e. your `loop()` thread) were using the graph.
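One way to make the "build the whole graph first, then start threads" rule explicit is `Graph.finalize()`, which marks a graph read-only so that any later attempt to add a node raises a `RuntimeError` instead of racing silently with a running thread. The sketch below assumes a TensorFlow 2.x install and uses the `tensorflow.compat.v1` graph API with eager execution disabled, not the 0.9-era `import tensorflow as tf` from the question; the names `late_op_rejected` and `value` are illustrative only.

```python
import tensorflow.compat.v1 as tf

# Assumption: TF 2.x install, using the TF1-style graph API.
tf.disable_eager_execution()

graph = tf.Graph()
with graph.as_default():
    # Build every op up front, in the main thread.
    q = tf.FIFOQueue(32, dtypes=tf.int32)
    enqueue_op = q.enqueue(1, name="example_enqueue")
    dequeue_op = q.dequeue()

# Make the graph read-only: creating any further op now raises RuntimeError.
graph.finalize()

try:
    with graph.as_default():
        q.dequeue()  # a "late" op, like one created inside a worker thread
    late_op_rejected = False
except RuntimeError as err:
    late_op_rejected = True
    print("late op rejected:", err)

# Running ops that already exist on a finalized graph is still allowed.
with tf.Session(graph=graph) as sess:
    sess.run(enqueue_op)
    value = sess.run(dequeue_op)
print(value)
```

Calling `finalize()` just before starting the background threads turns an intermittent `NotFoundError` into an immediate, deterministic error at the line that tries to add a node.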
There are two issues you'd need to fix to avoid the race condition (in pre-0.9 versions):
- Don't call `q.enqueue()` in the `loop()` thread. Instead, create the enqueue op in the main thread. For example:
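  ```python
  # Create the enqueue op once, in the main thread, before any thread starts.
  q = tf.FIFOQueue(32, dtypes=tf.int32)
  enqueue_op = q.enqueue(1, name="example_enqueue")

  def loop(g):
      # This thread only runs existing ops; it never adds nodes to the graph.
      for i in range(20):
          if coord.should_stop():
              return
          try:
              sess.run(enqueue_op)
          except tf.errors.CancelledError:
              print("enqueue cancelled")
  ```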
- Move the call to `q.dequeue()` (which adds a node to the graph) to before the point where you start the `loop()` thread:
  ```python
  dequeued_t = q.dequeue()  # add the dequeue node before any thread starts
  for t in threads:
      t.start()
  print(sess.run(dequeued_t))
  ```
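
The answer does not directly cover the follow-up case, where a different value is enqueued on each iteration. A common pattern for that, offered here as an assumption rather than as part of the original answer, is to build a single enqueue op on a `tf.placeholder` in the main thread and pass each value through `feed_dict`, so the background thread only runs existing ops and never adds nodes. This sketch assumes a TensorFlow 2.x install using `tensorflow.compat.v1` with eager execution disabled; `value_ph` and `received` are hypothetical names.

```python
import threading

import tensorflow.compat.v1 as tf

# Assumption: TF 2.x install, using the TF1-style graph API.
tf.disable_eager_execution()

coord = tf.train.Coordinator()
q = tf.FIFOQueue(32, dtypes=tf.int32)

# All graph construction happens here, in the main thread, before any thread
# starts. The placeholder lets one enqueue op carry a different value per run.
value_ph = tf.placeholder(tf.int32, shape=[])
enqueue_op = q.enqueue(value_ph, name="example_enqueue")
dequeue_op = q.dequeue()

sess = tf.Session()


def loop():
    for i in range(20):
        if coord.should_stop():
            return
        try:
            # Same op every time; only the fed value changes.
            sess.run(enqueue_op, feed_dict={value_ph: i})
        except tf.errors.CancelledError:
            print("enqueue cancelled")
            return


threads = [threading.Thread(target=loop)]
for t in threads:
    t.start()

# No sleep: the graph is not modified after the thread starts, and each
# dequeue simply blocks until the producer has enqueued the next value.
received = [sess.run(dequeue_op) for _ in range(20)]
print(received)

coord.request_stop()
coord.join(threads)
sess.close()
```

The design choice is that the background thread only calls `sess.run`, which may be called concurrently from several threads, while every node is created up front. When the per-iteration values come from Python rather than from other TensorFlow ops, feeding them this way varies the data without growing the graph.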