Running a queue in the background in TensorFlow causes strange exceptions

Problem description

I am implementing the following graph in TensorFlow: there is a queue Q, to which a background thread enqueues tensors. In the main thread, I sequentially dequeue elements from Q.

My code can be simplified as follows:

import time
import threading
import tensorflow as tf

sess = tf.InteractiveSession()
coord = tf.train.Coordinator()

q = tf.FIFOQueue(32, dtypes=tf.int32)

def loop(g):
    with g.as_default():
        enqueue_op = q.enqueue(1, name="example_enqueue")

        for i in range(20):
            if coord.should_stop():
                return

            try:
                sess.run(enqueue_op)
            except tf.errors.CancelledError:
                print("enqueue cancelled")

threads = [
    threading.Thread(target=loop, args=(tf.get_default_graph(),))
]

sess.run(tf.initialize_all_variables())

for t in threads: t.start()

# If I sleep for 1 second here, it works fine!
# time.sleep(1)

print(sess.run(q.dequeue()))

coord.request_stop()
coord.join(threads)

sess.close()

As noted in the comment, if I sleep for 1 second before running the dequeue operation, everything works. However, if the dequeue runs immediately, the following exception is raised:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 715, in _do_call
    return fn(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 697, in _run_fn
    status, run_metadata)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/framework/errors.py", line 450, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/hanxu/Downloads/BrainSeg/playgrounds/7.py", line 32, in <module>
    print(sess.run(q.dequeue()))
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 372, in run
    run_metadata_ptr)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 636, in _run
    feed_dict_string, options, run_metadata)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 708, in _do_run
    target_list, options, run_metadata)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 728, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found
HanXus-MacBook-Pro:BrainSeg hanxu$ python3 -m playgrounds.7
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 715, in _do_call
    return fn(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 697, in _run_fn
    status, run_metadata)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/framework/errors.py", line 450, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/hanxu/Downloads/BrainSeg/playgrounds/7.py", line 34, in <module>
    print(sess.run(q.dequeue()))
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 372, in run
    run_metadata_ptr)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 636, in _run
    feed_dict_string, options, run_metadata)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 708, in _do_run
    target_list, options, run_metadata)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 728, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.NotFoundError: FetchOutputs node fifo_queue_Dequeue:0: not found

Could anyone help? Thanks very much!!

I am using TensorFlow 0.9.0rc0.

My real situation is a little more complicated. The enqueued tensor is in fact different on each iteration, for example:

def loop(g):
    with g.as_default():
        for i in range(20):
            if coord.should_stop():
                return

            # Look here!
            enqueue_op = q.enqueue(i, name="example_enqueue")

            try:
                sess.run(enqueue_op)
            except tf.errors.CancelledError:
                print("enqueue cancelled")

So it is not trivial to move the enqueue operation to the main thread, and I don't know how to do it. Please help.

Recommended answer

This was an issue with old (pre-0.9) versions of TensorFlow, which was fixed in version 0.9. The issue is that adding nodes to the graph (i.e. in your calls to q.dequeue() and q.enqueue()) was not thread-safe when other threads (i.e. your loop() thread) were using the graph.
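The failure mode above comes from nodes being added to a graph that another thread is already running. One way to surface that pattern early is to make the graph read-only once it is built. The sketch below is not part of the original answer: it assumes a TensorFlow 2.x install running in graph mode and relies on `tf.Graph.finalize()`; the helper name `node_creation_rejected` is made up for illustration.

```python
import tensorflow as tf

# Build every op up front, in a single thread.
graph = tf.Graph()
with graph.as_default():
    q = tf.queue.FIFOQueue(32, dtypes=[tf.int32], shapes=[()])
    enqueue_op = q.enqueue(1, name="example_enqueue")
    dequeued_t = q.dequeue()

# Mark the graph read-only. From here on, any attempt to add a node
# (for example a late q.dequeue() call) raises instead of mutating the graph.
graph.finalize()


def node_creation_rejected():
    """Hypothetical helper: True if adding an op to the finalized graph fails."""
    try:
        with graph.as_default():
            q.dequeue()  # would create a new dequeue node
    except RuntimeError:
        return True
    return False
```

Calling `graph.finalize()` just before starting the background threads turns an intermittent race into an immediate, reproducible error at the line that tries to build a new op.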

There are two issues you'd need to fix to avoid the race condition (in pre-0.9 versions):

  1. Don't call q.enqueue() in the loop() thread. Instead, create the enqueue op once in the main thread. For example:

q = tf.FIFOQueue(32, dtypes=tf.int32)
enqueue_op = q.enqueue(1, name="example_enqueue")

def loop(g):
    for i in range(20):
        if coord.should_stop():
            return
        try:
            sess.run(enqueue_op)
        except tf.errors.CancelledError:
            print("enqueue cancelled")

  2. Move the call to q.dequeue() (which adds a node to the graph) to before the point where you start the loop() thread:

    dequeued_t = q.dequeue()

    for t in threads: t.start()

    print(sess.run(dequeued_t))
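For the follow-up case in the question, where a different value is enqueued on each iteration, one possible approach (not part of the original answer) is to build a single enqueue op that reads from a placeholder and to feed the value at run time, so the background thread never creates graph nodes. The sketch below assumes TensorFlow 2.x and uses its `tf.compat.v1` session API instead of the 0.9-era calls from the question; `run_pipeline` and `value_ph` are hypothetical names.

```python
import threading

import tensorflow as tf

tf1 = tf.compat.v1  # graph/session API; assumes TensorFlow 2.x


def run_pipeline(num_items=20):
    """Enqueue 0..num_items-1 from a background thread, dequeue in the caller."""
    graph = tf.Graph()
    with graph.as_default():
        # All graph construction happens here, before any thread starts.
        q = tf.queue.FIFOQueue(32, dtypes=[tf.int32], shapes=[()])
        value_ph = tf1.placeholder(tf.int32, shape=[], name="value")
        enqueue_op = q.enqueue(value_ph, name="example_enqueue")
        dequeued_t = q.dequeue()

    coord = tf.train.Coordinator()
    with tf1.Session(graph=graph) as sess:

        def loop():
            # The thread only runs existing ops; the value varies via feed_dict.
            for i in range(num_items):
                if coord.should_stop():
                    return
                try:
                    sess.run(enqueue_op, feed_dict={value_ph: i})
                except tf.errors.CancelledError:
                    print("enqueue cancelled")
                    return

        threads = [threading.Thread(target=loop)]
        for t in threads:
            t.start()

        # Each dequeue blocks until the background thread has enqueued a value.
        results = [int(sess.run(dequeued_t)) for _ in range(num_items)]

        coord.request_stop()
        coord.join(threads)
    return results
```

With a single producer thread and a FIFO queue, `run_pipeline(5)` returns the values in the order they were enqueued. The same idea should carry over to larger tensors by giving the placeholder and the queue a matching dtype and shape.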
    
