TensorFlow imprecise timeouts

Question

I've been testing the timeout functionality for sess.run calls (applied to a convolutional neural network), and the timeouts don't seem to be very precise.

For example, if I set the timeout to 800 ms, there can be a 1-2 second delay before the timeout exception is raised. This leads me to believe that cancellation notifications aren't caught between computational nodes (which, according to the timeline, take 0.2-0.5 s each).

So:

1) Is there a way to make the timeouts more precise?

2) Are TensorFlow cancellation notifications caught between node computations?

Recommended answer

The cancellation and timeout mechanism in TensorFlow was designed to cancel only a small number of blocking operations, in particular: dequeuing from an empty queue, enqueuing to a full queue, and reading from a file.

If you run a graph containing non-blocking operations, such as tf.matmul() and tf.nn.conv2d(), and the timeout expires, TensorFlow will typically wait until these operations have completed before returning a "deadline exceeded" error.

Why is this the case? We added cancellation because users started to build pipelines of blocking operations into their graphs (e.g. for reading data), and some form of cancellation was needed to shut down these pipelines cleanly. Timeouts also help to debug the deadlocks that can unfortunately occur in these pipelines. By contrast, TensorFlow is designed to dispatch non-blocking operations as efficiently as possible: for example, when running on a GPU, TensorFlow will asynchronously enqueue multiple operations on the GPU compute stream without blocking on their completion. Although it would technically be possible to check for cancellation between the execution of each operation, this would add latency to operation dispatch and reduce overall performance in the common case.
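The effect on the observed timeout can be illustrated with a toy model. The sketch below is not TensorFlow code and does not reflect its internals. It only does the arithmetic on a virtual clock, under two assumed policies: the deadline is noticed only once all in-flight non-blocking work has finished (the behavior described above), or it is checked between operations (the alternative that was not adopted). The function name `time_error_reported_ms` and the per-operation durations are hypothetical, chosen to resemble the figures in the question.

```python
from typing import Optional, Sequence


def time_error_reported_ms(
    op_durations_ms: Sequence[int],
    timeout_ms: int,
    check_between_ops: bool,
) -> Optional[int]:
    """Toy model: virtual time (ms) at which a deadline error would surface.

    Returns None if the whole run finishes within the timeout.
    This is an illustration only, not how TensorFlow is implemented.
    """
    total = sum(op_durations_ms)
    if total <= timeout_ms:
        return None  # the run completes before the deadline

    if not check_between_ops:
        # Assumed policy 1: non-blocking ops run to completion, so the
        # error is reported only after all of them have finished.
        return total

    # Assumed policy 2: the deadline is checked after each op, so the
    # error is reported at the first op boundary at or past the deadline.
    elapsed = 0
    for duration in op_durations_ms:
        elapsed += duration
        if elapsed >= timeout_ms:
            return elapsed
    return total
```

With three hypothetical 500 ms operations and an 800 ms timeout, the first policy reports the error at 1500 ms and the second at 1000 ms. Even with a check between operations, the overshoot can be as large as one operation's duration, and the model leaves out the dispatch latency that such checks would add.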

However, if timeouts/cancellation for non-blocking operations would be useful for your use case, please feel free to open a GitHub issue as a feature request!
