Tensorflow: GPU Utilization is almost always at 0%


Problem description

I'm using tensorflow with Titan-X GPUs and I've noticed that, when I run the CIFAR10 example, the Volatile GPU-utilization is pretty constant at around 30%, whereas when I train my own model, the Volatile GPU-utilization is far from steady: it is almost always 0%, spikes to 80/90%, and then drops back to 0%, over and over again.

I thought that this behavior was due to the way I was feeding the data to the network (I was fetching the data after each step, which took some time). But after implementing a queue to feed the data and avoid this latency between steps, the problem persisted (see below for the queuing system).

Any ideas?

import threading
import tensorflow as tf

# n_steps, n_input, n_classes, max_iter, train_op and the `data` object
# are defined elsewhere in the model code
batch = 128  # size of the batch
x = tf.placeholder(tf.float32, [None, n_steps, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])

# with a capacity of 100 batches, the bottleneck should not be the data feeding
queue = tf.RandomShuffleQueue(capacity=100*batch,
                              min_after_dequeue=80*batch,
                              dtypes=[tf.float32, tf.float32],
                              shapes=[[n_steps, n_input], [n_classes]])
enqueue_op = queue.enqueue_many([x, y])
X_batch, Y_batch = queue.dequeue_many(batch)

sess = tf.Session()

def load_and_enqueue(data):
    # keep filling the queue from a background thread
    while True:
        X, Y = data.get_next_batch(batch)
        sess.run(enqueue_op, feed_dict={x: X, y: Y})

# args must be a tuple, hence the trailing comma
train_thread = threading.Thread(target=load_and_enqueue, args=(data,))
train_thread.daemon = True
train_thread.start()

for _ in xrange(max_iter):
    sess.run(train_op)

Answer

After doing some experiments I found the answer, so I am posting it here since it could be useful to someone else.

First, get_next_batch is approximately 15x slower than train_op (thanks to Eric Platon for pointing this out).
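
How one might check this ratio: the sketch below times each operation in isolation. It reuses sess, data, batch and train_op from above; the 50-iteration count is arbitrary, and the measured train_op time includes any time spent waiting on the queue, so it is only a rough estimate.

import time

def avg_seconds(fn, n=50):
    # average wall-clock seconds per call over n runs
    start = time.time()
    for _ in xrange(n):
        fn()
    return (time.time() - start) / n

fetch_time = avg_seconds(lambda: data.get_next_batch(batch))
train_time = avg_seconds(lambda: sess.run(train_op))  # includes queue wait, if any
print("fetching one batch takes %.1fx one train step" % (fetch_time / train_time))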

However, I thought that the queue would first be filled up to capacity, and that only then would the training begin. Hence, I thought that even if get_next_batch was way slower, the queue should hide this latency, at least at the beginning, since it would hold capacity examples and would need to fetch new data only after draining down to min_after_dequeue, which is lower than capacity; this should have resulted in a somewhat steady GPU utilization.

But actually, training begins as soon as the queue holds min_after_dequeue examples. The queue is dequeued to run train_op as soon as it reaches min_after_dequeue examples, and since feeding the queue is 15x slower than executing train_op, the number of elements in the queue drops below min_after_dequeue right after the first iteration of train_op, and train_op then has to wait for the queue to reach min_after_dequeue examples again.
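
One way to observe this directly is to poll the queue's size during training; this is a small sketch reusing the queue from the question (queue.size() returns an op holding the current number of queued examples):

size_op = queue.size()

for step in xrange(max_iter):
    sess.run(train_op)
    if step % 10 == 0:
        # if feeding is the bottleneck, this hovers around min_after_dequeue
        print("step %d: queue size = %d" % (step, sess.run(size_op)))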

When I force train_op to wait until the queue is filled up to capacity (with capacity = 100*batch) instead of starting automatically when it reaches min_after_dequeue (with min_after_dequeue = 80*batch), the GPU utilization is steady for about 10 seconds before going back to 0%, which is understandable since the queue drains back down to min_after_dequeue examples in less than 10 seconds.
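
A minimal sketch of that forced warm-up, reusing the queue and the background enqueue thread from the question (the one-second polling interval is an arbitrary choice):

import time

# block until the background thread has filled the queue to capacity
while sess.run(queue.size()) < 100 * batch:
    time.sleep(1)

for _ in xrange(max_iter):
    sess.run(train_op)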

