How to fix low volatile GPU-Util with Tensorflow-GPU and Keras?

Problem description

I have a 4 GPU machine on which I run Tensorflow (GPU) with Keras. Some of my classification problems take several hours to complete.

nvidia-smi returns Volatile GPU-Util which never exceeds 25% on any of my 4 GPUs. How can I increase GPU Util% and speed up my training?

Recommended answer

If your GPU utilization is below 80%, this is generally the sign of an input pipeline bottleneck: the GPU sits idle much of the time, waiting for the CPU to prepare the data.
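
A quick way to confirm this is to time batch preparation on its own, with no GPU work at all. This is a minimal sketch; batch_generator is a hypothetical stand-in for however you currently produce batches:

import time

start = time.time()
for _ in range(100):
    x_batch, y_batch = next(batch_generator)  # your current data source
elapsed = time.time() - start
print("sec/batch:", elapsed / 100)
# If preparing a batch takes longer than the GPU needs to train on one,
# the CPU side is the bottleneck.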

What you want is for the CPU to keep preparing batches while the GPU is training, so that the GPU stays fed. This is called prefetching.
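
In tf.data, prefetching is a single call on the dataset (a minimal sketch; the full pipeline appears further below):

# Prepare up to one batch in the background while the GPU
# trains on the current one.
dataset = dataset.prefetch(buffer_size=1)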

Great, but if batch preparation still takes far longer than the model's training step, the GPU will remain idle, waiting for the CPU to finish the next batch. To make batch preparation faster, we can parallelize the different preprocessing operations.
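
With tf.data this amounts to passing num_parallel_calls to map (a sketch; preprocessing is the TF-ops function shown in the full example below, and 8 is an arbitrary core count):

# Apply the preprocessing function to several elements at once,
# typically one call per available CPU core.
dataset = dataset.map(preprocessing, num_parallel_calls=8)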

We can go even further by parallelizing I/O.
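
This matters when samples are read from disk rather than from in-memory arrays. A sketch assuming sharded TFRecord files (the file pattern and parallelism values are placeholders):

# Read several TFRecord shards concurrently instead of sequentially.
# (num_parallel_calls on interleave needs a recent TF 1.x release;
#  older versions offer the same idea as tf.contrib.data.parallel_interleave.)
files = tf.data.Dataset.list_files("/data/train-*.tfrecord")
dataset = files.interleave(
    tf.data.TFRecordDataset,
    cycle_length=4,           # number of files read in parallel
    num_parallel_calls=4)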

Now to implement this in Keras, you need to use the Tensorflow Data API with Tensorflow version >= 1.9.0. Here is an example:

Let's assume, for the sake of this example, that you have two numpy arrays x and y. You can use tf.data with any type of data, but this is simpler to understand.

import tensorflow as tf

def preprocessing(x, y):
    # Can only contain TF operations, since it runs inside the TF graph
    ...
    return x, y

dataset = tf.data.Dataset.from_tensor_slices((x, y))  # creates a dataset object
dataset = dataset.map(preprocessing, num_parallel_calls=64)  # parallel preprocessing
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(1)  # prefetch one batch ahead (newer TF versions can autotune this via tf.data.experimental.AUTOTUNE)

...

model = tf.keras.Model(...)  # build your model as usual
model.fit(x=dataset)  # since tf 1.9.0 you can pass a dataset object (TF 1.x may also require steps_per_epoch)

tf.data is very flexible, but like anything in Tensorflow (except eager execution), it uses a static graph. This can be a pain sometimes, but the speed-up is worth it.
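
For instance, inspecting what the pipeline actually emits requires pulling a batch through a session in TF 1.x graph mode (a sketch; in eager mode you could simply iterate the dataset):

# Pull one preprocessed batch out of the graph to check shapes and values.
iterator = dataset.make_one_shot_iterator()
next_batch = iterator.get_next()
with tf.Session() as sess:
    x_batch, y_batch = sess.run(next_batch)
    print(x_batch.shape, y_batch.shape)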

To go further, you can have a look at the performance guide and the Tensorflow data guide.
