GPU only being used 1-5% Tensorflow-gpu and Keras


Problem description

I just installed tensorflow for GPU and am using Keras for my CNN. During training my GPU is only at about 5% utilization, but 5 of the 6 GB of VRAM is in use. Sometimes it glitches, prints 0.000000e+00 in the console, and the GPU goes to 100%, but after a few seconds the training slows back down to 5%. My GPU is a Zotac GTX 1060 mini and I am using a Ryzen 5 1600x.

Epoch 1/25
 121/3860 [..............................] - ETA: 31:42 - loss: 3.0575 - acc: 0.0877 - val_loss: 0.0000e+00 - val_acc: 0.0000e+00Epoch 2/25
 121/3860 [..............................] - ETA: 29:48 - loss: 3.0005 - acc: 0.0994 - val_loss: 0.0000e+00 - val_acc: 0.0000e+00Epoch 3/25
  36/3860 [..............................] - ETA: 24:47 - loss: 2.9863 - acc: 0.1024

Answer

Usually, we want the bottleneck to be on the GPU (hence 100% utilization). If that's not happening, some other part of your code is taking a long time during each batch. It's hard to say what it is (especially since you didn't add any code), but there are a few things you can try:

1. Input data

Make sure the input data for your network is always available. Reading images from disk takes a long time, so use multiple workers and the multiprocessing interface:

model.fit(..., use_multiprocessing=True, workers=8)
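
For multi-worker input, `fit` expects an indexable batch loader rather than raw arrays. A minimal sketch of that shape (hypothetical class and toy data; in real code the class would subclass `keras.utils.Sequence` so the worker processes can prepare batches in parallel, and `__getitem__` would do the actual image reading):

```python
# Sketch of a batch loader in the shape Keras expects for multi-worker
# input: __len__ gives the number of batches per epoch, __getitem__
# returns one batch. Plain Python here to keep the sketch self-contained;
# in real code, subclass keras.utils.Sequence.
class ImageBatchLoader:
    def __init__(self, samples, labels, batch_size):
        self.samples = samples
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch, counting a final partial batch
        return -(-len(self.samples) // self.batch_size)

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        # in real code: read and decode the image files here, so the
        # expensive disk I/O happens inside the worker processes
        return self.samples[lo:hi], self.labels[lo:hi]

loader = ImageBatchLoader(list(range(10)), list(range(10)), batch_size=4)
print(len(loader))   # 3
print(loader[2])     # ([8, 9], [8, 9])
```

With a loader like this passed as the first argument to `fit`, the `workers=8` processes can each pull batches by index while the GPU trains on the previous one.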

2. Force the model onto the GPU

This is hardly the problem, because /gpu:0 is the default device, but it's worth making sure the model is executed on the intended device:

with tf.device('/gpu:0'):
    x = Input(...)
    y = Conv2D(...)(x)
    model = Model(x, y)
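
To see where ops actually land, placement logging can be turned on in the session config (a TF 1.x configuration fragment, consistent with the `ConfigProto` usage further down; not a tested snippet):

```python
import tensorflow as tf
from keras import backend as K

# Log the device each op is placed on; TensorFlow prints lines such as
# "conv2d/Conv2D: (Conv2D): /device:GPU:0" when the session runs.
config = tf.ConfigProto(log_device_placement=True)
K.set_session(tf.Session(config=config))
```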

3. Check the size of your model

If your batch size is large and soft placement is allowed, the parts of your network that don't fit in the GPU's memory may be placed on the CPU. This slows the process down considerably.

If soft placement is on, try disabling it and check whether a memory error is thrown:

import tensorflow as tf
from keras import backend as K

# make sure soft-placement is off
tf_config = tf.ConfigProto(allow_soft_placement=False)
tf_config.gpu_options.allow_growth = True
s = tf.Session(config=tf_config)
K.set_session(s)

with tf.device(...):
    ...

model.fit(...)

If that's the case, try reducing the batch size until the model fits and gives you good GPU usage, then turn soft placement back on.
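
That trial-and-error search can be sketched as a simple halving loop (everything here is a hypothetical stand-in: `try_batch` represents a single training step that raises an out-of-memory error when the batch does not fit):

```python
def largest_fitting_batch(try_batch, start=256):
    # Halve the batch size until one training step succeeds without
    # exhausting memory; return 0 if even a single sample fails.
    bs = start
    while bs >= 1:
        try:
            try_batch(bs)  # e.g. one model.fit step with batch_size=bs
            return bs
        except MemoryError:
            bs //= 2
    return 0

# toy stand-in: pretend any batch above 64 samples exhausts GPU memory
def fake_step(bs):
    if bs > 64:
        raise MemoryError

print(largest_fitting_batch(fake_step))  # 64
```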
