Keras (Tensorflow backend) slower on GPU than on CPU when training certain networks


Problem description

I am having some difficulty understanding exactly why GPU and CPU speeds are similar with small networks (the CPU is sometimes faster), while the GPU is faster with larger networks. The code at the bottom of the question runs in 103.7s on an i7-6700k, but when using tensorflow-gpu, the code runs in 29.5 seconds.

However, when I train a network that has 100 hidden neurons, instead of 1000 like in the example below, I get ~20 seconds when using the GPU, and ~15 seconds when using the CPU.

I read in another Stack Overflow answer that CPU->GPU transfers take a long time; I'm assuming this refers to loading the data examples onto the GPU.

Can someone explain why this occurs, and possibly suggest some change to the code that I can make to maximize speed?

import numpy as np
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.utils import np_utils
from keras.layers.core import Dense, Activation, Dropout
from sklearn.preprocessing import normalize

## Importing the MNIST dataset using Keras
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# reshape for vector input
N, x, y = X_train.shape
X_train = normalize(np.reshape(X_train, (N, x * y)))

N, x, y = X_test.shape
X_test = normalize(np.reshape(X_test, (N, x * y)))

# one-hot encoding
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

model = Sequential()
model.add(Dense(output_dim=750, input_dim=784))  # output_dim is the Keras 1 name for units
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(150))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(50))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(50))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='Nadam', metrics=['accuracy'])

fit = model.fit(X_train, y_train, batch_size=128, nb_epoch=10, verbose=0)  # nb_epoch is the Keras 1 name for epochs

## Printing the accuracy of our model, according to the loss function specified in model.compile above
score = model.evaluate(X_test, y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Answer

In the case of tiny networks, batch loading may be the culprit here.

Keras loads each minibatch from RAM to the GPU at the start of each iteration, creating a bottleneck for tiny networks (where the forward/backward computation is very quick).
You can try using model.fit_generator instead of plain fit, so that the CPU thread that loads minibatches works in parallel; a sketch follows below.
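
Here is a minimal sketch of that approach (the minibatch_generator helper is hypothetical, and the argument names assume Keras 2's fit_generator):

import numpy as np

def minibatch_generator(X, y, batch_size=128):
    # Keras expects data generators to loop forever; training stops after steps_per_epoch.
    n = X.shape[0]
    while True:
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            yield X[batch], y[batch]

# A background CPU thread (workers=1 by default) prepares the next batch
# from the generator while the GPU trains on the current one.
fit = model.fit_generator(minibatch_generator(X_train, y_train, 128),
                          steps_per_epoch=X_train.shape[0] // 128,
                          epochs=10,
                          verbose=0)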

Unfortunately, there is no way I am aware of to preload the whole dataset onto the GPU for Keras (see my issue)

If you're using the Tensorflow backend, you can use the Google Timeline profiling tool to see what causes the slowdowns. For reference, see this issue; a sketch of capturing a trace follows below.
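
For example, on a TF 1.x backend you can capture a Chrome trace of the training steps and open it at chrome://tracing. This is a minimal sketch, assuming Keras 2's compile() forwards extra keyword arguments to session.run() on the Tensorflow backend, not a definitive recipe:

import tensorflow as tf
from tensorflow.python.client import timeline

# Ask TF to record full timing stats for each session.run call.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

# On the TF backend, Keras passes these kwargs through to session.run().
model.compile(loss='categorical_crossentropy', optimizer='Nadam',
              metrics=['accuracy'],
              options=run_options, run_metadata=run_metadata)
model.fit(X_train, y_train, batch_size=128, epochs=1, verbose=0)

# Dump the last recorded step as a Chrome trace; open it at chrome://tracing.
trace = timeline.Timeline(step_stats=run_metadata.step_stats)
with open('timeline.json', 'w') as f:
    f.write(trace.generate_chrome_trace_format())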
