Training on GPU much slower than on CPU - why and how to speed it up?
Question
I am training a Convolutional Neural Network using Google Colab's CPU and GPU.
This is the architecture of the network:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 62, 126, 32) 896
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 31, 63, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 29, 61, 32) 9248
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 30, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 12, 28, 64) 18496
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 6, 14, 64) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 4, 12, 64) 36928
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 2, 6, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 768) 0
_________________________________________________________________
dropout (Dropout) (None, 768) 0
_________________________________________________________________
lambda (Lambda) (None, 1, 768) 0
_________________________________________________________________
dense (Dense) (None, 1, 256) 196864
_________________________________________________________________
dense_1 (Dense) (None, 1, 8) 2056
_________________________________________________________________
permute (Permute) (None, 8, 1) 0
_________________________________________________________________
dense_2 (Dense) (None, 8, 36) 72
=================================================================
Total params: 264,560
Trainable params: 264,560
Non-trainable params: 0
So this is a very small network, but it has a specific output shape, (8, 36), because I want to recognize the characters on an image of a license plate.
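For reference, the summary above corresponds to a model along these lines. This is a sketch, not the notebook's actual code: the layer sizes and output shapes are taken from the summary, while the activations, the dropout rate, and the body of the Lambda layer are assumptions (the summary shows a Lambda producing (1, 768); a Reshape gives the same shape here since the Lambda's expression is not shown):

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu"),   # (62, 126, 32), 896 params
    layers.MaxPooling2D(),                     # (31, 63, 32)
    layers.Conv2D(32, 3, activation="relu"),   # (29, 61, 32), 9248 params
    layers.MaxPooling2D(),                     # (14, 30, 32)
    layers.Conv2D(64, 3, activation="relu"),   # (12, 28, 64), 18496 params
    layers.MaxPooling2D(),                     # (6, 14, 64)
    layers.Conv2D(64, 3, activation="relu"),   # (4, 12, 64), 36928 params
    layers.MaxPooling2D(),                     # (2, 6, 64)
    layers.Flatten(),                          # (768,)
    layers.Dropout(0.5),                       # rate is a guess
    layers.Reshape((1, 768)),                  # stands in for the Lambda layer
    layers.Dense(256, activation="relu"),      # (1, 256), 196864 params
    layers.Dense(8),                           # (1, 8), 2056 params
    layers.Permute((2, 1)),                    # (8, 1)
    layers.Dense(36, activation="softmax"),    # (8, 36), 72 params
])
model.build(input_shape=(None, 64, 128, 3))    # (64, 128) RGB input, per the generator
print(model.count_params())  # 264560, matching the summary
```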
I used this code to train the network:
model.fit_generator(generator=training_generator,
                    validation_data=validation_generator,
                    steps_per_epoch=num_train_samples // 128,
                    validation_steps=num_val_samples // 128,
                    epochs=10)
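One small inconsistency worth noting: steps_per_epoch floor-divides, while the generator's __len__ below uses math.ceil, so the last partial batch is never consumed during training. With the 105 k training samples mentioned later and a batch size of 128 (assumed from the // 128 above):

```python
import math

num_train_samples = 105_000  # training-set size from the question
batch_size = 128             # assumed from the // 128 in the fit call

steps_per_epoch = num_train_samples // batch_size          # what fit_generator is given
generator_len = math.ceil(num_train_samples / batch_size)  # what __len__ reports

print(steps_per_epoch, generator_len)  # 820 821
```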
The generator resizes the images to (64, 128). This is the code for the generator:
import math

import numpy as np
from skimage.io import imread           # assumed import source for imread
from skimage.transform import resize    # and resize, as used below
from tensorflow.keras.utils import Sequence

class DataGenerator(Sequence):
    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        return np.array([
            resize(imread(file_name), (64, 128))
            for file_name in batch_x]), np.array(batch_y)
On CPU one epoch takes 70-90 minutes. On GPU (149 W) it takes 5 times as long as on CPU.
- Do you know why it takes so long? Is there something wrong with the generator?
- Can I speed this process up somehow?
Edit: This is the link to my notebook: https://colab.research.google.com/drive/1ux9E8DhxPxtgaV60WUiYI2ew2s74Xrwh?usp=sharing
My data is stored in my Google Drive. The training data set contains 105 k images and the validation data set 76 k. All in all, I have 1.8 GB of data.
Should I maybe store the data somewhere else?
Thanks a lot!
I think you did not enable a GPU.
Go to Edit
-> Notebook Settings
and choose GPU
. Then click SAVE
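After switching the runtime type, it is worth confirming that TensorFlow actually sees the device; an empty list here means fit will silently run on the CPU (a quick check, assuming TF 2.x):

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible to TensorFlow:", gpus)  # an empty list means training runs on CPU
```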