Training on GPU much slower than on CPU - why and how to speed it up?

Problem description

I am training a Convolutional Neural Network using Google Colab's CPU and GPU.

This is the architecture of the network:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 62, 126, 32)       896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 31, 63, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 29, 61, 32)        9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 30, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 12, 28, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 6, 14, 64)         0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 4, 12, 64)         36928     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 2, 6, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 768)               0         
_________________________________________________________________
dropout (Dropout)            (None, 768)               0         
_________________________________________________________________
lambda (Lambda)              (None, 1, 768)            0         
_________________________________________________________________
dense (Dense)                (None, 1, 256)            196864    
_________________________________________________________________
dense_1 (Dense)              (None, 1, 8)              2056      
_________________________________________________________________
permute (Permute)            (None, 8, 1)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 8, 36)             72        
=================================================================
Total params: 264,560
Trainable params: 264,560
Non-trainable params: 0

So, this is a very small network, but it has a specific output shape of (8, 36), because I want to recognize the characters on an image of a license plate.
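Concretely, the (8, 36) output can be read as 8 character positions with 36 classes each (presumably the digits 0-9 plus the letters A-Z). A minimal sketch of decoding one such prediction, with an assumed class order that is not taken from the original notebook:

import numpy as np

# Assumed class order -- an illustration, not from the original notebook.
CHARSET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'

pred = np.random.rand(8, 36)  # placeholder for one model prediction
# Pick the most likely class for each of the 8 character positions.
plate = ''.join(CHARSET[i] for i in pred.argmax(axis=-1))
print(plate)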

I used this code to train the network:

model.fit_generator(generator=training_generator,
                    validation_data=validation_generator,
                    steps_per_epoch=num_train_samples // 128,
                    validation_steps=num_val_samples // 128,
                    epochs=10)
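As a side note: in TensorFlow 2.1 and later, fit_generator is deprecated and model.fit accepts Sequence generators directly. A minimal equivalent call, under that assumption:

# Equivalent call in TensorFlow 2.1+, where model.fit accepts
# keras.utils.Sequence generators directly.
model.fit(training_generator,
          validation_data=validation_generator,
          steps_per_epoch=num_train_samples // 128,
          validation_steps=num_val_samples // 128,
          epochs=10)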

The generator resizes the images to (64, 128). This is the code regarding the generator:

import math
import numpy as np
from skimage.io import imread
from skimage.transform import resize
from tensorflow.keras.utils import Sequence


class DataGenerator(Sequence):

    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set  # x_set: image file paths, y_set: labels
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        # Slice out the file paths and labels for this batch
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]

        # Read and resize every image of the batch on the fly
        return np.array([
            resize(imread(file_name), (64, 128))
            for file_name in batch_x]), np.array(batch_y)
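For reference, the generator would be instantiated with a list of image file paths and the matching labels, roughly as below; the paths and label shapes are hypothetical placeholders, not taken from the original notebook:

# Hypothetical example data: one file path and one (8, 36) label per image.
train_files = ['plates/img_0001.png', 'plates/img_0002.png']  # placeholder paths
train_labels = np.zeros((2, 8, 36))                           # placeholder labels

training_generator = DataGenerator(train_files, train_labels, batch_size=128)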

On the CPU one epoch takes 70-90 minutes. On the GPU (149 W) it takes five times as long as on the CPU.

  1. Do you know why it takes so long? Is there something wrong with the generator?
  2. Can I speed this process up somehow?

Edit: This is the link to my notebook: https://colab.research.google.com/drive/1ux9E8DhxPxtgaV60WUiYI2ew2s74Xrwh?usp=sharing

My data is stored in my Google Drive. The training data set contains 105k images and the validation data set 76k. All in all, I have 1.8 GB of data.

Should I maybe store the data at another place?
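(For reference, a common pattern when per-file reads from the Drive mount are slow is to copy the data set once to the Colab VM's local disk and read it from there. A minimal sketch; the paths below are placeholders, not from the original notebook:)

import shutil

# Hypothetical paths: copy the data set once from the mounted Drive folder
# to the VM's local disk, where per-file reads are much faster than going
# through the Drive FUSE mount.
shutil.copytree('/content/drive/MyDrive/dataset', '/content/dataset')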

Thanks a lot!

Solution

I think you did not enable a GPU.

Go to Edit -> Notebook Settings and choose GPU. Then click SAVE.
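One quick way to verify that the runtime actually sees a GPU, using TensorFlow's standard check:

import tensorflow as tf

# Prints something like '/device:GPU:0' when a GPU is attached,
# or an empty string when the runtime is CPU-only.
print(tf.test.gpu_device_name())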
