Training a fully convolutional neural network with inputs of variable size takes unreasonably long time in Keras/TensorFlow


Question

I am trying to implement a FCNN for image classification that can accept inputs of variable size. The model is built in Keras with TensorFlow backend.

考虑以下玩具示例:

# Keras 1.x API, matching the summary output below
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Activation, GlobalAveragePooling2D

nb_channels = 1  # 1 (grayscale) or 3 (rgb)
nb_classes = 3

model = Sequential()

# width and height are None because we want to process images of variable size 
# nb_channels is either 1 (grayscale) or 3 (rgb)
model.add(Convolution2D(32, 3, 3, input_shape=(nb_channels, None, None), border_mode='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(32, 3, 3, border_mode='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(16, 1, 1))
model.add(Activation('relu'))

model.add(Convolution2D(8, 1, 1))
model.add(Activation('relu'))

# reduce the number of dimensions to the number of classes
model.add(Convolution2D(nb_classes, 1, 1))
model.add(Activation('relu'))

# do global pooling to yield one value per class
model.add(GlobalAveragePooling2D())

model.add(Activation('softmax'))

This model runs fine, but I am running into a performance issue. Training on images of variable size takes an unreasonably long time compared to training on inputs of fixed size. Even if I resize all images to the maximum size in the data set, training still takes far less time than training on variable-size inputs. So is input_shape=(nb_channels, None, None) the right way to specify variable-size input? And is there any way to mitigate this performance problem?

Update

model.summary() for a model with 3 classes and grayscale images:

Layer (type)                                       Output Shape            Param #     Connected to
====================================================================================================
convolution2d_1 (Convolution2D)                    (None, 32, None, None)  320         convolution2d_input_1[0][0]
____________________________________________________________________________________________________
activation_1 (Activation)                          (None, 32, None, None)  0           convolution2d_1[0][0]
____________________________________________________________________________________________________
maxpooling2d_1 (MaxPooling2D)                      (None, 32, None, None)  0           activation_1[0][0]
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D)                    (None, 32, None, None)  9248        maxpooling2d_1[0][0]
____________________________________________________________________________________________________
maxpooling2d_2 (MaxPooling2D)                      (None, 32, None, None)  0           convolution2d_2[0][0]
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D)                    (None, 16, None, None)  528         maxpooling2d_2[0][0]
____________________________________________________________________________________________________
activation_2 (Activation)                          (None, 16, None, None)  0           convolution2d_3[0][0]
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D)                    (None, 8, None, None)   136         activation_2[0][0]
____________________________________________________________________________________________________
activation_3 (Activation)                          (None, 8, None, None)   0           convolution2d_4[0][0]
____________________________________________________________________________________________________
convolution2d_5 (Convolution2D)                    (None, 3, None, None)   27          activation_3[0][0]
____________________________________________________________________________________________________
activation_4 (Activation)                          (None, 3, None, None)   0           convolution2d_5[0][0]
____________________________________________________________________________________________________
globalaveragepooling2d_1 (GlobalAveragePooling2D)  (None, 3)               0           activation_4[0][0]
____________________________________________________________________________________________________
activation_5 (Activation)                          (None, 3)               0           globalaveragepooling2d_1[0][0]
====================================================================================================
Total params: 10,259
Trainable params: 10,259
Non-trainable params: 0

Answer

I think @marcin-możejko may have the right answer in his comment. It may be related to this bug, which was just fixed. And this patch may warn you if things are being compiled too often.

So upgrading to a tf-nightly-gpu-2.0-preview package may fix this. Also, do you get this problem with tf.keras?

If I resize all images to the maximum size in the data set it still takes far less time to train the model than training on the variable size input

Note that for basic convolutions with "same" padding, zero padding should have "no" effect on the output, aside from pixel alignment.
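This is easy to check numerically: embed an image in a larger zero canvas, run a naive 3x3 "same" convolution over both, and the values over the original region match the direct result exactly, because "same" padding is itself zero padding. A minimal numpy sketch (the `conv2d_same` helper is illustrative, not a Keras function):

```python
import numpy as np

def conv2d_same(img, k):
    """Naive 3x3 'same' cross-correlation with zero padding."""
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * k)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((5, 7))
k = rng.standard_normal((3, 3))

# embed the image in a larger zero canvas, as the pad-to-bucket scheme would
canvas = np.zeros((10, 12))
canvas[:5, :7] = img

full = conv2d_same(canvas, k)
direct = conv2d_same(img, k)
assert np.allclose(full[:5, :7], direct)  # identical over the original region
```

Beyond a single convolution, relu(0) = 0 keeps the padded region zero through the stack; only the global-average denominator changes.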

So one approach would be to train on a fixed list of sizes and zero-pad images to those sizes: for example, train on batches of 128x128, 256x256, and 512x512. If you can't fix the dynamic-compilation issue, this at least limits compilation to 3 graphs. It is a bit like a 2d version of the "bucket-by-sequence-length" approach sometimes seen with sequence models.
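The padding step of that scheme might look like the following numpy sketch (the bucket sizes and the `pad_to_bucket` name are illustrative; batches would then be grouped so each contains only one bucket size):

```python
import numpy as np

BUCKETS = [128, 256, 512]  # each bucket size triggers at most one graph compilation

def pad_to_bucket(img):
    """Zero-pad an (H, W, C) image to the smallest square bucket that fits it."""
    h, w = img.shape[:2]
    # raises StopIteration if the image exceeds the largest bucket
    side = next(s for s in BUCKETS if s >= max(h, w))
    out = np.zeros((side, side) + img.shape[2:], dtype=img.dtype)
    out[:h, :w] = img  # original pixels in the top-left corner, zeros elsewhere
    return out

img = np.ones((100, 180, 1), dtype=np.float32)
padded = pad_to_bucket(img)
print(padded.shape)  # (256, 256, 1)
```

Grouping the training set by resulting bucket keeps every batch tensor rectangular while bounding the number of distinct input shapes the backend ever sees.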

