Using Conv2DTranspose to output double its input shape


Question

I'm new to Python 3.7.7 and TensorFlow 2.1.0, and I'm trying to understand Conv2DTranspose. I have tried this code:

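from tensorflow import keras
from tensorflow.keras.layers import Input, Conv2DTranspose
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def vgg16_decoder(input_size = (7, 7, 512)):
    inputs = Input(input_size, name = 'input')

    # kernel 2x2 with dilation 2, default stride 1 and default padding
    conv1 = Conv2DTranspose(512, (2, 2), dilation_rate = 2, name = 'conv1')(inputs)

    model = Model(inputs = inputs, outputs = conv1, name = 'vgg-16_decoder')

    opt = Adam(learning_rate=0.001)  # `lr` is the older, deprecated spelling
    model.compile(optimizer=opt, loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])

    return model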

This is its summary:


Model: "vgg-16_decoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input (InputLayer)           (None, 7, 7, 512)         0
_________________________________________________________________
conv1 (Conv2DTranspose)      (None, 9, 9, 512)         1049088
=================================================================
Total params: 1,049,088
Trainable params: 1,049,088
Non-trainable params: 0
_________________________________________________________________


But I want the output of conv1 to be (None, 14, 14, 512).

I changed the filter size to (3, 3) and got this summary:


Model: "vgg-16_decoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input (InputLayer)           (None, 7, 7, 512)         0
_________________________________________________________________
conv1 (Conv2DTranspose)      (None, 11, 11, 512)       2359808
=================================================================
Total params: 2,359,808
Trainable params: 2,359,808
Non-trainable params: 0
_________________________________________________________________


I'm trying to achieve this using Conv2DTranspose:

# A piece of code from U-NET implementation

up6 = Conv2D(512, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal', name = 'up6')(UpSampling2D(size = (2,2), name = 'upsp1')(drop5))

And its summary:


drop5 (Dropout)                 (None, 16, 16, 1024) 0           conv5_2[0][0]
__________________________________________________________________________________________________
upsp1 (UpSampling2D)            (None, 32, 32, 1024) 0           drop5[0][0]
__________________________________________________________________________________________________
up6 (Conv2D)                    (None, 32, 32, 512)  2097664     upsp1[0][0]
__________________________________________________________________________________________________


It upsamples its input by 2 and changes the number of filters.

How can I do that with Conv2DTranspose?

UPDATE

I think I did it, but I don't understand what I did:

conv1 = Conv2DTranspose(512, (2, 2), strides = 2, name = 'conv1')(inputs)

With this statement, I get this summary:


Model: "vgg-16_decoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input (InputLayer)           (None, 7, 7, 512)         0
_________________________________________________________________
conv1 (Conv2DTranspose)      (None, 14, 14, 512)       1049088
=================================================================
Total params: 1,049,088
Trainable params: 1,049,088
Non-trainable params: 0
_________________________________________________________________


If you want to correct me or explain what I did here, you are welcome to.

UPDATE 2

By the way, I'm trying to create a VGG-16 decoder. This is the code for my VGG-16 encoder:

def vgg16_encoder(input_size = (224,224,3)):
    inputs = Input(input_size, name = 'input')

    conv1 = Conv2D(64, (3, 3), activation = 'relu', padding = 'same', name ='conv1_1')(inputs)
    conv1 = Conv2D(64, (3, 3), activation = 'relu', padding = 'same', name ='conv1_2')(conv1)
    pool1 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_1')(conv1)

    conv2 = Conv2D(128, (3, 3), activation = 'relu', padding = 'same', name ='conv2_1')(pool1)
    conv2 = Conv2D(128, (3, 3), activation = 'relu', padding = 'same', name ='conv2_2')(conv2)
    pool2 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_2')(conv2)

    conv3 = Conv2D(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_1')(pool2)
    conv3 = Conv2D(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_2')(conv3)
    conv3 = Conv2D(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_3')(conv3)
    pool3 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_3')(conv3)

    conv4 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_1')(pool3)
    conv4 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_2')(conv4)
    conv4 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_3')(conv4)
    pool4 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_4')(conv4)

    conv5 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_1')(pool4)
    conv5 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_2')(conv5)
    conv5 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_3')(conv5)
    pool5 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_5')(conv5)

    opt = Adam(learning_rate=0.001)  # `lr` is the older, deprecated spelling

    model = Model(inputs = inputs, outputs = pool5, name = 'vgg-16_encoder')

    model.compile(optimizer=opt, loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])

    return model


Recommended answer

When we design an encoder-decoder architecture, we need operations that reverse the ones already performed in the encoder. Say the encoder uses Conv2D and pooling (common in architectures like VGG). In the decoder we use Conv2DTranspose (which can be thought of as the reverse operation of Conv2D) and UpSampling2D (the reverse of pooling, though not rigorously: pooling is irreversible because information is lost).

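Note: You don't want to use Conv2DTranspose to upsample the feature maps. You can, but for VGG I don't think Conv2DTranspose will give you upsampled feature maps in the decoder the way you want. It isn't designed for that: it does upsample, but it learns its upsampling parameters, which is slightly different. You would end up with very large kernels, and that leads to a network very different from the VGG encoder you're talking about.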
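To make the "it learns its upsampling" remark more concrete, here is a minimal pure-Python sketch of a 1-D transposed convolution. It is not Keras code: `conv_transpose_1d` and `upsample_nearest_1d` are hypothetical helpers introduced only for illustration, and the sketch assumes default ('valid') padding and no bias. It suggests that with kernel size equal to stride, as in `Conv2DTranspose(512, (2, 2), strides = 2)` from the question, each input value is written into its own non-overlapping block of the output, so a kernel of ones behaves like plain nearest-neighbour upsampling and any other kernel is a trainable variant of it.

```python
def conv_transpose_1d(x, kernel, stride=1):
    """Naive 1-D transposed convolution, 'valid' padding, no bias.

    Each input value is multiplied by the whole kernel and the result
    is added into the output starting at position i * stride.
    """
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        for k, weight in enumerate(kernel):
            out[i * stride + k] += value * weight
    return out


def upsample_nearest_1d(x, factor=2):
    """Nearest-neighbour upsampling, the 1-D analogue of UpSampling2D."""
    return [value for value in x for _ in range(factor)]


x = [1.0, 2.0, 3.0]

# Kernel size == stride: the scaled kernel copies never overlap.
# A kernel of ones therefore reproduces nearest-neighbour upsampling.
same_as_upsampling = conv_transpose_1d(x, [1.0, 1.0], stride=2)

# Any other kernel gives a different upsampling that training could adjust.
learned_like = conv_transpose_1d(x, [0.5, 2.0], stride=2)

print(same_as_upsampling)
print(learned_like)
```

In this picture, UpSampling2D followed by a convolution separates the fixed resizing step from the learned filtering step, while a strided transposed convolution folds both into one learned layer.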

from tensorflow.keras.layers import Input, Conv2D, Conv2DTranspose, MaxPool2D, UpSampling2D
from tensorflow.keras.models import Model

def encoder_decoder_conv(input_size = (224,224,3)):
    ip = Input(input_size)
    # encoder
    conv = Conv2D(512, (3,3))(ip) # look here, the default padding ('valid') is used: 224 -> 222
    # decoder
    inv_conv = Conv2DTranspose(3, (3,3))(conv) # default padding again: 222 -> 224
    # simple model
    model = Model(ip, inv_conv)
    return model

model1 = encoder_decoder_conv()
model1.summary()

def encoder_decoder_pooling(input_size = (224,224,3)):
    ip = Input(input_size)
    # encoder
    pool = MaxPool2D((2,2))(ip) # halves height and width: 224 -> 112
    # decoder
    inv_pool = UpSampling2D((2,2))(pool) # doubles height and width: 112 -> 224
    # simple model
    model = Model(ip, inv_pool)
    return model

model2 = encoder_decoder_pooling()
model2.summary()



Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 222, 222, 512)     14336     
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 224, 224, 3)       13827     
=================================================================
Total params: 28,163
Trainable params: 28,163
Non-trainable params: 0
_________________________________________________________________
Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_3 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 112, 112, 3)       0         
_________________________________________________________________
up_sampling2d (UpSampling2D) (None, 224, 224, 3)       0         
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0

As you can see, in the first model Conv2DTranspose reverses the Conv2D operation and restores exactly the input shape (224, 224, 3).
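The shapes in this thread can be checked by hand. The sketch below is a plain-Python helper (the name `conv_transpose_output_length` is hypothetical, not a Keras API) that follows the output-length rule as I understand Keras to apply it when no explicit output padding is given; treat it as an assumption to verify against your installed version rather than as the library's definition.

```python
def conv_transpose_output_length(input_length, kernel_size, stride=1,
                                 dilation=1, padding="valid"):
    """Length of one spatial axis after a transposed convolution."""
    # Dilation spreads the kernel taps apart and enlarges its footprint.
    effective_kernel = kernel_size + (kernel_size - 1) * (dilation - 1)
    if padding == "same":
        return input_length * stride
    if padding == "valid":
        return input_length * stride + max(effective_kernel - stride, 0)
    raise ValueError("padding must be 'valid' or 'same'")


# Question: kernel 2, dilation 2, stride 1 on a 7x7 input.
print(conv_transpose_output_length(7, 2, dilation=2))   # 9
# Question: kernel 3, dilation 2, stride 1.
print(conv_transpose_output_length(7, 3, dilation=2))   # 11
# Question's update: kernel 2, stride 2.
print(conv_transpose_output_length(7, 2, stride=2))     # 14
# First model above: kernel 3 on the 222x222 feature map.
print(conv_transpose_output_length(222, 3))             # 224
```

Read this way, it is the stride, not the dilation rate, that multiplies the spatial size, which would explain why only the `strides = 2` version in the question reaches (None, 14, 14, 512).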

For model2, UpSampling2D reverses the pooling operation (in terms of feature map shape).
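The qualifier "in terms of feature map shape" matters. As a small illustration, assuming 2x2 max pooling with stride 2 and nearest-neighbour upsampling, the pure-Python sketch below (with hypothetical helpers `max_pool_2x2` and `upsample_2x2`, operating on lists of lists rather than tensors) restores the 4x4 shape but not the original values.

```python
def max_pool_2x2(grid):
    """2x2 max pooling with stride 2 on a list-of-lists grid (even sizes)."""
    return [
        [max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
         for c in range(0, len(grid[0]), 2)]
        for r in range(0, len(grid), 2)
    ]


def upsample_2x2(grid):
    """Nearest-neighbour upsampling: repeat every row and column twice."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

pooled = max_pool_2x2(image)     # [[6, 8], [14, 16]]
restored = upsample_2x2(pooled)  # 4x4 again, but built only from the maxima

print(pooled)
print(restored)
```

Only the maximum of each 2x2 window survives pooling, so the upsampled grid repeats those maxima. Recovering finer detail is left to the learned layers that follow in the decoder.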

So, since you're trying to build a VGG decoder, and VGG consists mostly of Conv2D and MaxPooling2D, all you have to do is reverse those operations using Conv2DTranspose and UpSampling2D, so that you recover the exact input shape (224, 224, 3) from the feature map shape (7, 7, 512).

Finally, there are some variations of the decoder part, but I think this is the VGG-16 decoder you're looking for.

def vgg16_decoder(input_size = (7,7,512)):
    inputs = Input(input_size, name = 'input')

    pool5 = UpSampling2D((2,2), name = 'pool_5')(inputs)
    conv5 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_3')(pool5)
    conv5 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_2')(conv5)
    conv5 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_1')(conv5)

    pool4 = UpSampling2D((2,2), name = 'pool_4')(conv5)
    conv4 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_3')(pool4)
    conv4 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_2')(conv4)
    conv4 = Conv2DTranspose(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_1')(conv4)

    pool3 = UpSampling2D((2,2), name = 'pool_3')(conv4)
    conv3 = Conv2DTranspose(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_3')(pool3)
    conv3 = Conv2DTranspose(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_2')(conv3)
    conv3 = Conv2DTranspose(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_1')(conv3)

    pool2 = UpSampling2D((2,2), name = 'pool_2')(conv3)
    conv2 = Conv2DTranspose(128, (3, 3), activation = 'relu', padding = 'same', name ='conv2_2')(pool2)
    conv2 = Conv2DTranspose(128, (3, 3), activation = 'relu', padding = 'same', name ='conv2_1')(conv2)

    pool1 = UpSampling2D((2,2), name = 'pool_1')(conv2)
    conv1 = Conv2DTranspose(64, (3, 3), activation = 'relu', padding = 'same', name ='conv1_2')(pool1)
    conv1 = Conv2DTranspose(3, (3, 3), activation = 'relu', padding = 'same', name ='conv1_1')(conv1) # to get 3 channels

    model = Model(inputs = inputs, outputs = conv1, name = 'vgg-16_decoder')

    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

    return model

model = vgg16_decoder()
model.summary()


Model: "vgg-16_decoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           [(None, 7, 7, 512)]       0         
_________________________________________________________________
pool_5 (UpSampling2D)        (None, 14, 14, 512)       0         
_________________________________________________________________
conv5_3 (Conv2DTranspose)    (None, 14, 14, 512)       2359808   
_________________________________________________________________
conv5_2 (Conv2DTranspose)    (None, 14, 14, 512)       2359808   
_________________________________________________________________
conv5_1 (Conv2DTranspose)    (None, 14, 14, 512)       2359808   
_________________________________________________________________
pool_4 (UpSampling2D)        (None, 28, 28, 512)       0         
_________________________________________________________________
conv4_3 (Conv2DTranspose)    (None, 28, 28, 512)       2359808   
_________________________________________________________________
conv4_2 (Conv2DTranspose)    (None, 28, 28, 512)       2359808   
_________________________________________________________________
conv4_1 (Conv2DTranspose)    (None, 28, 28, 512)       2359808   
_________________________________________________________________
pool_3 (UpSampling2D)        (None, 56, 56, 512)       0         
_________________________________________________________________
conv3_3 (Conv2DTranspose)    (None, 56, 56, 256)       1179904   
_________________________________________________________________
conv3_2 (Conv2DTranspose)    (None, 56, 56, 256)       590080    
_________________________________________________________________
conv3_1 (Conv2DTranspose)    (None, 56, 56, 256)       590080    
_________________________________________________________________
pool_2 (UpSampling2D)        (None, 112, 112, 256)     0         
_________________________________________________________________
conv2_2 (Conv2DTranspose)    (None, 112, 112, 128)     295040    
_________________________________________________________________
conv2_1 (Conv2DTranspose)    (None, 112, 112, 128)     147584    
_________________________________________________________________
pool_1 (UpSampling2D)        (None, 224, 224, 128)     0         
_________________________________________________________________
conv1_2 (Conv2DTranspose)    (None, 224, 224, 64)      73792     
_________________________________________________________________
conv1_1 (Conv2DTranspose)    (None, 224, 224, 3)       1731      
=================================================================
Total params: 17,037,059
Trainable params: 17,037,059
Non-trainable params: 0

It takes the (7, 7, 512) feature map and reconstructs the original image dimensions (224, 224, 3).
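As a sanity check on that summary, the spatial sizes and parameter counts can be reproduced with a few lines of plain Python. This is only a bookkeeping sketch, not part of the model: `conv_params` and the `blocks` list are hypothetical names introduced here, and it assumes every convolution is 3x3 with a bias and padding = 'same', as in the decoder above.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus one bias per output channel."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


# Output channels of each 3x3 Conv2DTranspose, grouped by decoder block.
# An UpSampling2D (no parameters, doubles height and width) precedes each block.
blocks = [
    [512, 512, 512],  # conv5_3, conv5_2, conv5_1
    [512, 512, 512],  # conv4_3, conv4_2, conv4_1
    [256, 256, 256],  # conv3_3, conv3_2, conv3_1
    [128, 128],       # conv2_2, conv2_1
    [64, 3],          # conv1_2, conv1_1
]

size, channels, total = 7, 512, 0
for block in blocks:
    size *= 2  # UpSampling2D((2, 2))
    for out_channels in block:
        # padding='same' with stride 1 keeps the spatial size unchanged
        total += conv_params(3, 3, channels, out_channels)
        channels = out_channels

final_shape = (size, size, channels)
print(final_shape, total)
```

The five upsampling steps take 7 to 14, 28, 56, 112 and 224, and the per-layer counts add up to the 17,037,059 parameters reported in the summary.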

In summary, the mechanical way to design a decoder is to walk through the encoder in the opposite direction, applying the reverse operation at each step. For details on Conv2DTranspose and UpSampling2D, if you want to understand these concepts in more depth:

https://cs231n.github.io/convolutional-networks/

https://datascience.stackexchange.com/questions/6107/what-are-deconvolutional-layers

https://www.matthewzeiler.com/mattzeiler/deconvolutionalnetworks.pdf
