Understanding output shape of keras Conv2DTranspose


Problem description


I am having a hard time understanding the output shape of keras.layers.Conv2DTranspose

Here is the prototype:

keras.layers.Conv2DTranspose(
    filters,
    kernel_size,
    strides=(1, 1),
    padding='valid',
    output_padding=None,
    data_format=None,
    dilation_rate=(1, 1),
    activation=None,
    use_bias=True,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None
)


In the documentation (https://keras.io/layers/convolutional/), I read:

If output_padding is set to None (default), the output shape is inferred.


In the code (https://github.com/keras-team/keras/blob/master/keras/layers/convolutional.py), I read:

out_height = conv_utils.deconv_length(height,
                                      stride_h, kernel_h,
                                      self.padding,
                                      out_pad_h,
                                      self.dilation_rate[0])
out_width = conv_utils.deconv_length(width,
                                     stride_w, kernel_w,
                                     self.padding,
                                     out_pad_w,
                                     self.dilation_rate[1])
if self.data_format == 'channels_first':
    output_shape = (batch_size, self.filters, out_height, out_width)
else:
    output_shape = (batch_size, out_height, out_width, self.filters)

And in the code (https://github.com/keras-team/keras/blob/master/keras/utils/conv_utils.py):

def deconv_length(dim_size, stride_size, kernel_size, padding, output_padding, dilation=1):
    """Determines output length of a transposed convolution given input length.
    # Arguments
        dim_size: Integer, the input length.
        stride_size: Integer, the stride along the dimension of `dim_size`.
        kernel_size: Integer, the kernel size along the dimension of `dim_size`.
        padding: One of `"same"`, `"valid"`, `"full"`.
        output_padding: Integer, amount of padding along the output dimension, can be set to `None` in which case the output length is inferred.
        dilation: dilation rate, integer.
    # Returns
        The output length (integer).
    """

    assert padding in {'same', 'valid', 'full'}
    if dim_size is None:
        return None

    # Get the dilated kernel size
    kernel_size = kernel_size + (kernel_size - 1) * (dilation - 1)

    # Infer length if output padding is None, else compute the exact length
    if output_padding is None:
        if padding == 'valid':
            dim_size = dim_size * stride_size + max(kernel_size - stride_size, 0)
        elif padding == 'full':
            dim_size = dim_size * stride_size - (stride_size + kernel_size - 2)
        elif padding == 'same':
            dim_size = dim_size * stride_size
    else:
        if padding == 'same':
            pad = kernel_size // 2
        elif padding == 'valid':
            pad = 0
        elif padding == 'full':
            pad = kernel_size - 1

        dim_size = ((dim_size - 1) * stride_size + kernel_size - 2 * pad + output_padding)

    return dim_size
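Since the deconv_length logic above is plain arithmetic, it can be exercised without installing Keras. The following is a pure-Python copy of the function (same branches and formulas as the source above, condensed; not the Keras API itself), applied to a stride-10 example:

```python
# Pure-Python copy of Keras' deconv_length (same branches as the source
# above), so the output-shape arithmetic can be checked without Keras.
def deconv_length(dim_size, stride_size, kernel_size, padding,
                  output_padding, dilation=1):
    assert padding in {'same', 'valid', 'full'}
    if dim_size is None:
        return None
    # Dilated kernel size
    kernel_size = kernel_size + (kernel_size - 1) * (dilation - 1)
    if output_padding is None:
        # Inferred output length
        if padding == 'valid':
            return dim_size * stride_size + max(kernel_size - stride_size, 0)
        if padding == 'full':
            return dim_size * stride_size - (stride_size + kernel_size - 2)
        return dim_size * stride_size  # 'same'
    # Exact output length, chosen via output_padding
    pad = {'same': kernel_size // 2, 'valid': 0, 'full': kernel_size - 1}[padding]
    return (dim_size - 1) * stride_size + kernel_size - 2 * pad + output_padding

print(deconv_length(20, 10, 3, 'same', None))  # inferred: 20 * 10 = 200
print(deconv_length(20, 10, 3, 'same', 4))     # explicit: (20-1)*10 + 3 - 2 + 4 = 195
```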


I understand that Conv2DTranspose is kind of a Conv2D, but reversed.


Since applying a Conv2D with kernel_size = (3, 3), strides = (10, 10) and padding = "same" to a 200x200 image will output a 20x20 image, I assume that applying a Conv2DTranspose with kernel_size = (3, 3), strides = (10, 10) and padding = "same" to a 20x20 image will output a 200x200 image.
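This shape arithmetic can be checked directly. A minimal pure-Python sketch (the helper names are mine, not Keras API): with padding="same", a Conv2D output size is ceil(input / stride), and the inferred Conv2DTranspose output size is input * stride.

```python
import math

def conv2d_same_out(dim, stride):
    # Conv2D with padding='same': output = ceil(input / stride)
    return math.ceil(dim / stride)

def conv2dtranspose_same_out(dim, stride):
    # Conv2DTranspose with padding='same' and output_padding=None:
    # the inferred output is simply input * stride
    return dim * stride

print(conv2d_same_out(200, 10))          # 20
print(conv2d_same_out(195, 10))          # also 20
print(conv2dtranspose_same_out(20, 10))  # 200
```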


Also, applying a Conv2D with kernel_size = (3, 3), strides = (10, 10) and padding = "same" to a 195x195 image will also output a 20x20 image.


So, I understand that there is kind of an ambiguity on the output shape when applying a Conv2DTranspose with kernel_size = (3, 3), strides = (10, 10) and padding = "same" (user might want output to be 195x195, or 200x200, or many other compatible shapes).


I assume that "the output shape is inferred." means that a default output shape is computed according to the parameters of the layer, and I assume that there is a mechanism to specify an output shape different from the default one, if necessary.

That said, I do not quite understand:

  • the meaning of the "output_padding" parameter

  • the interactions between the parameters "padding" and "output_padding"

  • the various formulas in the function keras.conv_utils.deconv_length

Could someone explain?

Many thanks,

Julien

Accepted answer


I may have found a (partial) answer.


I found it in the Pytorch documentation, which appears to be much clearer than the Keras documentation on this topic.

When applying a Conv2D with a stride greater than 1 to images whose dimensions are close, we get output images with the same dimensions.

For instance, when applying a Conv2D with a kernel size of 3x3, a stride of 7x7 and padding "same", the following image dimensions


22x22, 23x23, ..., 28x28, 22x28, 28x22, 27x24, etc. (7x7 = 49 combinations)


will ALL yield an output dimension of 4x4.


That is because output_dimension = ceiling(input_dimension / stride).
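A quick enumeration confirms the 49-way ambiguity. This is a pure-Python check of the ceiling rule, not Keras code:

```python
import math

stride = 7
# All 'same'-padding input sizes whose Conv2D output is 4 along one axis
sizes = [d for d in range(1, 100) if math.ceil(d / stride) == 4]

print(sizes)       # [22, 23, 24, 25, 26, 27, 28]
print(len(sizes))  # 7 per axis, hence 7 * 7 = 49 image shapes in 2D
```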


As a consequence, when applying a Conv2DTranspose with kernel size of 3x3, stride of 7x7 and padding "same", there is an ambiguity about the output dimension.


Any of the 49 possible output dimensions would be correct.

The parameter output_padding is a way to resolve the ambiguity by explicitly choosing the output dimension.


In my example, the minimum output size is 22x22, and output_padding provides a number of rows (between 0 and 6) to add at the bottom of the output image and a number of columns (between 0 and 6) to add at the right of the output image.


So I can get output_dimensions = 24x25 if I use output_padding = (2, 3).
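This can be checked against the explicit ('same'-padding) branch of deconv_length. A small pure-Python sketch (the helper name is mine):

```python
def transpose_same_out(dim, stride, kernel, output_padding):
    # Explicit branch of Keras' deconv_length for padding='same':
    # out = (dim - 1) * stride + kernel - 2 * (kernel // 2) + output_padding
    pad = kernel // 2
    return (dim - 1) * stride + kernel - 2 * pad + output_padding

# 4x4 input, kernel 3x3, stride 7x7, padding 'same'
print(transpose_same_out(4, 7, 3, 0))  # 22 (minimum output size)
print(transpose_same_out(4, 7, 3, 2))  # 24
print(transpose_same_out(4, 7, 3, 3))  # 25
```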


What I still do not understand, however, is the logic that Keras uses to choose a certain output image dimension when output_padding is not specified (when it "infers" the output shape).

Some pointers:

https://pytorch.org/docs/stable/nn.html#torch.nn.ConvTranspose2d
https://discuss.pytorch.org/t/the-output-size-of-convtranspose2d-differs-from-the-expected-output-size/1876/5
https://discuss.pytorch.org/t/question-about-the-output-padding-in-nn-convtrasnpose2d/19740
https://discuss.pytorch.org/t/what-does-output-padding-exactly-do-in-convtranspose2d/2688


So to answer my own questions:

  • the meaning of the "output_padding" parameter: see above
  • the interactions between the parameters "padding" and "output_padding": these parameters are independent
  • the various formulas in the function keras.conv_utils.deconv_length:
    • For now, I do not understand the part where output_padding is None;
    • I ignore the case where padding == 'full' (not supported by Conv2DTranspose);
    • The formula for padding == 'valid' seems correct (it can be computed by reversing the formula of Conv2D);
    • The formula for padding == 'same' seems incorrect to me when kernel_size is even. (As a matter of fact, Keras crashes when trying to build a Conv2DTranspose layer with input_dimension = 5x5, kernel_size = 2x2, stride = 7x7 and padding = 'same'. It appears to me that there is a bug in Keras; I will start another thread on this topic...)
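The 'valid' claim above can be sanity-checked by round-tripping: applying the transposed-convolution length formula and then the ordinary Conv2D 'valid' formula should recover the original length. A pure-Python sketch (helper names are mine):

```python
def conv2d_valid_out(dim, stride, kernel):
    # Conv2D, padding='valid': floor((dim - kernel) / stride) + 1
    return (dim - kernel) // stride + 1

def transpose_valid_out(dim, stride, kernel):
    # Inferred Conv2DTranspose output, padding='valid' (from deconv_length):
    # dim * stride + max(kernel - stride, 0)
    return dim * stride + max(kernel - stride, 0)

# Round trip: upsampling then downsampling recovers the original length
for dim in range(1, 20):
    up = transpose_valid_out(dim, 7, 3)
    assert conv2d_valid_out(up, 7, 3) == dim
print("round trip OK for dims 1..19")
```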

