What is the effect of tf.nn.conv2d() on an input tensor shape?


Question

I am studying TensorBoard code from Dandelion Mane, specifically: https://github.com/dandelionmane/tf-dev-summit-tensorboard-tutorial/blob/master/mnist.py

His convolution layer is specifically defined as:

def conv_layer(input, size_in, size_out, name="conv"):
  with tf.name_scope(name):
    # 5x5 filters mapping size_in input channels to size_out output channels
    w = tf.Variable(tf.truncated_normal([5, 5, size_in, size_out], stddev=0.1), name="W")
    b = tf.Variable(tf.constant(0.1, shape=[size_out]), name="B")
    # stride 1 and "SAME" padding keep the spatial dimensions unchanged
    conv = tf.nn.conv2d(input, w, strides=[1, 1, 1, 1], padding="SAME")
    act = tf.nn.relu(conv + b)
    tf.summary.histogram("weights", w)
    tf.summary.histogram("biases", b)
    tf.summary.histogram("activations", act)
    # 2x2 max pooling with stride 2 halves the width and height
    return tf.nn.max_pool(act, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

I am trying to work out the effect of conv2d on the input tensor size. As far as I can tell, the first three dimensions seem unchanged, but the last dimension of the output follows the size of the last dimension of w.

For example, ?x47x36x64 input becomes ?x47x36x128 with w shape=5x5x64x128

And I also see that: ?x24x18x128 becomes ?x24x18x256 with w shape=5x5x128x256

So, for an input of size [a, b, c, d], is the output size [a, b, c, w.shape[3]]?

Would it be correct to think that the first dimension does not change?
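If the shape rule described above holds (stride 1, 'SAME' padding), it can be sketched as a small helper. This is illustrative only; the function name `conv2d_same_shape` is not from the original code:

```python
def conv2d_same_shape(input_shape, filter_shape):
    """Output shape of tf.nn.conv2d with strides=[1, 1, 1, 1] and padding='SAME'.

    input_shape:  [batch, height, width, in_channels] (batch may be None)
    filter_shape: [f_h, f_w, in_channels, out_channels]
    """
    assert input_shape[3] == filter_shape[2], "channel mismatch"
    # 'SAME' with stride 1 preserves height and width; depth becomes out_channels
    return [input_shape[0], input_shape[1], input_shape[2], filter_shape[3]]

# The two examples from the question:
print(conv2d_same_shape([None, 47, 36, 64], [5, 5, 64, 128]))    # [None, 47, 36, 128]
print(conv2d_same_shape([None, 24, 18, 128], [5, 5, 128, 256]))  # [None, 24, 18, 256]
```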

Answer

This works in your case because of the stride used and the padding applied. The output width and height will not always be the same as the input.

Check out this excellent discussion of the topic. The basic takeaway (taken almost verbatim from that link) is that a convolution layer:

  • Accepts an input volume of size W1 x H1 x D1
  • Requires four hyperparameters:
    • Number of filters K
    • Spatial extent of the filters F
    • Stride of the filters S
    • Amount of zero padding P
  • Produces an output volume of size W2 x H2 x D2, where:
    • W2 = (W1 - F + 2*P)/S + 1
    • H2 = (H1 - F + 2*P)/S + 1
    • D2 = K
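The formulas above can be checked with a tiny helper (a sketch; it assumes the filter tiles the padded input evenly):

```python
def conv_output_size(w1, f, p, s):
    """Spatial output size of a conv layer: W2 = (W1 - F + 2*P)/S + 1."""
    out, rem = divmod(w1 - f + 2 * p, s)
    assert rem == 0, "filter does not tile the padded input evenly"
    return out + 1

# Stride 1, 5x5 filter, padding 2 -- the setup from the question:
print(conv_output_size(47, 5, 2, 1))  # 47: width is preserved
print(conv_output_size(36, 5, 2, 1))  # 36
# With no padding, the output shrinks:
print(conv_output_size(47, 5, 0, 1))  # 43
```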

And when you are processing batches of data in TensorFlow, they typically have shape [batch_size, height, width, channels], so the first dimension, which is just the number of samples in your batch, should not change.

Note that the amount of padding P in the above is a little tricky with TF. When you give the padding='SAME' argument to tf.nn.conv2d, TensorFlow applies zero padding to both sides of the image to make sure that no pixels of the image are ignored by your filter, but it may not add the same amount of padding to both sides (the two sides can differ by one). This SO thread has some good discussion on the topic.

In general, with a stride S of 1 (which your network has), zero padding of P = (F - 1)/2 will ensure that the output width/height equals the input, i.e. W2 = W1 and H2 = H1. In your case, F is 5, so tf.nn.conv2d must be adding two zeros to each side of the image for a P of 2, and your output width according to the above equation is W2 = (W1 - 5 + 2*2)/1 + 1 = W1 - 1 + 1 = W1.
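The padding amounts TF chooses for 'SAME' follow the rule documented for TensorFlow: pad_total = max((ceil(W/S) - 1)*S + F - W, 0), with the extra pixel (when pad_total is odd) going on the right/bottom. A sketch of that rule:

```python
import math

def same_padding(w, f, s):
    """Left/right zero padding tf.nn.conv2d adds along one axis for padding='SAME'."""
    pad_total = max((math.ceil(w / s) - 1) * s + f - w, 0)
    pad_left = pad_total // 2  # the smaller half goes on the left/top
    return pad_left, pad_total - pad_left

print(same_padding(47, 5, 1))  # (2, 2): symmetric, matches P = (F - 1)/2
print(same_padding(47, 4, 1))  # (1, 2): an even filter pads unevenly by one
```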
