tf.nn.conv2d 在张量流中做什么? [英] What does tf.nn.conv2d do in tensorflow?

查看:25
本文介绍了tf.nn.conv2d 在张量流中做什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在查看有关 tf.nn.conv2d

当您使用内核时,您将收到以下输出:,计算方式如下:

  • 14 = 4 * 1 + 3 * 0 + 1 * 1 + 2 * 2 + 1 * 1 + 0 * 0 + 1 * 0 + 2 * 0 + 4 * 1
  • 6 = 3 * 1 + 1 * 0 + 0 * 1 + 1 * 2 + 0 * 1 + 1 * 0 + 2 * 0 + 4 * 0 + 1 * 1
  • 6 = 2 * 1 + 1 * 0 + 0 * 1 + 1 * 2 + 2 * 1 + 4 * 0 + 3 * 0 + 1 * 0 + 0 * 1
  • 12 = 1 * 1 + 0 * 0 + 1 * 1 + 2 * 2 + 4 * 1 + 1 * 0 + 1 * 0 + 0 * 0 + 2 * 1

TF 的 conv2d 函数批量计算卷积,使用稍微不同的格式.对于输入,它是 [batch, in_height, in_width, in_channels] 对于内核,它是 [filter_height, filter_width, in_channels, out_channels].所以我们需要以正确的格式提供数据:

 将 tensorflow 导入为 tfk = tf.constant([[1, 0, 1],[2, 1, 0],[0, 0, 1]], dtype=tf.float32, name='k')我 = tf.constant([[4, 3, 1, 0],[2, 1, 0, 1],[1, 2, 4, 1],[3, 1, 0, 2]], dtype=tf.float32, name='i')内核 = tf.reshape(k, [3, 3, 1, 1], name='内核')image = tf.reshape(i, [1, 4, 4, 1], name='image')

之后卷积计算如下:

res = tf.squeeze(tf.nn.conv2d(image, kernel, [1, 1, 1, 1], "VALID"))# VALID 表示没有填充使用 tf.Session() 作为 sess:打印 sess.run(res)

并且将相当于我们手工计算的那个.

<小时>

对于 使用填充/步幅的示例,请看这里.

I was looking at the docs of tensorflow about tf.nn.conv2d here. But I can't understand what it does or what it is trying to achieve. It says on the docs,

#1 : Flattens the filter to a 2-D matrix with shape

[filter_height * filter_width * in_channels, output_channels].

Now what does that do? Is that element-wise multiplication or just plain matrix multiplication? I also could not understand the other two points mentioned in the docs. I have written them below :

# 2: Extracts image patches from the the input tensor to form a virtual tensor of shape

[batch, out_height, out_width, filter_height * filter_width * in_channels].

# 3: For each patch, right-multiplies the filter matrix and the image patch vector.

It would be really helpful if anyone could give an example, a piece of code (extremely helpful) maybe and explain what is going on there and why the operation is like this.

I've tried coding a small portion and printing out the shape of the operation. Still, I can't understand.

I tried something like this:

op = tf.shape(tf.nn.conv2d(tf.random_normal([1,10,10,10]), 
              tf.random_normal([2,10,10,10]), 
              strides=[1, 2, 2, 1], padding='SAME'))

with tf.Session() as sess:
    result = sess.run(op)
    print(result)

I understand bits and pieces of convolutional neural networks. I studied them here. But the implementation on tensorflow is not what I expected. So it raised the question.

EDIT: So, I implemented a much simpler code. But I can't figure out what's going on. I mean how the results are like this. It would be extremely helpful if anyone could tell me what process yields this output.

input = tf.Variable(tf.random_normal([1,2,2,1]))
filter = tf.Variable(tf.random_normal([1,1,1,1]))

op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)

    print("input")
    print(input.eval())
    print("filter")
    print(filter.eval())
    print("result")
    result = sess.run(op)
    print(result)

output

input
[[[[ 1.60314465]
   [-0.55022103]]

  [[ 0.00595062]
   [-0.69889867]]]]
filter
[[[[-0.59594476]]]]
result
[[[[-0.95538563]
   [ 0.32790133]]

  [[-0.00354624]
   [ 0.41650501]]]]

解决方案

2D convolution is computed in a similar way one would calculate 1D convolution: you slide your kernel over the input, calculate the element-wise multiplications and sum them up. But instead of your kernel/input being an array, here they are matrices.


In the most basic example there is no padding and stride=1. Let's assume your input and kernel are:

When you use your kernel you will receive the following output: , which is calculated in the following way:

  • 14 = 4 * 1 + 3 * 0 + 1 * 1 + 2 * 2 + 1 * 1 + 0 * 0 + 1 * 0 + 2 * 0 + 4 * 1
  • 6 = 3 * 1 + 1 * 0 + 0 * 1 + 1 * 2 + 0 * 1 + 1 * 0 + 2 * 0 + 4 * 0 + 1 * 1
  • 6 = 2 * 1 + 1 * 0 + 0 * 1 + 1 * 2 + 2 * 1 + 4 * 0 + 3 * 0 + 1 * 0 + 0 * 1
  • 12 = 1 * 1 + 0 * 0 + 1 * 1 + 2 * 2 + 4 * 1 + 1 * 0 + 1 * 0 + 0 * 0 + 2 * 1

TF's conv2d function calculates convolutions in batches and uses a slightly different format. For an input it is [batch, in_height, in_width, in_channels] for the kernel it is [filter_height, filter_width, in_channels, out_channels]. So we need to provide the data in the correct format:

import tensorflow as tf
k = tf.constant([
    [1, 0, 1],
    [2, 1, 0],
    [0, 0, 1]
], dtype=tf.float32, name='k')
i = tf.constant([
    [4, 3, 1, 0],
    [2, 1, 0, 1],
    [1, 2, 4, 1],
    [3, 1, 0, 2]
], dtype=tf.float32, name='i')
kernel = tf.reshape(k, [3, 3, 1, 1], name='kernel')
image  = tf.reshape(i, [1, 4, 4, 1], name='image')

Afterwards the convolution is computed with:

res = tf.squeeze(tf.nn.conv2d(image, kernel, [1, 1, 1, 1], "VALID"))
# VALID means no padding
with tf.Session() as sess:
   print sess.run(res)

And will be equivalent to the one we calculated by hand.


For examples with padding/strides, take a look here.

这篇关于tf.nn.conv2d 在张量流中做什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆