Tensorflow Convolution Neural Network with different sized images

Published: 2021/12/27 16:58:11. Tags: python, tensorflow, deep-learning, conv-neural-network, deconvolution

Problem description

I am attempting to create a deep CNN that can classify each individual pixel in an image. I am replicating the architecture from the image below, taken from this paper. The paper mentions that deconvolutions are used so that input of any size is possible, which can be seen in the image below.

Github Repository

Currently, I have hard coded my model to accept images of size 32x32x7, but I would like to accept any size of input. What changes would I need to make to my code to accept variable sized input?

 x = tf.placeholder(tf.float32, shape=[None, 32*32*7])
 y_ = tf.placeholder(tf.float32, shape=[None, 32*32*7, 3])
 ...
 DeConnv1 = tf.nn.conv3d_transpose(layer1, filter = w, output_shape = [1,32,32,7,1], strides = [1,2,2,2,1], padding = 'SAME')
 ...
 final = tf.reshape(final, [1, 32*32*7])
 W_final = weight_variable([32*32*7,32*32*7,3])
 b_final = bias_variable([32*32*7,3])
 final_conv = tf.tensordot(final, W_final, axes=[[1], [1]]) + b_final

Solution

Dynamic placeholders

TensorFlow allows multiple dynamic (a.k.a. None) dimensions in a placeholder. The engine can't verify shapes while the graph is being built, so the client is responsible for feeding correct input, but this gives a lot of flexibility.

So I'm going from...

x = tf.placeholder(tf.float32, shape=[None, N*M*P])
y_ = tf.placeholder(tf.float32, shape=[None, N*M*P, 3])
...
x_image = tf.reshape(x, [-1, N, M, P, 1])

to...

# Nearly all dimensions are dynamic
x_image = tf.placeholder(tf.float32, shape=[None, None, None, None, 1])
label = tf.placeholder(tf.float32, shape=[None, None, 3])

Since you intend to reshape the input to 5D anyway, why not make x_image 5D right from the start? At this point the second dimension of label is arbitrary, but we promise TensorFlow that it will match x_image.
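Because the graph can no longer check these shapes, it may help to check them on the client side before feeding. The following is a minimal sketch, not part of the original answer: check_batch is a hypothetical helper, and it assumes the layout used here (images of shape [B, N, M, P, 1], one-hot labels of shape [B, N*M*P, 3]).

```python
import numpy as np

def check_batch(batch_x, batch_y, num_classes=3):
    """Raise ValueError if the labels do not line up with a 5-D image batch.

    Hypothetical helper: assumes x is [B, N, M, P, 1] and y is [B, N*M*P, C].
    """
    if batch_x.ndim != 5 or batch_x.shape[-1] != 1:
        raise ValueError("expected x of shape [B, N, M, P, 1], got %s" % (batch_x.shape,))
    b, n, m, p, _ = batch_x.shape
    expected = (b, n * m * p, num_classes)
    if batch_y.shape != expected:
        raise ValueError("expected labels of shape %s, got %s" % (expected, batch_y.shape))
    return expected

x = np.zeros([2, 16, 16, 3, 1])
y = np.zeros([2, 16 * 16 * 3, 3])
print(check_batch(x, y))
```

Calling it right before sess.run turns a shape mismatch into a readable error instead of a failure deep inside the graph.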

Dynamic shapes in deconvolution

Next, the nice thing about tf.nn.conv3d_transpose is that its output shape can be dynamic. So instead of this:

# Hard-coded output shape
DeConnv1 = tf.nn.conv3d_transpose(layer1, w, output_shape=[1,32,32,7,1], ...)

... you can do this:

# Dynamic output shape
DeConnv1 = tf.nn.conv3d_transpose(layer1, w, output_shape=tf.shape(x_image), ...)

This way the transpose convolution can be applied to any image and the result will take the shape of x_image that was actually passed in at runtime.

Note that the static shape of x_image is (?, ?, ?, ?, 1).
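To see why the transposed convolution needs an explicit output shape at all, here is a small sketch of the size arithmetic in plain Python. It assumes 'SAME' padding, where a strided convolution yields ceil(n / stride) elements along an axis; the helper names are made up for illustration. Several input sizes map to the same downsampled size, so the inverse is ambiguous, and passing tf.shape(x_image) picks the right candidate at runtime.

```python
import math

def same_conv_size(n, stride):
    """Output length of a 'SAME'-padded convolution along one axis."""
    return math.ceil(n / stride)

def valid_deconv_sizes(n, stride):
    """All lengths whose 'SAME' strided convolution would produce length n."""
    return list(range((n - 1) * stride + 1, n * stride + 1))

for size in (32, 7, 16, 3):
    down = same_conv_size(size, 2)
    # Each original size appears among the candidates for its downsampled size.
    print(size, "->", down, "->", valid_deconv_sizes(down, 2))
```

For example, a depth of 7 is reduced to 4 by the stride-2 convolution, and 4 could be expanded back to either 7 or 8.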

All-Convolutional network

The final and most important piece of the puzzle is to make the whole network convolutional, and that includes your final dense layer too. A dense layer must define its dimensions statically, which forces the whole network to fix the input image dimensions.

Luckily for us, Springenberg et al. describe a way to replace an FC layer with a CONV layer in the paper "Striving for Simplicity: The All Convolutional Net". I'm going to use a convolution with three 1x1x1 filters (see also this question):

final_conv = conv3d_s1(final, weight_variable([1, 1, 1, 1, 3]))
y = tf.reshape(final_conv, [-1, 3])

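If we ensure that final has the same spatial dimensions as DeConnv1 (and the other layers), y ends up with one row of 3 class scores per voxel, i.e. shape [B * N * M * P, 3], which covers the same elements as a label tensor of shape [B, N * M * P, 3].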
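The reason a 1x1x1 convolution can stand in for the dense layer is that it applies the same small weight matrix to the channel vector of every voxel, so nothing in it depends on N, M or P. Below is a NumPy sketch of that idea, not the TensorFlow op itself; conv1x1x1 is an illustrative name and the random weights are placeholders.

```python
import numpy as np

def conv1x1x1(volume, weights):
    """Apply a 1x1x1 convolution: mix channels independently at every voxel.

    volume:  [B, N, M, P, C_in]
    weights: [C_in, C_out]  (the 1x1x1 kernel with its spatial axes dropped)
    """
    return np.einsum("bnmpc,ck->bnmpk", volume, weights)

rng = np.random.default_rng(0)
weights = rng.normal(size=[1, 3])  # plays the role of the [1, 1, 1, 1, 3] filter

for shape in ([1, 32, 32, 7, 1], [2, 16, 16, 3, 1]):
    volume = rng.normal(size=shape)
    logits = conv1x1x1(volume, weights)
    flat = logits.reshape(-1, 3)            # one row of class scores per voxel
    dense = volume.reshape(-1, 1) @ weights  # a dense layer applied voxel by voxel
    print(logits.shape, flat.shape, np.allclose(flat, dense))
```

The same weights work for both input sizes, which is exactly what the fixed-size W_final of the original dense layer could not do.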

Combining it all together

Your network is pretty large, but all deconvolutions basically follow the same pattern, so I've simplified my proof-of-concept code to just one deconvolution. The goal is just to show what kind of network is able to handle images of arbitrary size. Final remark: image dimensions can vary between batches, but within one batch they have to be the same.

The full code:

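import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.InteractiveSession)

sess = tf.InteractiveSession()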

def conv3d_dilation(tempX, tempFilter):
  return tf.layers.conv3d(tempX, filters=tempFilter, kernel_size=[3, 3, 1], strides=1, padding='SAME', dilation_rate=2)

def conv3d(tempX, tempW):
  return tf.nn.conv3d(tempX, tempW, strides=[1, 2, 2, 2, 1], padding='SAME')

def conv3d_s1(tempX, tempW):
  return tf.nn.conv3d(tempX, tempW, strides=[1, 1, 1, 1, 1], padding='SAME')

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

def max_pool_3x3(x):
  return tf.nn.max_pool3d(x, ksize=[1, 3, 3, 3, 1], strides=[1, 2, 2, 2, 1], padding='SAME')

x_image = tf.placeholder(tf.float32, shape=[None, None, None, None, 1])
label = tf.placeholder(tf.float32, shape=[None, None, 3])

W_conv1 = weight_variable([3, 3, 1, 1, 32])
h_conv1 = conv3d(x_image, W_conv1)
# second convolution
W_conv2 = weight_variable([3, 3, 4, 32, 64])
h_conv2 = conv3d_s1(h_conv1, W_conv2)
# third convolution path 1
W_conv3_A = weight_variable([1, 1, 1, 64, 64])
h_conv3_A = conv3d_s1(h_conv2, W_conv3_A)
# third convolution path 2
W_conv3_B = weight_variable([1, 1, 1, 64, 64])
h_conv3_B = conv3d_s1(h_conv2, W_conv3_B)
# fourth convolution path 1
W_conv4_A = weight_variable([3, 3, 1, 64, 96])
h_conv4_A = conv3d_s1(h_conv3_A, W_conv4_A)
# fourth convolution path 2
W_conv4_B = weight_variable([1, 7, 1, 64, 64])
h_conv4_B = conv3d_s1(h_conv3_B, W_conv4_B)
# fifth convolution path 2
W_conv5_B = weight_variable([1, 7, 1, 64, 64])
h_conv5_B = conv3d_s1(h_conv4_B, W_conv5_B)
# sixth convolution path 2
W_conv6_B = weight_variable([3, 3, 1, 64, 96])
h_conv6_B = conv3d_s1(h_conv5_B, W_conv6_B)
# concatenation
layer1 = tf.concat([h_conv4_A, h_conv6_B], 4)
w = tf.Variable(tf.constant(1., shape=[2, 2, 4, 1, 192]))
DeConnv1 = tf.nn.conv3d_transpose(layer1, filter=w, output_shape=tf.shape(x_image), strides=[1, 2, 2, 2, 1], padding='SAME')

final = DeConnv1
final_conv = conv3d_s1(final, weight_variable([1, 1, 1, 1, 3]))
y = tf.reshape(final_conv, [-1, 3])
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=label, logits=y))

print('x_image:', x_image)
print('DeConnv1:', DeConnv1)
print('final_conv:', final_conv)

def try_image(N, M, P, B=1):
  batch_x = np.random.normal(size=[B, N, M, P, 1])
  batch_y = np.ones([B, N * M * P, 3]) / 3.0

  deconv_val, final_conv_val, loss = sess.run([DeConnv1, final_conv, cross_entropy],
                                              feed_dict={x_image: batch_x, label: batch_y})
  print(deconv_val.shape)
  print(final_conv_val.shape)
  print(loss)
  print()

tf.global_variables_initializer().run()
try_image(32, 32, 7)
try_image(16, 16, 3)
try_image(16, 16, 3, 2)
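The final remark above (sizes may differ between batches but not within one) suggests grouping inputs by shape before feeding them. One possible way to do that is sketched below in NumPy; batches_by_shape is a hypothetical helper, and it assumes each input is a single-channel volume of shape [N, M, P].

```python
from collections import defaultdict

import numpy as np

def batches_by_shape(volumes, batch_size):
    """Yield arrays of shape [B, N, M, P, 1], each built from same-sized volumes."""
    groups = defaultdict(list)
    for vol in volumes:
        groups[vol.shape].append(vol)
    for same_sized in groups.values():
        for start in range(0, len(same_sized), batch_size):
            chunk = same_sized[start:start + batch_size]
            # Stack along a new batch axis and append the channel axis.
            yield np.stack(chunk)[..., np.newaxis]

volumes = [np.zeros([32, 32, 7]), np.zeros([16, 16, 3]), np.zeros([16, 16, 3])]
for batch in batches_by_shape(volumes, batch_size=2):
    print(batch.shape)
```

Each yielded array can be fed as x_image, with labels built the same way for the matching volumes.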
