Memory usage of tensorflow conv2d with large filters


Problem Description

I have a tensorflow model with some relatively large 135 x 135 x 1 x 3 convolution filters. I find that tf.nn.conv2d becomes unusable for such large filters - it attempts to use well over 60GB of memory, at which point I need to kill it. Here is a minimal script to reproduce my error:

import tensorflow as tf
import numpy as np

frames, height, width, channels = 200, 321, 481, 1
filter_h, filter_w, filter_out = 5, 5, 3  # With this, output has shape (200, 317, 477, 3)
# filter_h, filter_w, filter_out = 7, 7, 3  # With this, output has shape (200, 315, 475, 3)
# filter_h, filter_w, filter_out = 135, 135, 3  # With this, output will be smaller than the above with shape (200, 187, 347, 3), but memory usage explodes

images = np.random.randn(frames, height, width, channels).astype(np.float32)

filters = tf.Variable(np.random.randn(filter_h, filter_w, channels, filter_out).astype(np.float32))
images_input = tf.placeholder(tf.float32)
conv = tf.nn.conv2d(images_input, filters, strides=[1, 1, 1, 1], padding="VALID")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    result = sess.run(conv, feed_dict={images_input: images})

print(result.shape)

First, can anyone explain this behavior? Why does memory usage blow up with filter size? (Note: I also tried changing my dimensions around to use a single conv3d instead of a batch of conv2ds, but this had the same problem)

Second, can anyone suggest a solution other than, say, breaking the operation up into 200 separate single-image convolutions?

Edit: After re-reading the docs on tf.nn.conv2d(), I noticed this in the explanation of how it works:



  1. Flattens the filter to a 2-D matrix with shape [filter_height * filter_width * in_channels, output_channels].

  2. Extracts image patches from the input tensor to form a virtual tensor of shape [batch, out_height, out_width, filter_height * filter_width * in_channels].

  3. For each patch, right-multiplies the filter matrix and the image patch vector.


I had originally taken this simply as a description of the process, but if tensorflow is actually extracting and storing separate filter-sized 'patches' from the image under the hood, then a back-of-the-envelope calculation shows that the intermediate computation involved requires ~130GB in my case, well over the limit that I could test. This might answer my first question, but if so, can anyone explain why TF would do this when I'm still only debugging on a CPU?
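
For reference, that back-of-the-envelope estimate can be reproduced from the shapes in the script above (a rough sketch of the arithmetic, not from the docs; the true peak depends on how much of the patch tensor the kernel materializes at once):

# Size of the "virtual tensor" from step 2 of the docs:
# [batch, out_height, out_width, filter_height * filter_width * in_channels]
frames, height, width, channels = 200, 321, 481, 1
filter_h, filter_w = 135, 135

out_h = height - filter_h + 1   # 187 (VALID padding, stride 1)
out_w = width - filter_w + 1    # 347

patch_elems = filter_h * filter_w * channels   # 18225 floats per patch
bytes_per_float = 4                            # float32

per_image = out_h * out_w * patch_elems * bytes_per_float
full_batch = frames * per_image

print("patch tensor, one image:   %.1f GB" % (per_image / 1e9))   # ~4.7 GB
print("patch tensor, whole batch: %.0f GB" % (full_batch / 1e9))  # ~946 GB

Depending on how many images' patches are materialized at once, anything between a few GB and the full-batch figure is plausible, which squares with the 60GB+ observed before the process had to be killed.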

Recommended Answer


I had originally taken this simply as a description of the process, but if tensorflow is actually extracting and storing separate filter-sized 'patches' from the image under the hood, then a back-of-the-envelope calculation shows that the intermediate computation involved requires ~130GB in my case, well over the limit that I could test.

As you figured out yourself, this is the reason for the large memory consumption. Tensorflow does this because the filters are usually small and calculating a matrix multiplication is a lot faster than calculating a convolution.
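
To make that concrete, here is a minimal NumPy sketch of the flatten-and-multiply (im2col) idea described in the docs; it is an illustration of the technique, not TensorFlow's actual kernel:

import numpy as np

def conv2d_via_im2col(image, filt):
    """VALID, stride-1 2-D convolution done as im2col + matmul.
    image: (H, W, C_in), filt: (fh, fw, C_in, C_out)."""
    H, W, C = image.shape
    fh, fw, _, C_out = filt.shape
    out_h, out_w = H - fh + 1, W - fw + 1

    # Step 2 of the docs: one flattened patch per output pixel.
    # This matrix is (out_h*out_w) x (fh*fw*C) -- for a 135x135 filter
    # it is what eats the memory.
    patches = np.empty((out_h * out_w, fh * fw * C), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patches[i * out_w + j] = image[i:i + fh, j:j + fw, :].ravel()

    # Step 1: flatten the filter; step 3: one matmul does all the work.
    flat_filt = filt.reshape(fh * fw * C, C_out)
    return (patches @ flat_filt).reshape(out_h, out_w, C_out)

img = np.random.randn(8, 8, 1).astype(np.float32)
flt = np.random.randn(3, 3, 1, 2).astype(np.float32)
print(conv2d_via_im2col(img, flt).shape)  # (6, 6, 2)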


can anyone explain why TF would do this when I'm still only debugging on a CPU?

You can also use tensorflow without having a GPU, so the CPU implementations are not just there for debugging; they are also optimized for speed, and matrix multiplication is faster than a direct convolution on both CPU and GPU.

To make convolutions with large filters possible, you would have to implement a convolution for large filters in C++ and add it as a new op to tensorflow.
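
Short of a custom op, one stopgap (a coarser version of the per-image splitting the question hoped to avoid) is to run the existing graph over slices of the batch, so each conv2d call only materializes patches for a few images at a time. A minimal sketch, reusing the names from the question's script:

chunk = 10  # tune so chunk * ~4.7 GB of patches fits in memory
outputs = []
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for start in range(0, frames, chunk):
        batch = images[start:start + chunk]
        outputs.append(sess.run(conv, feed_dict={images_input: batch}))

result = np.concatenate(outputs, axis=0)
print(result.shape)  # (200, 187, 347, 3) with the 135 x 135 filter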
