优化的图像卷积算法 [英] Optimized image convolution algorithm

查看:87
本文介绍了优化的图像卷积算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在努力在C ++中实现图像卷积,并且我已经有了基于给定伪代码的幼稚工作代码:

I am working on implementing Image convolution in C++, and I already have a naive working code based on the given pseudo code:

for each image row in input image:
   for each pixel in image row:

      set accumulator to zero

      for each kernel row in kernel:
         for each element in kernel row:

            if element position  corresponding* to pixel position then
               multiply element value  corresponding* to pixel value
               add result to accumulator
            endif

      set output image pixel to accumulator

由于这可能是图像和内核较大的瓶颈,我想知道是否存在其他方法使事情更快?即使使用其他输入信息,例如:稀疏图像或内核,已知内核等...

As this can be a big bottleneck with big Images and Kernels, I was wondering if there exist some other approach to make things faster ? even with additionnal input info like : sparse image or kernel, already known kernel etc...

我知道这可以并行化,但是在我的情况下这是不可行的。 / p>

I know this can be parallelized, but it's not doable in my case.

推荐答案


if element position  corresponding* to pixel position then


我认为此测试的目的是避免乘以0。跳过考试!乘以0比由条件跳转引起的延迟要快得多。

I presume this test is meant to avoid a multiplication by 0. Skip the test! multiplying by 0 is way faster than the delays caused by a conditional jump.

另一种选择(并且总是发布实际代码而不是伪代码,这里让我猜测您实现的是什么!)是您正在测试越界访问。那也非常昂贵。最好中断循环,这样就不需要对大多数像素进行此测试:

The other alternative (and it's always better to post actual code rather than pseudo-code, here you have me guessing at what you implemented!) is that you're testing for out-of-bounds access. That is terribly expensive also. It is best to break up your loops so that you don't need to do this testing for the majority of the pixels:

for (row = 0; row < k/2; ++row) {
   // inner loop over kernel rows is adjusted so it only loops over part of the kernel
}
for (row = k/2; row < nrows-k/2; ++row) {
   // inner loop over kernel rows is unrestricted
}
for (row = nrows-k/2; row < nrows; ++row) {
   // inner loop over kernel rows is adjusted
}

当然,这同样适用于列上的循环,导致内核值上的内部循环重复9次。这很丑陋,但方法更快。

Of course, the same applies to loops over columns, leading to 9 repetitions of the inner loop over kernel values. It's ugly but way faster.

为避免代码重复,您可以创建更大的图像,将图像数据复制过来,并用零填充所有面。现在,循环不必担心访问越界,您的代码要简单得多。

To avoid the code repetition you can create a larger image, copy the image data over, padded with zeros on all sides. The loops now do not need to worry about accessing out-of-bounds, you have much simpler code.

接下来,特定类别的内核可以分解为一维内核 。例如,众所周知的Sobel内核是由[1,1,1]和[1,0,-1] T 的卷积产生的。对于3x3内核,这不是什么大问题,但是对于较大的内核,这是什么。通常,对于NxN内核,您从N 2 到2N个操作。

Next, a certain class of kernel can be decomposed into 1D kernels. For example, the well-known Sobel kernel results from the convolution of [1,1,1] and [1,0,-1]T. For a 3x3 kernel this is not a huge deal, but for larger kernels it is. In general, for a NxN kernel, you go from N2 to 2N operations.

尤其是高斯内核是可分离的。这是一个非常重要的平滑滤波器,也可以用于计算导数。

In particular, the Gaussian kernel is separable. This is a very important smoothing filter that can also be used for computing derivatives.

除了可以节省明显的计算成本外,这些一维卷积的代码也更加简单。对于一维滤波器,我们之前的9个重复代码块变为3个。

Besides the obvious computational cost saving, the code is also much simpler for these 1D convolutions. The 9 repeated blocks of code we had earlier become 3 for a 1D filter. The same code for the horizontal filter can be re-used for the vertical one.

最后,如< a href = https://stackoverflow.com/a/49062496/7328782> MBo的答案,您可以通过DFT计算卷积。可以使用O(MN log MN)中的 FFT 来计算DFT(对于尺寸为MxN)。这需要将内核填充到图像的大小,将两者都转换为傅立叶域,将它们相乘,然后对结果进行逆变换。总共3个转换。这是否比直接计算更有效取决于内核的大小以及它是否可分离。

Finally, as already mentioned in MBo's answer, you can compute the convolution through the DFT. The DFT can be computed using the FFT in O(MN log MN) (for an image of size MxN). This requires padding the kernel to the size of the image, transforming both to the Fourier domain, multiplying them together, and inverse-transforming the result. 3 transforms in total. Whether this is more efficient than the direct computation depends on the size of the kernel and whether it is separable or not.

这篇关于优化的图像卷积算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆