计算卷积的最快方法 [英] Fastest method to compute convolution

查看:629
本文介绍了计算卷积的最快方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我必须在许多图像的每一行上应用卷积滤镜。经典是1024x1024像素的360度图像。在我的用例中,它是720像素560x600像素。

I have to apply a convolution filter on each row of many images. The classic is 360 images of 1024x1024 pixels. In my use case it is 720 images 560x600 pixels.

问题是我的代码比文章中广告的要慢得多。

The problem is that my code is much slower than what is advertised in articles.

我实施了天真的卷积,需要2个30秒。然后我使用fftw切换到FFT。我使用了复数2复数,在每个变换中过滤两行。我现在20多岁了。

I have implemented the naive convolution, and it takes 2m 30s. I then switched to FFT using fftw. I used complex 2 complex, filtering two rows in each transform. I'm now around 20s.

事情是文章广告大约10s甚至更少的经典条件。
所以我想问一下这里的专家是否有更快的方法来计算卷积。

The thing is that articles advertise around 10s and even less for the classic condition. So I'd like to ask the experts here if there could be a faster way to compute the convolution.

数字配方建议避免在dft中进行排序并相应地调整频域滤波器功能。但是没有代码示例如何做到这一点。

Numerical recipes suggest to avoid the sorting done in the dft and adapt the frequency domain filter function accordingly. But there is no code example how this could be done.

也许我在复制数据时浪费时间。使用真正的2实数变换,我不必将数据复制到complexe值中。但无论如何我必须填0。

Maybe I lose time in copying data. With real 2 real transform I wouldn't have to copy the data into the complexe values. But I have to pad with 0 anyway.

编辑:请参阅下面我自己的答案,了解进度反馈和解决此问题的更多信息。

see my own answer below for progress feedback and further information on solving this issue.

问题(精确重构):

我正在寻找一种算法或一段代码来将非常快速的卷积应用于离散的非周期函数(512到2048个值)。显然离散时间傅里叶变换是要走的路。虽然,我想避免数据复制和转换为复杂,并避免蝴蝶重新排序。

I'm looking for an algorithm or piece of code to apply a very fast convolution to a discrete non periodic function (512 to 2048 values). Apparently the discrete time Fourier transform is the way to go. Though, I'd like to avoid data copy and conversion to complex, and avoid the butterfly reordering.

推荐答案

FFT是最快的已知用于卷积信号的技术,而FFTW是可用于计算FFT的最快的免费库。

FFT is the fastest technique known for convolving signals, and FFTW is the fastest free library available for computing the FFT.

获得最大性能的关键(在硬件之外...... GPU是一个很好的建议)将你的信号填充为2的幂。使用FFTW时,在创建计划时使用患者设置以获得最佳性能。你不可能手动推出比FFTW更快的实现(忘记N.R.)。另外一定要使用前向1D FFT的Real版本,而不是复杂版本;如果可以,只使用单一(浮点)精度。

The key for you to get maximum performance (outside of hardware ... the GPU is a good suggestion) will be to pad your signals to a power of two. When using FFTW use the 'patient' setting when creating your plan to get the best performance. It's highly unlikely that you will hand-roll a faster implementation than what FFTW provides (forget about N.R.). Also be sure to be using the Real version of the forward 1D FFT and not the Complex version; and only use single (floating point) precision if you can.

如果FFTW没有为你削减它,那么我会看看英特尔(非常实惠的)IPP库。已针对英特尔处理器调整FFT,针对具有不同位深度的图像进行了优化。

If FFTW is not cutting it for you, then I would look at Intel's (very affordable) IPP library. The have hand tuned FFT's for Intel processors that have been optimized for images with various bit depths.

Paul

CenterSpace 软件

这篇关于计算卷积的最快方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆