Fastest method for calculating convolution


Problem description

Does anybody know the fastest method for calculating a convolution? Unfortunately, the matrix I am dealing with is very large (500x500x200), and convn in MATLAB takes a long time (I have to repeat this calculation in a nested loop). I switched to FFT-based convolution, which is faster, but I am still looking for a faster method. Any ideas?

Solution

If your kernel is separable, the greatest speed gains will be realized by performing multiple sequential 1D convolutions.
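As a rough illustration of what "multiple sequential 1D convolutions" means for a 3D volume like the one in the question, here is a minimal NumPy sketch. The answer is about MATLAB; Python is used here only for illustration, and the helper names `separable_convn` and `direct_convn` are hypothetical. It assumes the kernel is an exact outer product of one 1D factor per axis and computes a "full" convolution.

```python
import numpy as np

def separable_convn(volume, kernels_1d):
    """Full N-D convolution with a separable kernel, one axis at a time.

    kernels_1d[i] is the 1D factor applied along axis i (hypothetical helper).
    """
    out = np.asarray(volume, dtype=float)
    for axis, k in enumerate(kernels_1d):
        # np.convolve defaults to mode="full": this axis grows by len(k) - 1.
        out = np.apply_along_axis(np.convolve, axis, out, np.asarray(k, dtype=float))
    return out

def direct_convn(volume, kernel):
    """Reference full N-D convolution: one shifted, scaled copy per kernel element."""
    volume = np.asarray(volume, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    out = np.zeros([v + k - 1 for v, k in zip(volume.shape, kernel.shape)])
    for idx in np.ndindex(kernel.shape):
        window = tuple(slice(i, i + n) for i, n in zip(idx, volume.shape))
        out[window] += kernel[idx] * volume
    return out

rng = np.random.default_rng(0)
volume = rng.random((6, 5, 4))
kx, ky, kz = rng.random(3), rng.random(4), rng.random(2)
kernel = np.einsum("i,j,k->ijk", kx, ky, kz)  # 3D outer product of the 1D factors

fast = separable_convn(volume, [kx, ky, kz])
slow = direct_convn(volume, kernel)
print(fast.shape, np.allclose(fast, slow))
```

The reference routine makes one pass over the volume per kernel element (3 x 4 x 2 = 24 passes here), while the separable version touches each sample only 3 + 4 + 2 = 9 times. `np.apply_along_axis` loops in Python, so this sketch shows the structure of the computation rather than a tuned implementation.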

Steve Eddins of MathWorks describes on his blog how to exploit the associativity of convolution to speed it up in MATLAB when the kernel is separable. For a P-by-Q kernel, the computational advantage of two sequential 1D convolutions over one 2D convolution is PQ/(P+Q), which corresponds to 4.5x for a 9x9 kernel and 7.5x for a 15x15 kernel. EDIT: An interesting unwitting demonstration of this difference was given in this Q&A.
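The PQ/(P+Q) figure follows from counting multiply-adds per output sample. A short derivation, assuming a dense P-by-Q kernel $h$ that factors into a column vector $u$ and a row vector $v$ (the symbol names are chosen here for illustration, not taken from the blog):

```latex
\begin{align*}
h &= u\,v^{\mathsf T}, \qquad u \in \mathbb{R}^{P},\ v \in \mathbb{R}^{Q} \\
A * h &= A * (u * v) = (A * u) * v \qquad \text{(associativity)} \\
\text{direct 2D cost} &= PQ \ \text{multiply-adds per output sample} \\
\text{two 1D passes} &= P + Q \ \text{multiply-adds per output sample} \\
\text{speedup} &= \frac{PQ}{P + Q}, \qquad \text{e.g. } P = Q = 9:\ \frac{81}{18} = 4.5
\end{align*}
```

The same counting extends to a separable P-by-Q-by-R kernel, where the ratio becomes PQR/(P+Q+R). These are theoretical operation counts; measured speedups also depend on memory access and implementation details.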

To find out whether a kernel is separable (i.e. the outer product of two vectors), the blog goes on to show how to check this with the SVD and how to recover the 1D kernels. Its example is for a 2D kernel. For N-dimensional separable convolution, check this FEX submission.
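The SVD-based check can be sketched as follows in NumPy (again, Python is used only for illustration). This is not the blog's code: `separate_kernel` and its `tol` parameter are hypothetical, and the relative tolerance is an arbitrary choice. A 2D kernel is separable exactly when it has rank 1, i.e. when only its first singular value is non-negligible.

```python
import numpy as np

def separate_kernel(kernel, tol=1e-10):
    """Return (col, row) with np.outer(col, row) ~= kernel, or None if not rank 1."""
    kernel = np.asarray(kernel, dtype=float)
    U, s, Vt = np.linalg.svd(kernel)
    # Singular values are sorted in descending order; a separable kernel
    # has only one that is non-negligible.
    if s.size > 1 and s[1] > tol * s[0]:
        return None
    # Split the single singular value evenly between the two 1D factors.
    scale = np.sqrt(s[0])
    return U[:, 0] * scale, Vt[0, :] * scale

sobel = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]])      # outer([1, 2, 1], [1, 0, -1])
laplacian = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]])    # rank 2, not separable

col, row = separate_kernel(sobel)
print(np.allclose(np.outer(col, row), sobel))
print(separate_kernel(laplacian) is None)
```

The recovered factors are only determined up to a scale (and sign) shared between them, so they may differ from the vectors originally used to build the kernel even though their outer product matches.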


Another resource worth pointing out is this SIMD (SSE3/SSE4) implementation of 3D convolution by Intel, which includes both source and a presentation. The code is for 16-bit integers. Unless you move to the GPU (e.g. cuFFT), it is probably hard to beat Intel's implementations, which also include Intel MKL. There is an example of 3D convolution (single-precision float) at the bottom of this page of the MKL documentation (link fixed, now mirrored in http://stackoverflow.com/a/27074295/2778484).
