Fast Gaussian blur on unsigned char image - ARM Neon Intrinsics - iOS Dev

Question

Can someone tell me a fast function to compute the Gaussian blur of an image using a 5x5 mask? I need it for iOS app development. I am working directly on the memory of the image, defined as

unsigned char *image_sqr_Baseaaddr = (unsigned char *) malloc(noOfPixels);

for (row = 2; row < H-2; row++) 
{
    for (col = 2; col < W-2; col++) 
    {
        newPixel = 0;
        for (rowOffset=-2; rowOffset<=2; rowOffset++)
        {
            for (colOffset=-2; colOffset<=2; colOffset++) 
            {
                rowTotal = row + rowOffset;
                colTotal = col + colOffset;
                iOffset = (unsigned long)(rowTotal*W + colTotal);
                newPixel += (*(imgData + iOffset)) * gaussianMask[2 + rowOffset][2 + colOffset];
            }
        }
        i = (unsigned long)(row*W + col);
        *(imgData + i) = newPixel / 159;
    }
}

This is obviously the slowest function possible. I heard that ARM Neon intrinsics on iOS can be used to perform several operations in one cycle. Maybe that's the way to go?

The problem is that I am not very familiar with Neon and don't have enough time to learn assembly language at the moment. So it would be great if anyone could post Neon intrinsics code for the problem above, or any other fast implementation in C/C++.

Recommended answer

Before you get into SIMD optimisation with NEON you should first improve your scalar implementation. The biggest problem with your code as it stands is that it has been implemented as if it were a non-separable filter, whereas a Gaussian kernel is separable. By switching to a separable implementation (one horizontal 1-D pass followed by one vertical 1-D pass) you reduce the number of operations per pixel from N^2 to 2N, which in your case of a 5x5 kernel would be a reduction from 25 multiply-adds to 10, i.e. a 2.5x speed-up for very little effort.
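As a rough illustration of the separable approach, here is a minimal sketch in C. It is not the asker's exact filter: the question does not show `gaussianMask`, so this sketch assumes the 1-D binomial kernel [1 4 6 4 1] / 16 as a stand-in Gaussian. The function name `gaussian_blur5_separable`, the separate `dst` buffer, and the choice to copy the 2-pixel border unchanged are likewise assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed 1-D kernel [1 4 6 4 1], sum = 16. NOT the asker's 5x5 mask. */
static const int kKernel[5] = {1, 4, 6, 4, 1};

/* Blur a W x H 8-bit grayscale image from src into dst (dst != src).
 * The 2-pixel border is copied through unchanged.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int gaussian_blur5_separable(const uint8_t *src, uint8_t *dst, int W, int H)
{
    assert(src != dst);                 /* in-place use would corrupt the input */
    size_t n = (size_t)W * (size_t)H;
    uint8_t *tmp = malloc(n);
    if (tmp == NULL)
        return -1;
    memcpy(dst, src, n);                /* border pixels keep their values */

    /* Horizontal pass: src -> tmp, on every row so the vertical pass
     * can read rows row-2 .. row+2. */
    for (int row = 0; row < H; row++) {
        for (int col = 2; col < W - 2; col++) {
            const uint8_t *p = src + (size_t)row * W + col;
            int sum = 0;
            for (int k = -2; k <= 2; k++)
                sum += p[k] * kKernel[k + 2];
            tmp[(size_t)row * W + col] = (uint8_t)((sum + 8) >> 4); /* rounded /16 */
        }
    }

    /* Vertical pass: tmp -> dst. */
    for (int row = 2; row < H - 2; row++) {
        for (int col = 2; col < W - 2; col++) {
            int sum = 0;
            for (int k = -2; k <= 2; k++)
                sum += tmp[(size_t)(row + k) * W + col] * kKernel[k + 2];
            dst[(size_t)row * W + col] = (uint8_t)((sum + 8) >> 4);
        }
    }

    free(tmp);
    return 0;
}
```

Each pass costs 5 multiply-adds per pixel, so 10 in total instead of 25. Writing to a separate output buffer also avoids reading pixels that have already been blurred, which the in-place loop in the question does. Because the kernel sums to a power of two, the division becomes a shift.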

It may be that a sufficiently optimised scalar implementation will meet your needs without the need to resort to SIMD. If not, then you can at least carry these scalar optimisations over into a vectorized implementation.
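If a vectorized version does turn out to be necessary, the vertical 1-D pass is the easier one to start with, because neighbouring output pixels read neighbouring input bytes. The following is only a sketch under assumptions, not code from the original answer: it assumes a binomial kernel [1 4 6 4 1] / 16 (the question's mask is not shown), and the function name `blur5_vertical_row` and its row-pointer interface are hypothetical. It processes 8 pixels per iteration with NEON intrinsics where `__ARM_NEON` is defined, and falls back to plain C for the tail and on other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Vertical [1 4 6 4 1] / 16 pass for one output row (assumed kernel).
 * r0..r4 point at the same column range in five consecutive input rows;
 * n is the number of pixels to produce in out. */
void blur5_vertical_row(const uint8_t *r0, const uint8_t *r1, const uint8_t *r2,
                        const uint8_t *r3, const uint8_t *r4,
                        uint8_t *out, size_t n)
{
    assert(r0 && r1 && r2 && r3 && r4 && out);
    size_t i = 0;

#if defined(__ARM_NEON)
    for (; i + 8 <= n; i += 8) {
        /* Widen to 16 bits while adding the symmetric row pairs. */
        uint16x8_t outer = vaddl_u8(vld1_u8(r0 + i), vld1_u8(r4 + i)); /* weight 1 */
        uint16x8_t inner = vaddl_u8(vld1_u8(r1 + i), vld1_u8(r3 + i)); /* weight 4 */
        uint16x8_t mid   = vmovl_u8(vld1_u8(r2 + i));                  /* weight 6 */

        uint16x8_t sum = vaddq_u16(outer, vshlq_n_u16(inner, 2));
        sum = vmlaq_n_u16(sum, mid, 6);          /* sum += mid * 6, max 4080 */

        /* Rounding shift right by 4 (divide by 16) and narrow to 8 bits. */
        vst1_u8(out + i, vrshrn_n_u16(sum, 4));
    }
#endif

    /* Scalar tail, and the whole row on non-NEON targets. */
    for (; i < n; i++) {
        unsigned sum = r0[i] + r4[i] + 4u * (r1[i] + r3[i]) + 6u * r2[i];
        out[i] = (uint8_t)((sum + 8u) >> 4);
    }
}
```

The weighted sum never exceeds 255 * 16 = 4080, so 16-bit lanes are wide enough. Whether this is actually faster than a well-optimised scalar loop on a given device should be measured, not assumed.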

http://en.wikipedia.org/wiki/Gaussian_blur

http://blogs.mathworks.com/steve/2006/11/28/separable-convolution-part-2/
