Fast Gaussian blur on unsigned char image - ARM NEON intrinsics - iOS dev

Question

Can someone suggest a fast function to compute the Gaussian blur of an image using a 5x5 mask? I need it for iOS app development. I am working directly on the image memory, defined as:

unsigned char *image_sqr_Baseaaddr = (unsigned char *) malloc(noOfPixels);

for (row = 2; row < H-2; row++) 
{
    for (col = 2; col < W-2; col++) 
    {
        newPixel = 0;
        for (rowOffset=-2; rowOffset<=2; rowOffset++)
        {
            for (colOffset=-2; colOffset<=2; colOffset++) 
            {
                rowTotal = row + rowOffset;
                colTotal = col + colOffset;
                iOffset = (unsigned long)(rowTotal*W + colTotal);
                newPixel += (*(imgData + iOffset)) * gaussianMask[2 + rowOffset][2 + colOffset];
            }
        }
        i = (unsigned long)(row*W + col);
        *(imgData + i) = newPixel / 159;
    }
}

This is obviously the slowest function possible. I heard that ARM NEON intrinsics on iOS can be used to perform several operations in one cycle. Maybe that's the way to go?

The problem is that I am not very familiar with assembly language and don't have enough time to learn it at the moment. So it would be great if anyone could post NEON intrinsics code for the problem above, or any other fast implementation in C/C++.

Recommended answer

Before you get into SIMD optimisation with NEON you should first improve your scalar implementation. The biggest problem with your code as it stands is that it has been implemented as if it were a non-separable filter, whereas a Gaussian kernel is separable. By switching to a separable implementation you reduce the number of operations per pixel from N^2 to 2N, which in your case of a 5x5 kernel would be a reduction from 25 multiply-adds to 10, i.e. a 2.5x speed-up for very little effort.
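A minimal sketch of what a separable version could look like in C. The question does not show the contents of `gaussianMask`, so this example assumes a binomial 1-4-6-4-1 kernel (sum 16) as the 1-D Gaussian approximation; the function name and the `uint16_t` scratch buffer are likewise illustrative choices, not part of the original code. It writes to a separate output buffer, because a filter that overwrites its input would read already-blurred pixels.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed 1-D kernel: binomial approximation of a Gaussian, sum = 16.
   This is NOT the asker's gaussianMask, which the question does not show. */
static const int kKernel[5] = {1, 4, 6, 4, 1};

/* Hypothetical helper: blurs src (w x h, 8-bit grey) into dst.
   The 2-pixel border is copied unchanged, mirroring the loop bounds
   used in the question. Returns 0 on success, -1 if the scratch
   buffer cannot be allocated. */
int gaussian_blur_5x5_separable(const uint8_t *src, uint8_t *dst, int w, int h)
{
    assert(w > 0 && h > 0);
    assert(src != dst);               /* out-of-place only */

    memcpy(dst, src, (size_t)w * (size_t)h);
    if (w < 5 || h < 5)
        return 0;                     /* no interior pixels to filter */

    uint16_t *tmp = malloc((size_t)w * (size_t)h * sizeof *tmp);
    if (!tmp)
        return -1;

    /* Horizontal pass: 5 multiply-adds per pixel, kept at 16x scale. */
    for (int y = 0; y < h; y++) {
        for (int x = 2; x < w - 2; x++) {
            int sum = 0;
            for (int k = -2; k <= 2; k++)
                sum += src[(size_t)y * w + (x + k)] * kKernel[k + 2];
            tmp[(size_t)y * w + x] = (uint16_t)sum;   /* max 255 * 16 */
        }
    }

    /* Vertical pass: 5 more multiply-adds, then divide by 16 * 16 = 256
       with rounding. */
    for (int y = 2; y < h - 2; y++) {
        for (int x = 2; x < w - 2; x++) {
            int sum = 0;
            for (int k = -2; k <= 2; k++)
                sum += tmp[(size_t)(y + k) * w + x] * kKernel[k + 2];
            dst[(size_t)y * w + x] = (uint8_t)((sum + 128) >> 8);
        }
    }

    free(tmp);
    return 0;
}
```

Choosing a kernel whose sum is a power of two also turns the final division into a shift, which is a small extra saving over dividing by 159.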

It may be that a sufficiently optimised scalar implementation will meet your needs without the need to resort to SIMD. If not then you can at least carry these scalar optimisations over into a vectorized implementation.
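If the scalar version is still too slow, the horizontal pass of a separable filter maps fairly directly onto NEON intrinsics. The following is only a sketch under assumptions: it uses the same assumed 1-4-6-4-1 kernel (not the asker's mask), the function name `blur_row_14641` is hypothetical, and it handles one row at a time, producing 8 output pixels per loop iteration. A scalar loop handles the leftover pixels and also serves as the fallback when NEON is not available at compile time.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: horizontal 1-4-6-4-1 pass over one row of w pixels.
   Writes dst[2] .. dst[w-3]; each result is the weighted sum divided by 16
   with rounding. The kernel is an assumption, not the asker's mask. */
void blur_row_14641(const uint8_t *src, uint8_t *dst, int w)
{
    int x = 2;

#if HAVE_NEON
    const uint8x8_t k4 = vdup_n_u8(4);
    const uint8x8_t k6 = vdup_n_u8(6);

    /* 8 outputs per iteration; the furthest load reads src[x + 9],
       so x + 8 <= w - 2 keeps every access inside the row. */
    for (; x + 8 <= w - 2; x += 8) {
        /* Widening add of the two outer taps (weight 1 each). */
        uint16x8_t sum = vaddl_u8(vld1_u8(src + x - 2), vld1_u8(src + x + 2));
        /* Widening multiply-accumulate for the weighted taps. */
        sum = vmlal_u8(sum, vld1_u8(src + x - 1), k4);
        sum = vmlal_u8(sum, vld1_u8(src + x + 1), k4);
        sum = vmlal_u8(sum, vld1_u8(src + x), k6);
        /* Rounding shift right by 4 (divide by 16), narrowing to 8 bits. */
        vst1_u8(dst + x, vrshrn_n_u16(sum, 4));
    }
#endif

    /* Scalar tail, and the full loop on non-NEON builds. */
    for (; x < w - 2; x++) {
        int sum = src[x - 2] + 4 * (src[x - 1] + src[x + 1])
                + 6 * src[x] + src[x + 2];
        dst[x] = (uint8_t)((sum + 8) >> 4);
    }
}
```

The vertical pass can be vectorised in the same way by loading the same 8 columns from five consecutive rows. Whether this beats a well-optimised scalar build is something to measure on the target device.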

http://en.wikipedia.org/wiki/Gaussian_blur

http://blogs.mathworks.com/steve/2006/11/28/separable-convolution-part-2/
