Fast Pixel Count on Binary Image - ARM NEON intrinsics - iOS Dev


Question

Can someone tell me a fast function to count the number of white pixels in a binary image? I need it for iOS app development. I am working directly on the memory of the image, defined as

  bool *imageData = (bool *) malloc(noOfPixels * sizeof(bool));

I implemented the function as follows:

             int whiteCount = 0;
             for (int q=i; q<i+windowHeight; q++)
             {
                 for (int w=j; w<j+windowWidth; w++)
                 { 
                     if (imageData[q*W + w] == 1)
                         whiteCount++;
                 }
             }

This is obviously the slowest possible implementation. I have heard that ARM NEON intrinsics on iOS can be used to perform several operations in one cycle. Maybe that's the way to go?

The problem is that I am not very familiar with assembly language and don't have enough time to learn it at the moment. So it would be great if anyone could post NEON intrinsics code for the problem above, or any other fast implementation in C/C++.

The only NEON intrinsics code I have been able to find online is for RGB to grayscale conversion: http://computer-vision-talks.com/2011/02/a-very-fast-bgra-to-grayscale-conversion-on-iphone/

Recommended answer

Firstly, you can speed up the original code a little by hoisting the multiply out of the inner loop and getting rid of the branch:

 int whiteCount = 0;
 for (int q = i; q < i + windowHeight; q++)
 {
     const bool * const row = &imageData[q * W];

     for (int w = j; w < j + windowWidth; w++)
     { 
         whiteCount += row[w];
     }
 }

(This assumes that imageData[] is truly binary, i.e. each element can only ever be 0 or 1.)
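As a reference point for checking any faster version, the branch-free loop above can be wrapped in a small function. This is only a sketch: the name count_white_scalar is made up for illustration, and it assumes the pixels are stored one per byte as uint8_t (equivalent to the bool buffer when sizeof(bool) == 1) with a row stride of W.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original answer): counts white pixels
 * in the window whose top-left corner is (row i, column j).
 * Assumes every pixel is exactly 0 or 1 and that W is the row stride. */
static int count_white_scalar(const uint8_t *imageData, int W,
                              int i, int j,
                              int windowHeight, int windowWidth)
{
    assert(imageData != NULL && W > 0);

    int whiteCount = 0;
    for (int q = i; q < i + windowHeight; q++)
    {
        /* Compute the row start once instead of multiplying per pixel. */
        const uint8_t * const row = &imageData[(size_t)q * (size_t)W];

        for (int w = j; w < j + windowWidth; w++)
        {
            whiteCount += row[w];   /* no branch: add the 0/1 value directly */
        }
    }
    return whiteCount;
}
```

For a 4 x 3 image, count_white_scalar(img, 4, 1, 1, 2, 2) would count the 2 x 2 window starting at row 1, column 1.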

Here is a simple NEON implementation:

#include <arm_neon.h>

// ...

int q, w;                                           // loop counters (i, j are the window origin)
int whiteCount = 0;
uint32x4_t v_count = vdupq_n_u32(0);

for (q = i; q < i + windowHeight; q++)
{
    // view the bool row as bytes (requires sizeof(bool) == 1)
    const uint8_t * const row = (const uint8_t *)&imageData[q * W];

    uint16x8_t vrow_count = vdupq_n_u16(0);

    for (w = j; w <= j + windowWidth - 16; w += 16) // SIMD loop
    {
        uint8x16_t v = vld1q_u8(&row[w]);           // load 16 x 8 bit pixels
        vrow_count = vpadalq_u8(vrow_count, v);     // accumulate 16 bit row counts
    }
    for ( ; w < j + windowWidth; ++w)               // scalar clean up loop
    {
        whiteCount += row[w];
    }
    v_count = vpadalq_u16(v_count, vrow_count);     // update 32 bit image counts
}                                                   // from 16 bit row counts
// add 4 x 32 bit partial counts from SIMD loop to scalar total
whiteCount += vgetq_lane_u32(v_count, 0);
whiteCount += vgetq_lane_u32(v_count, 1);
whiteCount += vgetq_lane_u32(v_count, 2);
whiteCount += vgetq_lane_u32(v_count, 3);
// total is now in whiteCount

(This assumes that imageData[] is truly binary, imageWidth <= 2^19, and sizeof(bool) == 1.)
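The 2^19 limit appears to come from the 16 bit row accumulator: each vpadalq_u8 adds one pair of adjacent pixels into each 16 bit lane, so with 0/1 pixels a lane can grow by at most 2 per SIMD iteration. The following sketch works through that arithmetic in plain C. The function names are invented for illustration, and the result is an estimate under the stated assumptions rather than a guarantee about the code above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest number of 16-pixel SIMD iterations per row
 * before a 16 bit lane could wrap, given the largest value one pixel can
 * hold (1 for a truly binary image). */
static uint32_t max_simd_iterations(uint32_t max_pixel_value)
{
    assert(max_pixel_value > 0);
    /* Each lane receives the sum of two adjacent 8 bit pixels per iteration. */
    uint32_t max_added_per_iteration = 2u * max_pixel_value;
    return UINT16_MAX / max_added_per_iteration;
}

/* Hypothetical helper: widest window row that cannot overflow a lane.
 * The scalar clean up loop adds to the int total, not to the lanes, so up
 * to 15 trailing pixels do not count against the 16 bit limit. */
static uint32_t max_safe_window_width(uint32_t max_pixel_value)
{
    return 16u * max_simd_iterations(max_pixel_value) + 15u;
}
```

With 0/1 pixels this gives 32767 iterations, i.e. a window width of 2^19 - 1, so a fully white row of exactly 2^19 pixels would sit just past the safe range. Note that it is the width of the counted window, not of the whole image, that matters here. If unmasked 255 values were accumulated instead, the same arithmetic would allow only 128 iterations per row.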

Updated version for unsigned char data with values of 255 for white and 0 for black:

#include <arm_neon.h>

// ...

int q, w;                                           // loop counters (i, j are the window origin)
int whiteCount = 0;
const uint8x16_t v_mask = vdupq_n_u8(1);            // 1 in every byte lane
uint32x4_t v_count = vdupq_n_u32(0);

for (q = i; q < i + windowHeight; q++)
{
    const uint8_t * const row = &imageData[q * W];

    uint16x8_t vrow_count = vdupq_n_u16(0);

    for (w = j; w <= j + windowWidth - 16; w += 16) // SIMD loop
    {
        uint8x16_t v = vld1q_u8(&row[w]);           // load 16 x 8 bit pixels
        v = vandq_u8(v, v_mask);                    // mask out all but LS bit
        vrow_count = vpadalq_u8(vrow_count, v);     // accumulate 16 bit row counts
    }
    for ( ; w < j + windowWidth; ++w)               // scalar clean up loop
    {
        whiteCount += (row[w] == 255);
    }
    v_count = vpadalq_u16(v_count, vrow_count);     // update 32 bit image counts
}                                                   // from 16 bit row counts
// add 4 x 32 bit partial counts from SIMD loop to scalar total
whiteCount += vgetq_lane_u32(v_count, 0);
whiteCount += vgetq_lane_u32(v_count, 1);
whiteCount += vgetq_lane_u32(v_count, 2);
whiteCount += vgetq_lane_u32(v_count, 3);
// total is now in whiteCount

(This assumes that imageData[] has values of 255 for white and 0 for black, and imageWidth <= 2^19.)

Note that all of the above code is untested and may need some further work.
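Since the code is untested, one way to gain confidence is to compare a NEON row counter against a plain scalar one on known data. The sketch below is an assumption-laden illustration, not the answer's own code: the names count_white_row, count_white_row_ref and neon_matches_reference are invented, the input is assumed to contain only 0 and 255, and each row is assumed to be shorter than 2^19 pixels. The NEON path is compiled only when the compiler defines __ARM_NEON, so on other targets the check runs with the scalar loop alone.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Scalar reference: counts pixels equal to 255. */
static uint32_t count_white_row_ref(const uint8_t *row, size_t n)
{
    uint32_t count = 0;
    for (size_t w = 0; w < n; ++w)
        count += (row[w] == 255);
    return count;
}

/* Counts white pixels in one row; uses NEON for full 16-pixel groups when
 * available. Assumes pixels are 0 or 255 and n < 2^19. */
static uint32_t count_white_row(const uint8_t *row, size_t n)
{
    uint32_t count = 0;
    size_t w = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const uint8x16_t v_mask = vdupq_n_u8(1);
    uint16x8_t vrow_count = vdupq_n_u16(0);
    for ( ; w + 16 <= n; w += 16)
    {
        uint8x16_t v = vld1q_u8(&row[w]);        /* load 16 pixels */
        v = vandq_u8(v, v_mask);                 /* 255 -> 1, 0 -> 0 */
        vrow_count = vpadalq_u8(vrow_count, v);  /* add pairs into 16 bit lanes */
    }
    uint32x4_t v_count = vpaddlq_u16(vrow_count); /* widen to 4 x 32 bit */
    count += vgetq_lane_u32(v_count, 0);
    count += vgetq_lane_u32(v_count, 1);
    count += vgetq_lane_u32(v_count, 2);
    count += vgetq_lane_u32(v_count, 3);
#endif
    for ( ; w < n; ++w)                          /* scalar tail (or whole row) */
        count += (row[w] == 255);
    return count;
}

/* Returns 1 if both counters agree on a simple 0/255 pattern of length n. */
static int neon_matches_reference(size_t n)
{
    uint8_t buf[1024];
    if (n > sizeof buf)
        return 0;
    for (size_t k = 0; k < n; ++k)
        buf[k] = (k % 3 == 0) ? 255 : 0;
    return count_white_row(buf, n) == count_white_row_ref(buf, n);
}
```

Lengths around multiples of 16 (for example 15, 16 and 17) are the useful ones to try, because they exercise the boundary between the SIMD loop and the scalar clean up loop.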
