Fast Pixel Count on Binary Image - ARM NEON intrinsics - iOS Dev
Question
Can someone tell me a fast function to count the number of white pixels in a binary image? I need it for iOS app development. I am working directly on the image memory, which is defined as:
bool *imageData = (bool *) malloc(noOfPixels * sizeof(bool));
I implemented the function as follows:
int whiteCount = 0;
for (int q=i; q<i+windowHeight; q++)
{
    for (int w=j; w<j+windowWidth; w++)
    {
        if (imageData[q*W + w] == 1)
            whiteCount++;
    }
}
This is obviously the slowest possible approach. I have heard that ARM NEON intrinsics on iOS can be used to perform several operations in one cycle. Maybe that is the way to go?
The problem is that I am not very familiar with assembly language and don't have enough time to learn it at the moment. So it would be great if anyone could post NEON intrinsics code for the problem described above, or any other fast implementation in C/C++.
The only NEON intrinsics code I have been able to find online is for RGB to gray conversion: http://computer-vision-talks.com/2011/02/a-very-fast-bgra-to-grayscale-conversion-on-iphone/
Recommended answer
Firstly, you can speed up the original code a little by factoring out the multiply and getting rid of the branch:
int whiteCount = 0;
for (int q = i; q < i + windowHeight; q++)
{
    const bool * const row = &imageData[q * W];
    for (int w = j; w < j + windowWidth; w++)
    {
        whiteCount += row[w];
    }
}
(This assumes that imageData[] is truly binary, i.e. each element can only ever be 0 or 1.)
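Since the goal is to compare faster variants against this loop, it may help to wrap it in a function that can serve as a reference. The following is a minimal sketch; the name count_white_scalar and its parameter list are hypothetical and do not appear in the original post. It assumes a row-major image with W pixels per row and a window that lies entirely inside the image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (name and signature are assumptions, not from the post):
   counts the 1-valued pixels in a window of a row-major bool image. */
int count_white_scalar(const bool *imageData, int W,
                       int i, int j, int windowHeight, int windowWidth)
{
    assert(imageData != NULL);
    int whiteCount = 0;
    for (int q = i; q < i + windowHeight; q++) {
        /* compute the row start once per row instead of once per pixel */
        const bool *row = &imageData[(size_t)q * (size_t)W];
        for (int w = j; w < j + windowWidth; w++) {
            whiteCount += row[w];   /* bool promotes to 0 or 1, so no branch is needed */
        }
    }
    return whiteCount;
}
```

Any SIMD version can then be checked by calling both functions on the same window and comparing the results.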
Here is a simple NEON implementation:
#include <arm_neon.h>
// ...
int q, w;
int whiteCount = 0;
uint32x4_t v_count = vdupq_n_u32(0);
for (q = i; q < i + windowHeight; q++)
{
    const bool * const row = &imageData[q * W];
    uint16x8_t vrow_count = vdupq_n_u16(0);
    for (w = j; w <= j + windowWidth - 16; w += 16) // SIMD loop
    {
        uint8x16_t v = vld1q_u8((const uint8_t *)&row[w]); // load 16 x 8 bit pixels
        vrow_count = vpadalq_u8(vrow_count, v); // accumulate 16 bit row counts
    }
    for ( ; w < j + windowWidth; ++w) // scalar clean up loop
    {
        whiteCount += row[w];
    }
    v_count = vpadalq_u16(v_count, vrow_count); // update 32 bit image counts
}                                               // from 16 bit row counts
// add 4 x 32 bit partial counts from SIMD loop to scalar total
whiteCount += vgetq_lane_u32(v_count, 0);
whiteCount += vgetq_lane_u32(v_count, 1);
whiteCount += vgetq_lane_u32(v_count, 2);
whiteCount += vgetq_lane_u32(v_count, 3);
// total is now in whiteCount
(This assumes that imageData[] is truly binary, imageWidth <= 2^19, and sizeof(bool) == 1.)
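The width limit comes from the 16 bit per-row accumulators. As a rough check of where the figure of about 2^19 comes from, the arithmetic can be sketched as below. The helper name max_safe_window_width is hypothetical, and the calculation assumes the worst case of an all-white row in which every pixel contributes exactly 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest window width (in pixels) for which the
   16 bit per-row lanes cannot overflow, assuming every pixel is 0 or 1. */
long max_safe_window_width(void)
{
    const long lane_max = UINT16_MAX;            /* 65535, capacity of one 16 bit lane */
    const long gain_per_iter = 2;                /* each pairwise add puts two pixels into a lane */
    const long max_iters = lane_max / gain_per_iter;  /* 32767 SIMD iterations per row */
    const long simd_pixels = max_iters * 16;     /* 524272 pixels handled by the SIMD loop */
    assert(simd_pixels % 16 == 0);
    return simd_pixels + 15;                     /* plus up to 15 pixels left to the scalar loop */
}
```

Under these assumptions the result is 524287, i.e. 2^19 - 1, so a window exactly 2^19 pixels wide and entirely white would appear to be just past the limit. For any realistic image size this makes no practical difference.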
Updated version for unsigned char and values of 255 for white, 0 for black:
#include <arm_neon.h>
// ...
int q, w;
int whiteCount = 0;
const uint8x16_t v_mask = vdupq_n_u8(1);
uint32x4_t v_count = vdupq_n_u32(0);
for (q = i; q < i + windowHeight; q++)
{
    const uint8_t * const row = &imageData[q * W];
    uint16x8_t vrow_count = vdupq_n_u16(0);
    for (w = j; w <= j + windowWidth - 16; w += 16) // SIMD loop
    {
        uint8x16_t v = vld1q_u8(&row[w]); // load 16 x 8 bit pixels
        v = vandq_u8(v, v_mask); // mask out all but LS bit
        vrow_count = vpadalq_u8(vrow_count, v); // accumulate 16 bit row counts
    }
    for ( ; w < j + windowWidth; ++w) // scalar clean up loop
    {
        whiteCount += (row[w] == 255);
    }
    v_count = vpadalq_u16(v_count, vrow_count); // update 32 bit image counts
}                                               // from 16 bit row counts
// add 4 x 32 bit partial counts from SIMD loop to scalar total
whiteCount += vgetq_lane_u32(v_count, 0);
whiteCount += vgetq_lane_u32(v_count, 1);
whiteCount += vgetq_lane_u32(v_count, 2);
whiteCount += vgetq_lane_u32(v_count, 3);
// total is now in whiteCount
(This assumes that imageData[] has values of 255 for white and 0 for black, and imageWidth <= 2^19.)
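For experimenting with the 255/0 variant outside an iOS build, one possible arrangement is to put it in a function that uses NEON only when the compiler reports it and otherwise falls back to the scalar loop. This is a sketch, not a tested drop-in: the function name count_white_u8, the HAVE_NEON macro, and the parameter list are assumptions introduced here, and the window is assumed to lie inside the image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0   /* no NEON: only the scalar loop below is compiled */
#endif

/* Hypothetical wrapper: counts pixels equal to 255 in a window of a
   row-major image whose pixels are assumed to be exactly 0 or 255. */
int count_white_u8(const uint8_t *imageData, int W,
                   int i, int j, int windowHeight, int windowWidth)
{
    assert(imageData != NULL);
    int whiteCount = 0;
#if HAVE_NEON
    const uint8x16_t v_mask = vdupq_n_u8(1);      /* keeps only the lowest bit */
    uint32x4_t v_count = vdupq_n_u32(0);          /* 32 bit totals across rows */
#endif
    for (int q = i; q < i + windowHeight; q++) {
        const uint8_t *row = &imageData[(size_t)q * (size_t)W];
        int w = j;
#if HAVE_NEON
        uint16x8_t vrow_count = vdupq_n_u16(0);   /* 16 bit counts for this row */
        for (; w <= j + windowWidth - 16; w += 16) {
            uint8x16_t v = vld1q_u8(&row[w]);     /* load 16 pixels */
            v = vandq_u8(v, v_mask);              /* 255 -> 1, 0 -> 0 */
            vrow_count = vpadalq_u8(vrow_count, v);
        }
        v_count = vpadalq_u16(v_count, vrow_count);
#endif
        for (; w < j + windowWidth; ++w) {        /* leftover (or all) pixels */
            whiteCount += (row[w] == 255);
        }
    }
#if HAVE_NEON
    whiteCount += (int)vgetq_lane_u32(v_count, 0);
    whiteCount += (int)vgetq_lane_u32(v_count, 1);
    whiteCount += (int)vgetq_lane_u32(v_count, 2);
    whiteCount += (int)vgetq_lane_u32(v_count, 3);
#endif
    return whiteCount;
}
```

Keeping the scalar loop as both the tail handler and the fallback means the same function can be compared against a plain double loop on any machine before it is profiled on a device.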
Note that all of the above code is untested and may need some further work.