Find min and position of the min element in a uint8x8_t NEON register
Question
Consider this code:
uint8_t v[8] = { ... };
int ret = 256;
int ret_pos = -1;
for (int i = 0; i < 8; ++i)
{
    if (v[i] < ret)
    {
        ret = v[i];
        ret_pos = i;
    }
}
It finds the min and the position of the min element (ret and ret_pos). In ARM NEON I could use pairwise min to find the min element in v, but how do I find the position of the min element?
Update: see my own answer. What would you suggest to improve it?
Recommended answer
Here's how I did it after spending some time fiddling with bits and math:
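/* Expects a uint8x8_t named vmask in the enclosing scope (loaded below). */
#define VMIN8(x, index, value) \
do { \
    /* broadcast the minimum to all eight lanes */ \
    uint8x8_t m = vpmin_u8(x, x); \
    m = vpmin_u8(m, m); \
    m = vpmin_u8(m, m); \
    /* 0xFF in every lane equal to the minimum, 0x00 elsewhere */ \
    uint8x8_t r = vceq_u8(x, m); \
    \
    /* keep one distinct bit per matching lane */ \
    uint8x8_t z = vand_u8(vmask, r); \
    \
    /* sum the lanes so every byte holds the combined bitmask */ \
    z = vpadd_u8(z, z); \
    z = vpadd_u8(z, z); \
    z = vpadd_u8(z, z); \
    \
    unsigned u32 = vget_lane_u32(vreinterpret_u32_u8(z), 0); \
    index = __lzcnt(u32); \
    value = vget_lane_u8(m, 0); \
} while (0)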
uint8_t v[8] = { ... };
static const uint8_t mask[] = { 0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01 };
uint8x8_t vmask = vld1_u8(mask);
uint8x8_t v8 = vld1_u8(v);
int ret;
int ret_pos;
VMIN8(v8, ret_pos, ret);
where __lzcnt is clz (__builtin_clz in gcc).
Here's how it works. First, use pairwise min to set all u8 fields of the uint8x8_t to the minimum value:
uint8x8_t m = vpmin_u8(x, x);
m = vpmin_u8(m, m);
m = vpmin_u8(m, m);
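To see why three rounds are enough, here is a minimal scalar sketch of the pairwise-min step in portable C. The helper names model_vpmin_u8 and model_broadcast_min are hypothetical and exist only for illustration; the sketch assumes the usual vpmin_u8 lane layout, where the low four result lanes come from adjacent pairs of the first operand and the high four from the second.

```c
#include <stdint.h>

/* Hypothetical scalar model of vpmin_u8(a, b), for illustration only:
   lanes 0..3 hold the minima of adjacent pairs of a,
   lanes 4..7 hold the minima of adjacent pairs of b. */
static void model_vpmin_u8(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
    uint8_t tmp[8]; /* temporary so that out may alias a or b */
    for (int i = 0; i < 4; ++i) {
        tmp[i]     = a[2 * i] < a[2 * i + 1] ? a[2 * i] : a[2 * i + 1];
        tmp[i + 4] = b[2 * i] < b[2 * i + 1] ? b[2 * i] : b[2 * i + 1];
    }
    for (int i = 0; i < 8; ++i)
        out[i] = tmp[i];
}

/* Each round halves the number of independent candidates (8 -> 4 -> 2 -> 1),
   so after three rounds every lane holds the overall minimum. */
static void model_broadcast_min(const uint8_t x[8], uint8_t m[8])
{
    model_vpmin_u8(x, x, m);
    model_vpmin_u8(m, m, m);
    model_vpmin_u8(m, m, m);
}
```

For the input {7, 3, 9, 3, 200, 15, 4, 8}, the rounds produce {3, 3, 15, 4, 3, 3, 15, 4}, then {3, 4, 3, 4, 3, 4, 3, 4}, and finally 3 in every lane.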
Then, using a vector compare, set the min elements to all ones and all others to zeros:
uint8x8_t r = vceq_u8(x, m);
Then perform a logical AND with the mask that contains the values uint8_t mask[] = { 1<<7, 1<<6, 1<<5, ... 1<<1, 1<<0 }:
uint8x8_t z = vand_u8(vmask, r);
Then use pairwise add to sum all eight bytes, so that every byte of z holds the combined bitmask:
z = vpadd_u8(z, z);
z = vpadd_u8(z, z);
z = vpadd_u8(z, z);
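The compare, AND, and pairwise-add steps can be sketched in scalar C as follows. This is an illustrative model, not the NEON code itself, and the function name model_min_bitmask is hypothetical. Because each lane contributes a distinct bit, the sum never overflows a byte and equals the bitwise OR of the selected mask entries.

```c
#include <stdint.h>

/* Hypothetical scalar model (illustration only) of vceq_u8 + vand_u8 +
   three vpadd_u8 rounds. Returns the byte that ends up in every lane of z. */
static uint8_t model_min_bitmask(const uint8_t x[8])
{
    static const uint8_t mask[8] = { 0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01 };

    /* find the minimum (the NEON code gets this from the pairwise-min rounds) */
    uint8_t min = x[0];
    for (int i = 1; i < 8; ++i)
        if (x[i] < min)
            min = x[i];

    uint8_t z = 0;
    for (int i = 0; i < 8; ++i) {
        uint8_t r = (x[i] == min) ? 0xFF : 0x00; /* vceq_u8 lane result */
        z = (uint8_t)(z + (mask[i] & r));        /* vand_u8, then summed by vpadd_u8 */
    }
    return z;
}
```

With the input {7, 3, 9, 3, 200, 15, 4, 8}, the minimum 3 sits in lanes 1 and 3, so the result is 0x40 + 0x10 = 0x50.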
and after that use clz to calculate the position of the first min element:
unsigned u32 = vget_lane_u32(vreinterpret_u32_u8(z), 0);
index = __lzcnt(u32);
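A short scalar sketch of this last step may help. Since all bytes of z are identical after the pairwise adds, the 32-bit lane is the bitmask byte repeated four times, and its leading-zero count is determined by the top byte alone. The helpers model_clz32 and model_index_from_bitmask are hypothetical names; the loop-based clz is a portable stand-in assumed here in place of __lzcnt or __builtin_clz.

```c
#include <stdint.h>

/* Hypothetical portable count-leading-zeros, standing in for __lzcnt /
   __builtin_clz. Returns 32 for v == 0, which cannot happen in VMIN8
   because at least one lane always equals the minimum. */
static int model_clz32(uint32_t v)
{
    int n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && (v & bit) == 0; bit >>= 1)
        ++n;
    return n;
}

/* Hypothetical model of the extraction: every byte of z equals b, so the
   32-bit lane is b repeated four times regardless of byte order. */
static int model_index_from_bitmask(uint8_t b)
{
    uint32_t u32 = (uint32_t)b * 0x01010101u;
    return model_clz32(u32);
}
```

For the bitmask 0x50 (minimum in lanes 1 and 3), u32 is 0x50505050 and the leading-zero count is 1, the index of the first minimum. Ties therefore resolve to the lowest lane, matching the strict `<` comparison in the scalar loop from the question.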
Then, in real code, I use VMIN8 multiple times per loop iteration, and the compiler is able to interleave the multiple VMIN8 calls perfectly to avoid data stalls.