Find the minimum and the position of the minimum element in a uint8x8_t NEON register

Question

Consider this code:

uint8_t v[8] = { ... };
int ret = 256;
int ret_pos = -1;
for (int i=0; i<8; ++i)
{
    if (v[i] < ret)
    {
        ret = v[i];
        ret_pos = i;
    }
}

It finds the minimum value and the position of the minimum element (ret and ret_pos). On ARM NEON I could use pairwise min to find the minimum element of v, but how do I find its position?

Update: see my own answer. What would you suggest to improve it?

Recommended answer

Here's how I did it after spending some time fiddling with bits and math:

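#include <arm_neon.h>

/* Stores the minimum of the eight lanes of x in value and the index of its
   first occurrence in index. Expects a uint8x8_t named vmask, holding the
   bit mask loaded further down, to be in scope at the point of use. */
#define VMIN8(x, index, value)                               \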
do {                                                         \
    uint8x8_t m = vpmin_u8(x, x);                            \
    m = vpmin_u8(m, m);                                      \
    m = vpmin_u8(m, m);                                      \
    uint8x8_t r = vceq_u8(x, m);                             \
                                                             \
    uint8x8_t z = vand_u8(vmask, r);                         \
                                                             \
    z = vpadd_u8(z, z);                                      \
    z = vpadd_u8(z, z);                                      \
    z = vpadd_u8(z, z);                                      \
                                                             \
    unsigned u32 = vget_lane_u32(vreinterpret_u32_u8(z), 0); \
    index = __lzcnt(u32);                                    \
    value = vget_lane_u8(m, 0);                              \
} while (0)


uint8_t v[8] = { ... };

static const uint8_t mask[] = { 0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01 };
uint8x8_t vmask = vld1_u8(mask);

uint8x8_t v8 = vld1_u8(v);
int ret;
int ret_pos;
VMIN8(v8, ret_pos, ret);

where __lzcnt is clz (__builtin_clz in GCC).

Here's how it works. First, pairwise min is applied three times so that every u8 lane of the uint8x8_t holds the minimum value:

    uint8x8_t m = vpmin_u8(x, x);
    m = vpmin_u8(m, m);
    m = vpmin_u8(m, m);
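
To see why three rounds are enough, here is a minimal scalar sketch of the pairwise-min step in plain C. It is only a model for reasoning about lane contents, not NEON code: pmin_u8_model and broadcast_min_model are hypothetical helper names introduced for illustration, and the model assumes vpmin_u8(a, b) puts the minimum of each adjacent pair of a in lanes 0..3 and of each adjacent pair of b in lanes 4..7.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of vpmin_u8(a, b): lanes 0..3 receive the
   minimum of each adjacent pair of a, lanes 4..7 the minimum of each
   adjacent pair of b. A temporary is used so out may alias a or b. */
static void pmin_u8_model(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
    uint8_t tmp[8];
    for (int i = 0; i < 4; ++i) {
        tmp[i]     = a[2 * i] < a[2 * i + 1] ? a[2 * i] : a[2 * i + 1];
        tmp[i + 4] = b[2 * i] < b[2 * i + 1] ? b[2 * i] : b[2 * i + 1];
    }
    memcpy(out, tmp, sizeof tmp);
}

/* Three rounds with both operands equal: the number of distinct
   candidates goes 8 -> 4 -> 2 -> 1, so every lane ends up holding
   the overall minimum. */
static void broadcast_min_model(const uint8_t x[8], uint8_t m[8])
{
    assert(x != NULL && m != NULL);
    pmin_u8_model(x, x, m);
    pmin_u8_model(m, m, m);
    pmin_u8_model(m, m, m);
}
```

Passing the same vector as both operands is what turns the reduction into a broadcast: the upper half of each result simply repeats the lower half.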

Then a vector compare sets every lane equal to the minimum to all ones and all other lanes to zero:

    uint8x8_t r = vceq_u8(x, m);

Then perform a bitwise AND with the mask that contains the values uint8_t mask[] = { 1<<7, 1<<6, 1<<5, ... 1<<1, 1<<0 }:

uint8x8_t z = vand_u8(vmask, r);

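Then pairwise add is applied three times to sum all lanes of the vector. Each lane contributes a distinct bit, so the sum cannot overflow and acts as a bitmap of the lanes that hold the minimum: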

z = vpadd_u8(z, z);
z = vpadd_u8(z, z);
z = vpadd_u8(z, z);

After that, clz gives the position of the first minimum element. Lane 0 maps to the most significant mask bit, so the number of leading zeros equals the index of the first matching lane:

unsigned u32 = vget_lane_u32(vreinterpret_u32_u8(z), 0);
index = __lzcnt(u32);
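
The compare, mask, add and clz steps can be checked with a small scalar sketch in plain C. This is a model under stated assumptions rather than the NEON code itself: clz32_model and first_min_index_model are hypothetical names, clz32_model is a portable stand-in for __builtin_clz, and the model assumes that after three vpadd_u8 rounds every lane holds the full lane sum, so the 32-bit value read from lane 0 is that sum repeated in four bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-leading-zeros for a nonzero 32-bit value
   (hypothetical stand-in for __builtin_clz). */
static int clz32_model(uint32_t v)
{
    assert(v != 0);
    int n = 0;
    while (!(v & 0x80000000u)) {
        v <<= 1;
        ++n;
    }
    return n;
}

/* Hypothetical scalar model of the vceq_u8 / vand_u8 / vpadd_u8 / clz steps.
   min_value must be the minimum of x, so at least one lane matches. */
static int first_min_index_model(const uint8_t x[8], uint8_t min_value)
{
    static const uint8_t mask[8] =
        { 0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01 };
    uint8_t sum = 0;
    for (int i = 0; i < 8; ++i) {
        uint8_t r = (x[i] == min_value) ? 0xFF : 0x00; /* vceq_u8 */
        sum = (uint8_t)(sum + (mask[i] & r));          /* vand_u8, then lane sum */
    }
    /* Every lane holds sum, so the low 32 bits are sum in all four bytes. */
    uint32_t u32 = 0x01010101u * sum;
    /* The top byte is sum, so its leading zeros give the lane index. */
    return clz32_model(u32);
}
```

For x = {9, 4, 7, 3, 8, 3, 6, 5} and a minimum of 3, lanes 3 and 5 match, the lane sum is 0x10 + 0x04 = 0x14, and the model returns 3, the first of the two positions. This matches the scalar loop in the question, which also keeps the first occurrence because it compares with a strict <.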

In real code I use VMIN8 several times per loop iteration, and the compiler is able to interleave the multiple VMIN8 expansions perfectly to avoid data stalls.
