How to use ARM NEON vbit intrinsics?
Question
I don't understand how to differentiate between vbit, vbsl and vbif when using NEON intrinsics. I need to do the vbit operation, but if I use the vbslq intrinsic I don't get what I want.
For example, I have a source vector like this:
uint8x16_t source = 39 62 9b 52 34 5b 47 48 47 35 0 0 0 0 0 0
The destination vector is:
uint8x16_t destination = 0 0 0 0 0 0 0 0 0 0 0 0 c3 c8 c5 d5
I would like to get this as output:
39 62 9b 52 34 5b 47 48 47 35 0 0 c3 c8 c5 d5
This means that I want to copy the first ten bytes from the source and leave the other six unchanged. I'm using this mask:
{0,0,0,0,0,0,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF};
What is the correct way to use vbslq_u8?
Answer
The ARM documentation is not very clear, but it looks like you need to use the intrinsic like this:
uint8x16_t src  = {0x39,0x62,0x9b,0x52,0x34,0x5b,0x47,0x48,
                   0x47,0x35,0x00,0x00,0x00,0x00,0x00,0x00};
uint8x16_t dest = {0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
                   0x00,0x00,0x00,0x00,0xc3,0xc8,0xc5,0xd5};
uint8x16_t mask = {0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
                   0xff,0xff,0x00,0x00,0x00,0x00,0x00,0x00};
dest = vbslq_u8(mask, src, dest);
Note that the order of bytes in the mask needs to correspond to the order in the source/dest registers (they seem to be swapped in your question).
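Also note that the first parameter of the intrinsic is the selection mask: a 1 bit selects the corresponding bit from the second parameter, and a 0 bit selects the corresponding bit from the third parameter. There are no separate vbit or vbif intrinsics. VBSL, VBIT and VBIF all compute the same bitwise select and differ only in which operand register is overwritten, so the compiler picks the instruction when it compiles vbslq_u8.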
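As a rough sketch of the selection rule described above, the following C code performs the same bitwise select on 16 bytes. The function names bitselect_u8x16 and prefix_mask_u8x16 are hypothetical helpers invented for this illustration, not part of arm_neon.h. The NEON path is only compiled when the compiler defines __ARM_NEON; otherwise a portable loop with the same semantics is used, so the behaviour can be checked on any machine.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: bitwise select over 16 bytes.
 * Each result bit comes from a where the mask bit is 1,
 * and from b where the mask bit is 0. */
static void bitselect_u8x16(const uint8_t mask[16], const uint8_t a[16],
                            const uint8_t b[16], uint8_t out[16])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* NEON path: the same vbslq_u8(mask, src, dest) call as in the answer. */
    vst1q_u8(out, vbslq_u8(vld1q_u8(mask), vld1q_u8(a), vld1q_u8(b)));
#else
    /* Portable reference path: (a AND mask) OR (b AND NOT mask). */
    for (size_t i = 0; i < 16; ++i)
        out[i] = (uint8_t)((a[i] & mask[i]) | (b[i] & (uint8_t)~mask[i]));
#endif
}

/* Hypothetical helper: build a mask with 0xff in the first n bytes
 * and 0x00 in the remaining bytes. */
static void prefix_mask_u8x16(size_t n, uint8_t mask[16])
{
    for (size_t i = 0; i < 16; ++i)
        mask[i] = (i < n) ? 0xff : 0x00;
}
```

Calling prefix_mask_u8x16(10, mask) and then bitselect_u8x16(mask, src, dest, out) with the src and dest bytes from the answer copies the first ten bytes from src and keeps the last six bytes of dest.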