How to perform the inverse of _mm256_movemask_epi8 (VPMOVMSKB)?
Question
The intrinsic:
int mask = _mm256_movemask_epi8(__m256i s1)
creates a mask whose 32 bits correspond to the most significant bit of each byte of s1. After manipulating the mask using bit operations (BMI2, for example), I would like to perform the inverse of _mm256_movemask_epi8, i.e., create a __m256i vector in which the most significant bit of each byte contains the corresponding bit of the uint32_t mask.
What is the best way to do this?
Edit:
I need to perform the inverse because the intrinsic _mm256_blendv_epi8 accepts only a __m256i mask, not a uint32_t. As such, in the resulting __m256i mask, I can ignore every bit other than the MSB of each byte.
Recommended answer
Here is an alternative to LUT or pdep instructions that might be more efficient:
1. Copy your 32-bit mask to the low bytes of some ymm register and to bytes 16..19 of the same register. You could use a temporary array and _mm256_load_si256. Or you could move a single copy of the 32-bit mask to the low bytes of some ymm register, then broadcast it with VPBROADCASTD (_mm256_broadcastd_epi32) or other broadcast/shuffle instructions.
2. Rearrange the bytes of the register so that each of the low 8 bytes contains the low 8 bits of your mask, each of the next 8 bytes contains the next 8 bits, and so on. This could be done with VPSHUFB (_mm256_shuffle_epi8) with a control register containing 0 in the low 8 bytes, 1 in the next 8 bytes, etc. VPSHUFB shuffles within each 128-bit lane, which is why step 1 places a copy of the mask in both halves of the register.
3. Select the proper bit for each byte with VPOR (_mm256_or_si256) or VPAND (_mm256_and_si256). With VPOR, the constant has every bit set except the one being tested in that byte; with VPAND, it has only that bit set.
4. Set the MSB of the appropriate bytes with VPCMPEQB (_mm256_cmpeq_epi8). Compare each byte to 0xFF. If you want each bit of the mask toggled, use VPAND in the previous step and compare to zero.
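The four steps above can be sketched in C roughly as follows. This is a minimal sketch, not the answerer's own code: the function names, the AVX2_FN macro, and the choice of _mm256_set1_epi32 for the broadcast are assumptions made for illustration. The target attribute is GCC/Clang-specific and only lets the snippet build without -mavx2; the CPU still needs AVX2 support at run time.

```c
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC or Clang. Lets the sketch compile without -mavx2. */
#define AVX2_FN __attribute__((target("avx2")))

/* Hypothetical helper: byte i of the result is 0xFF if bit i of mask is set,
   0x00 otherwise, so its MSB matches what _mm256_blendv_epi8 looks at. */
AVX2_FN static inline __m256i inverse_movemask_epi8(uint32_t mask)
{
    /* Step 1: put a copy of the 32-bit mask in every dword (both lanes). */
    __m256i v = _mm256_set1_epi32((int)mask);

    /* Step 2: VPSHUFB is per 128-bit lane. Each lane holds the whole mask,
       so indices 0..3 select mask bytes 0..3 in either lane. */
    const __m256i shuf = _mm256_setr_epi64x(
        0x0000000000000000LL, 0x0101010101010101LL,
        0x0202020202020202LL, 0x0303030303030303LL);
    v = _mm256_shuffle_epi8(v, shuf);

    /* Step 3: OR with a constant that has every bit set except bit (i mod 8)
       in byte i. A byte becomes 0xFF only if its own mask bit is 1. */
    const __m256i all_but_one = _mm256_set1_epi64x(0x7fbfdfeff7fbfdfeLL);
    v = _mm256_or_si256(v, all_but_one);

    /* Step 4: bytes equal to 0xFF become 0xFF, all others become 0x00. */
    return _mm256_cmpeq_epi8(v, _mm256_set1_epi8(-1));
}

/* Scalar-friendly wrapper: stores the 32 result bytes to out. */
AVX2_FN void inverse_movemask_to_bytes(uint32_t mask, uint8_t out[32])
{
    _mm256_storeu_si256((__m256i *)out, inverse_movemask_epi8(mask));
}

/* Round trip: VPMOVMSKB of the expanded vector should give back the input. */
AVX2_FN uint32_t movemask_roundtrip(uint32_t mask)
{
    return (uint32_t)_mm256_movemask_epi8(inverse_movemask_epi8(mask));
}
```

The returned vector could then be passed as the third argument of _mm256_blendv_epi8. For the toggled variant mentioned in step 4, one would AND with a constant holding only the tested bit in each byte and compare against zero instead.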
An additional benefit of this approach is its flexibility: you could choose a different control register for step #2 and a different mask for step #3 to shuffle the bits of your bit mask (for example, you could copy this mask to the ymm register in reversed order).
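As an illustration of that flexibility, the sketch below expands the mask in reversed order, so that byte j of the result reflects bit 31 - j of the mask. Only the shuffle control and the per-byte bit constant change. The function names and the AVX2_FN macro are hypothetical, and the target attribute again assumes GCC or Clang on an AVX2-capable CPU.

```c
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC or Clang. Lets the sketch compile without -mavx2. */
#define AVX2_FN __attribute__((target("avx2")))

/* Hypothetical helper: byte j of the result is 0xFF if bit (31 - j) of mask
   is set, 0x00 otherwise. */
AVX2_FN static inline __m256i inverse_movemask_reversed_epi8(uint32_t mask)
{
    /* Same broadcast as in the forward version. */
    __m256i v = _mm256_set1_epi32((int)mask);

    /* Step 2 with a different control register: mask bytes in order 3,2,1,0. */
    const __m256i shuf = _mm256_setr_epi64x(
        0x0303030303030303LL, 0x0202020202020202LL,
        0x0101010101010101LL, 0x0000000000000000LL);
    v = _mm256_shuffle_epi8(v, shuf);

    /* Step 3 with a different constant: within each group of 8 bytes,
       byte k tests bit (7 - k). ORing with the complement leaves 0xFF
       only where the tested bit is 1. */
    const long long bit_per_byte = 0x0102040810204080LL;
    v = _mm256_or_si256(v, _mm256_set1_epi64x(~bit_per_byte));

    /* Step 4 is unchanged. */
    return _mm256_cmpeq_epi8(v, _mm256_set1_epi8(-1));
}

/* VPMOVMSKB of the reversed expansion yields the bit-reversed input. */
AVX2_FN uint32_t movemask_reversed(uint32_t mask)
{
    return (uint32_t)_mm256_movemask_epi8(inverse_movemask_reversed_epi8(mask));
}
```

Because the reordering is folded into constants that are needed anyway, this variant should use the same number of instructions as the forward version.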