How to perform the inverse of _mm256_movemask_epi8 (VPMOVMSKB)?
Question
The intrinsic:
int mask = _mm256_movemask_epi8(__m256i s1)
creates a mask whose 32 bits correspond to the most significant bit of each byte of s1. After manipulating the mask with bit operations (BMI2, for example), I would like to perform the inverse of _mm256_movemask_epi8, i.e., create a __m256i vector in which the most significant bit of each byte contains the corresponding bit of the uint32_t mask.
What is the best way to do this?
I need the inverse because the intrinsic _mm256_blendv_epi8 accepts only a mask of type __m256i, not uint32_t. For that reason, I can ignore every bit of the resulting __m256i mask other than the MSB of each byte.
Recommended answer
Here is an alternative to LUT or pdep instructions that might be more efficient:
1. Copy your 32-bit mask to both the low bytes of some ymm register and bytes 16..19 of the same register. You could use a temporary array and _mm256_load_si256. Or you could move a single copy of the 32-bit mask to the low bytes of some ymm register, then broadcast it with VPBROADCASTD (_mm256_broadcastd_epi32) or other broadcast/shuffle instructions.
2. Rearrange the bytes of the register so that each of the low 8 bytes contains the low 8 bits of your mask, each of the next 8 bytes contains the next 8 bits, and so on. This can be done with VPSHUFB (_mm256_shuffle_epi8), with a control register containing 0 in the low 8 bytes, 1 in the next 8 bytes, etc. VPSHUFB shuffles within each 128-bit lane, which is why step 1 places a copy of the mask in both lanes.
3. Select the proper bit for each byte with VPOR (_mm256_or_si256) or VPAND (_mm256_and_si256).
4. Set the MSB of the appropriate bytes with VPCMPEQB (_mm256_cmpeq_epi8). Compare each byte to 0xFF. If you want each bit of the mask toggled, use VPAND in the previous step and compare to zero.
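The four steps above can be sketched in C roughly as follows. This is a minimal sketch, not a tuned implementation. The function names (inverse_movemask_epi8, inverse_movemask_epi8_toggled, roundtrip_movemask, roundtrip_movemask_toggled) and the AVX2_TARGET macro are hypothetical names chosen for illustration. The sketch assumes GCC or Clang (the target attribute lets it compile without -mavx2) and a CPU that supports AVX2. It uses _mm256_set1_epi32 for step 1, which compilers typically lower to a move plus VPBROADCASTD.

```c
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC/Clang. Enables AVX2 per function, so -mavx2 is not required. */
#define AVX2_TARGET __attribute__((target("avx2")))

/* Byte i of the result is 0xFF if bit i of mask is set, otherwise 0x00. */
AVX2_TARGET static __m256i inverse_movemask_epi8(uint32_t mask)
{
    /* Step 1: copy the mask into every 32-bit element, which covers
       bytes 0..3 and bytes 16..19. */
    __m256i v = _mm256_set1_epi32((int)mask);

    /* Step 2: replicate mask byte k across result bytes 8k..8k+7.
       VPSHUFB works per 128-bit lane, so indices 2 and 3 in the upper
       lane read the copy stored at bytes 18 and 19. */
    const __m256i shuffle = _mm256_setr_epi64x(
        0x0000000000000000LL, 0x0101010101010101LL,
        0x0202020202020202LL, 0x0303030303030303LL);
    v = _mm256_shuffle_epi8(v, shuffle);

    /* Step 3 (VPOR): in byte j of each 8-byte group, force every bit
       except bit j to 1. */
    const __m256i all_but_bit = _mm256_set1_epi64x(0x7fbfdfeff7fbfdfeLL);
    v = _mm256_or_si256(v, all_but_bit);

    /* Step 4: a byte equals 0xFF only if its selected bit was set. */
    return _mm256_cmpeq_epi8(v, _mm256_set1_epi8(-1));
}

/* Toggled variant: byte i is 0xFF if bit i of mask is clear. */
AVX2_TARGET static __m256i inverse_movemask_epi8_toggled(uint32_t mask)
{
    __m256i v = _mm256_set1_epi32((int)mask);
    const __m256i shuffle = _mm256_setr_epi64x(
        0x0000000000000000LL, 0x0101010101010101LL,
        0x0202020202020202LL, 0x0303030303030303LL);
    v = _mm256_shuffle_epi8(v, shuffle);

    /* Step 3 (VPAND): keep only bit j in byte j of each 8-byte group. */
    const __m256i only_bit =
        _mm256_set1_epi64x((long long)0x8040201008040201ULL);
    v = _mm256_and_si256(v, only_bit);

    /* Step 4: a byte equals zero only if its selected bit was clear. */
    return _mm256_cmpeq_epi8(v, _mm256_setzero_si256());
}

/* Helpers for checking: expand the mask, then collapse it again. */
AVX2_TARGET uint32_t roundtrip_movemask(uint32_t mask)
{
    return (uint32_t)_mm256_movemask_epi8(inverse_movemask_epi8(mask));
}

AVX2_TARGET uint32_t roundtrip_movemask_toggled(uint32_t mask)
{
    return (uint32_t)_mm256_movemask_epi8(inverse_movemask_epi8_toggled(mask));
}
```

The vector returned by inverse_movemask_epi8 can be passed directly as the mask argument of _mm256_blendv_epi8, since that instruction looks only at the MSB of each mask byte.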
An additional advantage of this approach is flexibility: you can choose a different control register for step #2 and a different mask for step #3 to shuffle the bits of your bit mask (for example, you could copy the mask to the ymm register in reversed order).
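As one illustration of that flexibility, the sketch below changes only the two constants so that byte i of the result reflects bit 31 - i of the mask. The names inverse_movemask_epi8_reversed and roundtrip_movemask_reversed are hypothetical, and the same assumptions apply as for any AVX2 code written this way: GCC or Clang for the target attribute, and a CPU with AVX2.

```c
#include <immintrin.h>
#include <stdint.h>

/* Byte i of the result is 0xFF if bit (31 - i) of mask is set, else 0x00.
   Assumption: GCC/Clang, for the per-function target attribute. */
__attribute__((target("avx2")))
static __m256i inverse_movemask_epi8_reversed(uint32_t mask)
{
    /* Step 1: copy the mask into every 32-bit element. */
    __m256i v = _mm256_set1_epi32((int)mask);

    /* Step 2 with a different control register: bytes 0..7 take mask
       byte 3, bytes 8..15 take byte 2, bytes 16..23 take byte 1 and
       bytes 24..31 take byte 0. */
    const __m256i shuffle = _mm256_setr_epi64x(
        0x0303030303030303LL, 0x0202020202020202LL,
        0x0101010101010101LL, 0x0000000000000000LL);
    v = _mm256_shuffle_epi8(v, shuffle);

    /* Step 3 with a different mask: byte j of each 8-byte group keeps
       bit (7 - j); every other bit is forced to 1. */
    const __m256i all_but_bit =
        _mm256_set1_epi64x((long long)0xfefdfbf7efdfbf7fULL);
    v = _mm256_or_si256(v, all_but_bit);

    /* Step 4: a byte equals 0xFF only if its selected bit was set. */
    return _mm256_cmpeq_epi8(v, _mm256_set1_epi8(-1));
}

/* Helper for checking: the round trip yields the bit-reversed mask. */
__attribute__((target("avx2")))
uint32_t roundtrip_movemask_reversed(uint32_t mask)
{
    return (uint32_t)_mm256_movemask_epi8(inverse_movemask_epi8_reversed(mask));
}
```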