Merge bit sequences a and b according to a mask

Question

According to the Bit Twiddling Hacks website, the operation

unsigned int a;    // value to merge in non-masked bits
unsigned int b;    // value to merge in masked bits
unsigned int mask; // 1 where bits from b should be selected; 0 where from a.
unsigned int r;    // result of (a & ~mask) | (b & mask) goes here

r = a ^ ((a ^ b) & mask); 

allows merging two bit sequences a and b according to a mask. I was wondering:

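  1. Does this operation have a specific or commonly used name?
  2. Do some instruction sets have a dedicated assembly instruction for this operation?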
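
To see why the XOR form works: where a mask bit is 0, (a ^ b) & mask contributes 0 and a passes through unchanged; where it is 1, a ^ (a ^ b) cancels to b. A minimal sketch comparing it with the obvious AND/OR form; the function names are hypothetical, and uint32_t stands in for unsigned int to pin down the width.

```c
#include <stdint.h>

/* Obvious form: keep a's bits where mask is 0, take b's bits where mask is 1. */
static inline uint32_t merge_and_or(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & ~mask) | (b & mask);
}

/* Bit Twiddling Hacks form: 3 operations (xor, and, xor) instead of 4
   (not, and, and, or), but each step depends on the previous one. */
static inline uint32_t merge_xor(uint32_t a, uint32_t b, uint32_t mask)
{
    return a ^ ((a ^ b) & mask);
}
```

For example, merging 0x12345678 and 0xABCDEF01 under mask 0x0000FFFF keeps the high half of a and takes the low half of b, giving 0x1234EF01 with either function.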

Answer

In SSE/AVX programming, selective copying from one vector to another based on a mask is called a blend. SSE4.1 added instructions like PBLENDVB xmm1, xmm2/m128, <XMM0>, where the implicit operand XMM0 controls which bytes of the src overwrite the corresponding bytes in the dst. (Without SSE4.1, you'd usually AND and ANDNOT the mask with the two vectors and OR the results together; the xor trick has less instruction-level parallelism, and probably requires at least as many MOV instructions to copy registers.)
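
A minimal sketch of the pre-SSE4.1 AND/ANDNOT/OR idiom using SSE2 intrinsics from <emmintrin.h>. The helper name blend_bytes16 is hypothetical, and the #else branch is an assumed portable fallback so the snippet also builds on non-x86 targets. With SSE4.1, _mm_blendv_epi8 (PBLENDVB) performs a byte-wise selection in one instruction, but it only looks at the most significant bit of each mask byte, so it blends whole bytes rather than individual bits.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2: _mm_and_si128, _mm_andnot_si128, _mm_or_si128 */

/* Bitwise blend of 16 bytes: out = (a & ~mask) | (b & mask). */
static void blend_bytes16(const uint8_t *a, const uint8_t *b,
                          const uint8_t *mask, uint8_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vm = _mm_loadu_si128((const __m128i *)mask);
    __m128i from_a = _mm_andnot_si128(vm, va); /* ~mask & a (ANDNOT inverts its 1st operand) */
    __m128i from_b = _mm_and_si128(vm, vb);    /*  mask & b */
    _mm_storeu_si128((__m128i *)out, _mm_or_si128(from_a, from_b));
}
#else
/* Fallback without SSE2: the same formula, one byte at a time. */
static void blend_bytes16(const uint8_t *a, const uint8_t *b,
                          const uint8_t *mask, uint8_t *out)
{
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)((a[i] & ~mask[i]) | (b[i] & mask[i]));
}
#endif
```

The two AND steps are independent of each other, which is the instruction-level parallelism the xor version gives up.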

There's also an immediate blend instruction, pblendw, where the mask is an 8-bit immediate instead of a register. And there are 32-bit and 64-bit immediate blends (blendps, blendpd, vpblendd) and variable blends (blendvps, blendvpd).
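
For the immediate form, the selection pattern is fixed when the code is compiled. The scalar model below (blend_epi16_model is a hypothetical helper, not an intrinsic) mirrors what pblendw, exposed as _mm_blend_epi16(a, b, imm8) in <smmintrin.h>, does with its 8-bit immediate: bit i of imm8 selects 16-bit lane i from b when set, otherwise the lane is kept from a.

```c
#include <stdint.h>

/* Scalar model of pblendw: eight 16-bit lanes, one immediate bit per lane. */
static void blend_epi16_model(const uint16_t a[8], const uint16_t b[8],
                              uint8_t imm8, uint16_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = ((imm8 >> i) & 1u) ? b[i] : a[i]; /* 1: take b's lane, 0: keep a's lane */
}
```

Because the real intrinsic requires imm8 to be a compile-time constant, a mask computed at run time needs one of the variable blends instead.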

IDK if other SIMD instruction sets (NEON, AltiVec, whatever MIPS calls theirs, etc.) also call them "blends" or not.

SSE/AVX (or x86 integer instructions) don't provide anything better than the usual bitwise XOR/AND for doing bitwise (instead of element-wise) blends until AVX512F.

AVX512F can do the bitwise version of this (or any other bitwise ternary function) with a single vpternlogd or vpternlogq instruction. The only difference between the d and q element sizes shows up when you use a mask register for merge-masking or zero-masking the destination, but that didn't stop Intel from providing separate intrinsics even for the no-mask case:

__m512i _mm512_ternarylogic_epi32 (__m512i a, __m512i b, __m512i c, int imm8) and the equivalent ..._epi64 version.

The imm8 immediate byte is a truth table. Every bit of the destination is determined independently from the corresponding bits of a, b and c, which are used as a 3-bit index into the truth table, i.e. as imm8[a:b:c].
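
To make the truth-table description concrete, here is a portable scalar model of the per-bit lookup (ternlog_model and blend_imm8 are hypothetical helpers, not Intel intrinsics), together with a common way to derive imm8: evaluate the desired bitwise expression on the constants 0xF0, 0xCC and 0xAA, whose bit k equals the a, b and c bit of table index k. For the question's r = a ^ ((a ^ b) & mask), passing mask as the third operand gives imm8 = 0xD8, so on AVX512F hardware the call would presumably be _mm512_ternarylogic_epi32(a, b, mask, 0xD8); treat that as a sketch to check against Intel's documentation.

```c
#include <stdint.h>

/* Scalar model of the vpternlogd lookup: bit i of the result is
   imm8[(a_i << 2) | (b_i << 1) | c_i], i.e. imm8[a:b:c] per bit. */
static uint32_t ternlog_model(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        unsigned idx = (((a >> i) & 1u) << 2) | (((b >> i) & 1u) << 1) | ((c >> i) & 1u);
        r |= (uint32_t)((imm8 >> idx) & 1u) << i;
    }
    return r;
}

/* Derive imm8 by applying the expression to the index-pattern constants:
   A = 0xF0 (a bit of index k), B = 0xCC (b bit), C = 0xAA (c bit). */
static uint8_t blend_imm8(void)
{
    const uint8_t A = 0xF0, B = 0xCC, C = 0xAA;
    return (uint8_t)(A ^ ((A ^ B) & C)); /* r = a ^ ((a ^ b) & mask), mask as c */
}
```

The same constant trick works for any other three-input bitwise function: write the expression once in terms of A, B and C, and the resulting byte is the immediate.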

AVX512 will be fun to play with when it eventually appears in mainstream desktop/laptop CPUs, but that's probably a couple years away still.
