Fastest way to produce a mask with n ones starting at position i
Question
What is the fastest way (in terms of CPU cycles on common modern architectures) to produce a mask with len bits set to 1 starting at position pos:
template <class UIntType>
constexpr UIntType make_mask(std::size_t pos, std::size_t len)
{
// Body of the function
}
// Call of the function
auto mask = make_mask<uint32_t>(4, 10);
// mask = 00000000 00000000 00111111 11110000
// (in binary with MSB on the left and LSB on the right)
Also, are there any compiler intrinsics or BMI instructions that can help?
Recommended answer
If by "starting at pos" you mean that the lowest-order bit of the mask is at the position corresponding to 2^pos (as in your example):
((UIntType(1) << len) - UIntType(1)) << pos
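Dropped into the function from the question, this expression might look like the sketch below. The name make_mask_unchecked is hypothetical, and the sketch assumes that both len and pos are smaller than the number of bits in UIntType; nothing checks that.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch only: assumes len and pos are both less than the bit width of
// UIntType. Shifting by the full width or more is undefined behaviour.
template <class UIntType>
constexpr UIntType make_mask_unchecked(std::size_t pos, std::size_t len)
{
    // (1 << len) - 1 yields len low-order ones; the final shift moves
    // them so that the lowest set bit sits at position pos.
    return static_cast<UIntType>(((UIntType(1) << len) - UIntType(1)) << pos);
}

// The example from the question: 10 ones starting at bit 4.
static_assert(make_mask_unchecked<std::uint32_t>(4, 10) == 0x00003FF0u,
              "bits 4..13 should be set");
```

The static_cast matters only for types narrower than int, where integer promotion would otherwise make the expression an int.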
If it is possible that len is ≥ the number of bits in UIntType, avoid undefined behaviour with a test:
(((len < std::numeric_limits<UIntType>::digits)
? UIntType(1)<<len
: 0) - UIntType(1)) << pos
(If it is also possible that pos is ≥ std::numeric_limits<UIntType>::digits, you'll need another ternary op test.)
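With both tests in place, the function could be sketched as follows. The name make_mask_checked is hypothetical, and returning an empty mask when pos is out of range is my assumption about the desired behaviour, not something the answer specifies. The local variables need C++14 or later for constexpr.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Sketch only: guards both shift counts against the bit width of UIntType.
template <class UIntType>
constexpr UIntType make_mask_checked(std::size_t pos, std::size_t len)
{
    constexpr std::size_t digits = std::numeric_limits<UIntType>::digits;
    // When len >= digits, 0 - 1 wraps around to all ones.
    const UIntType ones = static_cast<UIntType>(
        (len < digits ? UIntType(1) << len : UIntType(0)) - UIntType(1));
    // Assumption: a start position past the top bit gives an empty mask.
    return pos < digits ? static_cast<UIntType>(ones << pos) : UIntType(0);
}

static_assert(make_mask_checked<std::uint32_t>(4, 10) == 0x00003FF0u,
              "example from the question");
static_assert(make_mask_checked<std::uint32_t>(0, 32) == 0xFFFFFFFFu,
              "len equal to the bit width gives all ones");
static_assert(make_mask_checked<std::uint32_t>(32, 5) == 0u,
              "pos equal to the bit width gives an empty mask");
```

Bits that would land above the top of the type are simply shifted out, so make_mask_checked<std::uint32_t>(28, 10) keeps only the four highest bits.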
You could also use:
(((UIntType(1) << (len>>1)) << ((len+1)>>1)) - UIntType(1)) << pos
which avoids the ternary op at the cost of three extra shift operations; I doubt it would be faster, but careful benchmarking would be necessary to know for sure.
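The split-shift idea can be sketched as a complete function. The name make_mask_split is hypothetical, and the sketch assumes len is at most the bit width of UIntType and pos is below it. The two shifts must be parenthesised before the subtraction, because binary - binds more tightly than << in C++.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch only: assumes len <= bit width of UIntType and pos < bit width.
template <class UIntType>
constexpr UIntType make_mask_split(std::size_t pos, std::size_t len)
{
    // len is split into floor(len/2) and ceil(len/2). Each half is below
    // the bit width, so neither shift is undefined even when len equals
    // the bit width; in that case the 1 is shifted out and 0 - 1 wraps
    // to all ones.
    return static_cast<UIntType>(
        (((UIntType(1) << (len >> 1)) << ((len + 1) >> 1)) - UIntType(1)) << pos);
}

static_assert(make_mask_split<std::uint32_t>(4, 10) == 0x00003FF0u,
              "example from the question");
static_assert(make_mask_split<std::uint32_t>(0, 32) == 0xFFFFFFFFu,
              "full-width mask without a branch");
```

Whether this beats the ternary version depends on the compiler and target, since the ternary often compiles to a conditional move rather than a branch.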