Fastest way to produce a mask with n ones starting at position i


Question

What is the fastest way (in terms of CPU cycles on common modern architectures) to produce a mask with len bits set to 1, starting at position pos:

template <class UIntType>
constexpr UIntType make_mask(std::size_t pos, std::size_t len)
{
    // Body of the function
}

// Call of the function
auto mask = make_mask<uint32_t>(4, 10);
// mask = 00000000 00000000 00111111 11110000 
// (in binary with MSB on the left and LSB on the right)

Also, are there any compiler intrinsics or BMI functions that can help?

Recommended answer

If by "starting at pos" you mean that the lowest-order bit of the mask is at the position corresponding to 2^pos (as in your example):

((UIntType(1) << len) - UIntType(1)) << pos
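
Dropped into the function template from the question, the expression above might look like the following sketch. It assumes that both len and pos are strictly smaller than the number of bits in UIntType; the outer static_cast is an addition here, meant only to keep the result in UIntType when a narrow type is promoted to int.

```cpp
#include <cstddef>
#include <cstdint>

// Minimal sketch. Assumption: len and pos are both smaller than the
// bit width of UIntType, otherwise the shifts are undefined behaviour.
template <class UIntType>
constexpr UIntType make_mask(std::size_t pos, std::size_t len)
{
    // (1 << len) - 1 produces len low-order ones;
    // shifting left by pos moves them to the requested position.
    return static_cast<UIntType>(((UIntType(1) << len) - UIntType(1)) << pos);
}

// The example from the question: 10 ones starting at bit 4.
static_assert(make_mask<std::uint32_t>(4, 10) == 0x00003FF0u,
              "expected 00000000 00000000 00111111 11110000");
```

Because the function is constexpr, an out-of-range shift in a constant expression is rejected at compile time rather than silently producing a wrong mask.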

Shifting by an amount greater than or equal to the width of the left operand is undefined behaviour. If it is possible that len is ≥ the number of bits in UIntType, avoid it with a test:

(((len < std::numeric_limits<UIntType>::digits)
     ? UIntType(1)<<len
     : 0) - UIntType(1)) << pos

(If it is also possible that pos is ≥ std::numeric_limits<UIntType>::digits, you'll need another ternary op test.)
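
A sketch with both tests in place could look like this. The name make_mask_checked is hypothetical, and returning 0 when pos is out of range is an assumption about the desired behaviour (no bit of the mask falls inside the type). The inner static_cast is also an addition: it keeps the low-order ones unsigned before the final shift when UIntType is narrower than int.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Sketch only: make_mask_checked is a hypothetical name, and the
// "pos out of range gives 0" policy is an assumption.
template <class UIntType>
constexpr UIntType make_mask_checked(std::size_t pos, std::size_t len)
{
    return pos >= static_cast<std::size_t>(std::numeric_limits<UIntType>::digits)
        ? UIntType(0)  // the mask would start beyond the top bit
        : static_cast<UIntType>(
              // len ones in the low bits; when len >= width, 0 - 1 wraps to all ones
              static_cast<UIntType>(
                  (len < static_cast<std::size_t>(std::numeric_limits<UIntType>::digits)
                       ? UIntType(1) << len
                       : UIntType(0)) - UIntType(1))
              << pos);
}

static_assert(make_mask_checked<std::uint32_t>(0, 32) == 0xFFFFFFFFu,
              "len equal to the width gives a full mask");
static_assert(make_mask_checked<std::uint32_t>(32, 5) == 0u,
              "pos equal to the width gives an empty mask");
```

Bits that would land above the top of the type are simply discarded, so a call such as make_mask_checked<std::uint32_t>(8, 32) sets bits 8 through 31.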

You could also use:

((UIntType(1)<<(len>>1)<<((len+1)>>1)) - UIntType(1)) << pos

This splits the shift by len into two shifts of roughly len/2 each, so neither one reaches the bit width even when len equals it. It avoids the ternary op at the cost of three extra shift operators; I doubt it would be faster, but only careful benchmarking would tell for sure.
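
As a function, the split-shift variant might be sketched as follows. The name make_mask_split is hypothetical, and the sketch assumes len ≤ the bit width of UIntType and pos < the bit width. The subtraction is parenthesized explicitly because binary - binds more tightly than <<.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch only: make_mask_split is a hypothetical name.
// Assumptions: len <= bit width of UIntType, pos < bit width.
template <class UIntType>
constexpr UIntType make_mask_split(std::size_t pos, std::size_t len)
{
    return static_cast<UIntType>(
        // Two half-sized shifts add up to a shift by len without ever
        // shifting by the full width; for len == width the unsigned
        // value wraps to 0, and 0 - 1 gives all ones.
        static_cast<UIntType>(
            (UIntType(1) << (len >> 1) << ((len + 1) >> 1)) - UIntType(1))
        << pos);
}

static_assert(make_mask_split<std::uint32_t>(4, 10) == 0x00003FF0u,
              "same result as the plain expression");
static_assert(make_mask_split<std::uint32_t>(0, 32) == 0xFFFFFFFFu,
              "len equal to the width needs no branch");
```

Whether this beats the ternary version depends on the compiler and target, so it is worth comparing the generated assembly or benchmarking both.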
