请问这个算法来计算一个32位整数工作组的位数? [英] How does this algorithm to count the number of set bits in a 32-bit integer work?

查看:179
本文介绍了请问这个算法来计算一个32位整数工作组的位数?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

int SWAR(unsigned int i)
{
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}

我已经看到了这code计数的位数等于 1 32位整数,我注意到,它的性能比<$更好C $ C> __ builtin_popcount ,但我不明白它的工作方式。

I have seen this code that counts the number of bits equals to 1 in 32-bit integer, and I noticed that its performance is better than __builtin_popcount but I can't understand the way it works.

能有人给这个code如何工作的详细解释?

Can someone give a detailed explanation of how this code works?

推荐答案

OK,让我们通过线code线:

OK, let's go through the code line by line:

i = i - ((i >> 1) & 0x55555555);

首先,的意义恒 0x55555555 是,使用Java / GCC的风格写的binary文字符号

First of all, the significance of the constant 0x55555555 is that, written using the Java / GCC style binary literal notation),

0x55555555 = 0b01010101010101010101010101010101

也就是说,所有的奇数位(计数的最低位为位1 =奇)为 1 ,和所有的偶数位是 0

这位前pression ((I&GT;&GT; 1)及0x55555555)从而转移的位I 权被一个,然后将所有偶数位设置为零。 (等价的,我们可能已经先设定所有的奇数位I 到零&放大器; 0xAAAAAAAA 和的然后的向右移动一位的结果。)为方便起见,我们称之为中间值Ĵ

The expression ((i >> 1) & 0x55555555) thus shifts the bits of i right by one, and then sets all the even-numbered bits to zero. (Equivalently, we could've first set all the odd-numbered bits of i to zero with & 0xAAAAAAAA and then shifted the result right by one bit.) For convenience, let's call this intermediate value j.

当我们减去这个Ĵ从原来的会发生什么我?那么,让我们看看会发生什么,如果 I 只有的两个的位:

What happens when we subtract this j from the original i? Well, let's see what would happen if i had only two bits:

    i           j         i - j
----------------------------------
0 = 0b00    0 = 0b00    0 = 0b00
1 = 0b01    0 = 0b00    1 = 0b01
2 = 0b10    1 = 0b01    1 = 0b01
3 = 0b11    1 = 0b01    2 = 0b10

嘿!我们已经成功地指望我们的两位数字的位数!

Hey! We've managed to count the bits of our two-bit number!

OK,但如果 I 有两个以上的位设置?事实上,这是pretty容易检查的最低两位我 - J 仍然会通过上表给出的,因此将第三和第四位的,第五和第六位,所以和。特别是:

OK, but what if i has more than two bits set? In fact, it's pretty easy to check that the lowest two bits of i - j will still be given by the table above, and so will the third and fourth bits, and the fifth and sixth bits, and so and. In particular:


  • 尽管&GT;&GT; 1 我最低的两位 - J 不受 I ,因为它们会被屏蔽掉Ĵ&安培; 0x55555555 ;以及

  • despite the >> 1, the lowest two bits of i - j are not affected by the third or higher bits of i, since they'll be masked out of j by the & 0x55555555; and

因为 j的最低两位绝不能有更大的数值高于 ,减法永远不会从的第三位借我:因此,的最低两位我也不能影响第三或更高位的我 - Ĵ

since the lowest two bits of j can never have a greater numerical value than those of i, the subtraction will never borrow from the third bit of i: thus, the lowest two bits of i also cannot affect the third or higher bits of i - j.

事实上,通过重复同样的论点,我们可以看到,在此行的计算,实际上,申请表格上方的每个的16二位块的 I 并行的。也就是说,在执行这条线后,的新值i 现在将包含的最低两比特的数量的相应位中设置位在原始值 I ,等会在接下来的两个位,依此类推。

In fact, by repeating the same argument, we can see that the calculation on this line, in effect, applies the table above to each of the 16 two-bit blocks in i in parallel. That is, after executing this line, the lowest two bits of the new value of i will now contain the number of bits set among the corresponding bits in the original value of i, and so will the next two bits, and so on.

i = (i & 0x33333333) + ((i >> 2) & 0x33333333);

相比于第一线,这一个相当简单。首先,注意

Compared to the first line, this one's quite simple. First, note that

0x33333333 = 0b00110011001100110011001100110011

因此​​, I和0x33333333 取上面计算的双位计数和扔掉他们每个第二个,而(I&GT;&GT; 2)及0x33333333 做同样的之后的移 I 右移两位。然后,我们添加在一起的结果。

Thus, i & 0x33333333 takes the two-bit counts calculated above and throws away every second one of them, while (i >> 2) & 0x33333333 does the same after shifting i right by two bits. Then we add the results together.

因此​​,实际上,这行的作用是采取最低两的bitcounts和原始输入的第二最低两比特,计算关于previous行,然后添加在一起,得到的比特计数最低的的输入位。并再次,它这样做并行的的所有的输入的8四位块(=十六进制数)。

Thus, in effect, what this line does is take the bitcounts of the lowest two and the second-lowest two bits of the original input, computed on the previous line, and add them together to give the bitcount of the lowest four bits of the input. And, again, it does this in parallel for all the 8 four-bit blocks (= hex digits) of the input.

return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;

OK,这是怎么回事呢?

OK, what's going on here?

好吧,首先,(1 +(I&GT;&GT; 4))及0x0F0F0F0F 不完全一样的previous线,但它增加了相邻的四位的bitcounts在一起,给每个bitcounts 八位的块的输入(即字节)。 (在这里,不像在previous线,我们可以移动跑不掉&安培; 增加外,因为我们知道,八位位计数不能超过8,所以会装进四位不会溢出。)

Well, first of all, (i + (i >> 4)) & 0x0F0F0F0F does exactly the same as the previous line, except it adds the adjacent four-bit bitcounts together to give the bitcounts of each eight-bit block (i.e. byte) of the input. (Here, unlike on the previous line, we can get away with moving the & outside the addition, since we know that the eight-bit bitcount can never exceed 8, and therefore will fit inside four bits without overflowing.)

现在我们有由4个8位字节的32位数字,每字节保持在原来的输入的该字节1位的数量。 (让我们把这些字节 A B C D )。所以会发生什么,当我们乘该值(让我们通过称之为 K ) 0x01010101

Now we have a 32-bit number consisting of four 8-bit bytes, each byte holding the number of 1-bit in that byte of the original input. (Let's call these bytes A, B, C and D.) So what happens when we multiply this value (let's call it k) by 0x01010101?

好吧,既然 0x01010101 =(1 <<;&LT; 24)+(1&LT;&LT; 16)+(1&LT;&LT; 8)+ 1 ,我们有:

k * 0x01010101 = (k << 24) + (k << 16) + (k << 8) + k

因此​​,在最高的结果的字节最终被的总和:

Thus, the highest byte of the result ends up being the sum of:


  • 其原始值,由于 K 项,加上

  • 下一较低字节的值,由于在 K&所述;&下; 8 项,加上

  • 第二下部字节的值,由于在 K&所述;&下; 16 项,加上

  • 第四和最低字节的值,由于 K&LT;&LT; 24 项。

  • its original value, due to the k term, plus
  • the value of the next lower byte, due to the k << 8 term, plus
  • the value of the second lower byte, due to the k << 16 term, plus
  • the value of the fourth and lowest byte, due to the k << 24 term.

(在一般情况下,有可能也有从低字节进行,但由于我们知道每个字节的值在8以下,我们知道除了将永远不会溢出,并创建一个进位)。

(In general, there could also be carries from lower bytes, but since we know the value of each byte is at most 8, we know the addition will never overflow and create a carry.)

也就是说,最高字节 K * 0x01010101 最终被输入的所有字节的bitcounts的总和,即32位的总位计数位输入数字。最后的&GT;&GT; 24 然后简单地低于最低最高字节偏移此值。

That is, the highest byte of k * 0x01010101 ends up being the sum of the bitcounts of all the bytes of the input, i.e. the total bitcount of the 32-bit input number. The final >> 24 then simply shifts this value down from the highest byte to the lowest.

诗。:此code可以很容易地为64位整数,只需改变 0x01010101 扩展到 0x0101010101010101 &GT;&GT; 24 &GT;&GT; 56 。事实上,同样的方法将甚至128位的整数工作; 256比特就需要增加一个额外的移位/添加/掩模步骤,然而,由于数256不再相当适合的8位字节。

Ps. This code could easily be extended to 64-bit integers, simply by changing the 0x01010101 to 0x0101010101010101 and the >> 24 to >> 56. Indeed, the same method would even work for 128-bit integers; 256 bits would require adding one extra shift / add / mask step, however, since the number 256 no longer quite fits into an 8-bit byte.

这篇关于请问这个算法来计算一个32位整数工作组的位数?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆