如何在Sandy Bridge上的一系列int中快速将位计入单独的容器中? [英] How to quickly count bits into separate bins in a series of ints on Sandy Bridge?
问题描述
更新:请阅读代码,这与在一个整数中计数位数无关
是否可以使用一些巧妙的汇编程序来提高以下代码的性能?
Is it possible to improve performance of the following code with some clever assembler?
uint bit_counter[64];
void Count(uint64 bits) {
bit_counter[0] += (bits >> 0) & 1;
bit_counter[1] += (bits >> 1) & 1;
// ..
bit_counter[63] += (bits >> 63) & 1;
}
Count
在我算法的最内层循环中.
Count
is in the inner-most loop of my algorithm.
更新: 体系结构:x86-64,Sandy Bridge,因此可以使用SSE4.2,AVX1和较旧的技术,但不能使用AVX2或BMI1/2.
Update: Architecture: x86-64, Sandy Bridge, so SSE4.2, AVX1 and older tech can be used, but not AVX2 or BMI1/2.
bits
变量几乎具有随机位(接近半个零和半个半个数字)
bits
variable has almost random bits (close to half zeros and half ones)
推荐答案
也许您可以一次完成8个操作,方法是将8位以8位间隔并保持8个uint64计数.但是,每个计数器只有1个字节,因此,在必须解压缩uint64的那些文件之前,您只能累积255个count
调用.
Maybe you can do 8 at once, by taking 8 bits spaced 8 apart and keeping 8 uint64's for the counts. That's only 1 byte per single counter though, so you can only accumulate 255 invocations of count
before you'd have to unpack those uint64's.
这篇关于如何在Sandy Bridge上的一系列int中快速将位计入单独的容器中?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!