如何计算 32 位无符号整数中的前导零 [英] How to count leading zeros in a 32 bit unsigned integer

查看:31
本文介绍了如何计算 32 位无符号整数中的前导零的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

谁能告诉我在 C 编程中计算 32 位无符号整数中前导零的数量的有效算法是什么?

Could anyone tell me please what is an efficient algorithm to count the number of leading zeroes in a 32-bit unsigned integer in C programming?

推荐答案

此讨论假定您的编译器要么不支持该操作,要么无法生成足够好的程序集.请注意,现在这两种情况都不太可能出现,所以我建议只使用 __builtin_clz 用于 gcc 或编译器上的等效项.

This discussion assumes that your compiler either doesn't support the operation or that it doesn't produce good enough assembly. Note that both of these are unlikely nowadays so I'd recommend just using __builtin_clz for gcc or equivalent on your compiler.

请注意,确定哪个是最佳"clz 算法只能由您来完成.现代处理器是复杂的野兽,这些算法的性能在很大程度上取决于您运行它的平台、您扔给它的数据以及使用它的代码.唯一确定的方法是测量,测量和测量更多.如果您无法分辨其中的区别,那么您可能没有看到自己的瓶颈,而将时间花在其他地方会更好.

Note that determining which is the "best" clz algo can only be done by you. Modern processors are complicated beasts and the performance of these algorithm will heavily depend on the platform you run it on, the data you throw at it and the code that uses it. The only way to be sure is to measure, measure and measure some more. If you can't tell the difference then you're probably not looking at your bottleneck and your time will be better spent elsewhere.

既然无聊的免责声明已经不存在了,让我们来看看Hacker's Delight 对这个问题是怎么说的.一项快速调查表明,所有算法都依赖于某种描述的二分搜索.这是一个简单的例子:

Now that the boring disclaimers are out of the way, let's take a look at what Hacker's Delight has to say about the problem. A quick survey shows that all algorithms rely on a binary search of some description. Here's a straightforward example:

int n = 32;
unsigned y;

y = x >>16; if (y != 0) { n = n -16; x = y; }
y = x >> 8; if (y != 0) { n = n - 8; x = y; }
y = x >> 4; if (y != 0) { n = n - 4; x = y; }
y = x >> 2; if (y != 0) { n = n - 2; x = y; }
y = x >> 1; if (y != 0) return n - 2;
return n - x;

请注意,这适用于 32 个整数,如果需要,它也可以转换为迭代版本.不幸的是,该解决方案没有大量的指令级并行性,并且有相当多的分支,这些分支不能很好地处理算法.请注意,存在上述代码的无分支版本,但它更加冗长,因此我不会在此处重现.

Note that this works on 32 ints and that it can also be transformed into an iterative version if needed. Unfortunately that solution doesn't have a whole lot of instruction-level parallelism and has quite a few branches which doesn't make for a very good bit twiddling algo. Note that a branch free version of the above code exists but it's much more verbose so I won't reproduce here.

那么让我们通过使用 pop 指令(计算位数)来改进解决方案:

So let's improve on the solution by using the pop instruction (counts the number of bits):

x = x | (x >> 1);
x = x | (x >> 2);
x = x | (x >> 4);
x = x | (x >> 8);
x = x | (x >>16);
return pop(~x);

那么这是如何工作的呢?关键是末尾的pop(~x) 指令,它计算x 中零的个数.为了使零的计数有意义,我们首先需要摆脱所有不领先的 0.我们通过使用二进制算法正确传播 1 来做到这一点.虽然我们仍然没有太多的指令级并行性,但我们确实摆脱了所有分支,并且它使用的周期比以前的解决方案少.好多了.

So how does this work? The key is the pop(~x) instruction at the end which counts the numbers of zeros in x. For the count of zeros to be meaningful we first need to get rid of all 0s which aren't leading. We do this by right propagating the 1s using a binary algorithm. While we still don't have much instruction-level parallelism, we did get rid of all the branches and it uses fewer cycles then the previous solution. Much better.

那么那个pop指令怎么样,这不是作弊吗?大多数架构都有一个 1 个周期的 pop 指令,可以通过编译器内置指令(例如 gcc 的 __builtin_pop)访问.否则,存在基于表的解决方案,但在为缓存访问权衡周期时必须小心,即使该表完全保留在 L1 缓存中.

So how about that pop instruction, isn't that cheating? Most architecture have a 1 cycle pop instruction which can be accessed via compiler builtins (eg. gcc's __builtin_pop). Otherwise there exist table based solutions but care must be taken when trading off cycles for cache accesses even if the table is kept entirely in the L1 cache.

最后,就像黑客通常喜欢的那样,我们开始在陌生的领域游荡.让我们用浮点数计算一些前导零:

Finally, as is usual for hacker's delight, we start wandering in strange territories. Let's count us some leading zeroes using floating point numbers:

union {
    unsigned asInt[2];
    double asDouble;
};
asDouble = (double)k + 0.5;
return 1054 - (asInt[LE] >> 20);

首先,有一点警告:不要使用这种算法.就标准而言,这会触发未定义的行为.这是为了有趣的因素而复制的,而不是任何实际用途.使用后果自负.

First off, a little warning: DON'T USE THIS ALGORITHM. This triggers undefined behaviour as far as the standard is concerned. This was reproduce for the fun factor more then any practical uses. Use at your own peril.

既然免责声明已经不存在了,它是如何运作的?它首先将 int 转换为 double 并继续提取 double 的指数分量.整洁的东西.如果在 little-endian 机器上执行,LE 常量应该是 1,在 big-endian 机器上执行时应该是 0.

Now that the disclaimers are out of the way, how does it work? It first converts the int into a double and goes on to extract the exponent component of the double. Neat stuff. The LE constant should be 1 if executed on a little-endian machine and 0 on a big-endian machine.

这应该让您简要了解针对此问题的各种位处理算法.请注意,这本书有几个变体,这些变体进行了各种权衡,但我会让您自己发现这些变体.

This should give you a brief survey of various bit twiddling algorithms for this problem. Note that the book has several variations of these which makes various tradeoffs but I'll let you discover those on your own.

这篇关于如何计算 32 位无符号整数中的前导零的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆