What is the performance of std::bitset?

Question

I recently asked a question on Programmers regarding reasons to use manual bit manipulation of primitive types over std::bitset.

From that discussion I have concluded that the main reason is performance, although I'm not aware of any measured basis for this opinion. So my next question is: what is the performance hit, if any, likely to be incurred by using std::bitset over a primitive?
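For concreteness, here is a minimal sketch of the same operation done both ways: once through std::bitset and once by masking a primitive integer by hand. The function names are hypothetical and the 32-bit width is an arbitrary choice for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// std::bitset version: set() and test() are bounds-checked and throw
// std::out_of_range if pos >= 32 (operator[] does no such check).
bool set_and_test_bitset(std::bitset<32>& flags, std::size_t pos) {
    flags.set(pos);
    return flags.test(pos);
}

// "Pre-bitset" version: the same operation as manual masking on an integer.
// Nothing checks the range for you; shifting a 32-bit value by 32 or more is
// undefined behavior, so the precondition is asserted here.
bool set_and_test_manual(std::uint32_t& flags, unsigned pos) {
    assert(pos < 32);
    const std::uint32_t mask = std::uint32_t{1} << pos;
    flags |= mask;
    return (flags & mask) != 0;
}
```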

The question is intentionally broad, because after looking online I haven't been able to find anything, so I'll take what I can get. Basically, I'm after a resource that provides some profiling of std::bitset vs. 'pre-bitset' alternatives to the same problems on some common machine architectures using GCC, Clang and/or VC++. There is a very comprehensive paper which attempts to answer this question for bit vectors:

http://www.cs.up.ac.za/cs/vpieterse/pub/PieterseEtAl_SAICSIT2010.pdf

Unfortunately, it either predates std::bitset or considers it out of scope, so it focuses on vector/dynamic-array implementations instead.

I really just want to know whether std::bitset is better than the alternatives for the use cases it is intended to solve. I already know that it is easier and clearer than bit-fiddling on an integer, but is it as fast?

Answer

I'm one of those who would give you a similar performance answer, but I'll try to give you something a bit more in-depth than "just because". It is something I came across through actual profiling and timing, not merely distrust and paranoia.

One of the biggest problems with bitset and vector<bool> is that their interface design is "too convenient". Optimizers are great at obliterating all that structure you establish to provide safety, reduce maintenance cost, make changes less intrusive, etc. They do an especially fine job with selecting instructions and allocating the minimal number of registers to make such code run as fast as the not-so-safe, not-so-easy-to-maintain/change alternatives.

The part that makes the bitset interface "too convenient" at the cost of efficiency is the random-access operator[], as well as the iterator design for vector<bool>. When you access one of these at index n, the code first has to figure out which byte the nth bit belongs to, and then the sub-index of the bit within that byte. That first phase typically involves a division or right shift against an lvalue (an index only known at run time), which is more costly than the actual bit operation you're trying to perform.
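To make the two phases concrete, here is a minimal sketch of the work a runtime-index access has to do, written against a plain byte array. The helpers test_bit and set_bit are hypothetical; real std::bitset implementations usually store larger words rather than bytes, but the split into "which unit" and "which bit within it" is the same.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper: reading bit n when n is only known at run time.
// Bit 0 is the least significant bit of bytes[0].
bool test_bit(const unsigned char* bytes, std::size_t n) {
    assert(bytes != nullptr);
    std::size_t byte_index = n / CHAR_BIT;                     // phase 1: which byte
    unsigned bit_index = static_cast<unsigned>(n % CHAR_BIT);  // phase 2: which bit in it
    return ((bytes[byte_index] >> bit_index) & 1u) != 0;       // the actual bit operation
}

// Hypothetical setter with the same two-phase lookup.
void set_bit(unsigned char* bytes, std::size_t n) {
    assert(bytes != nullptr);
    bytes[n / CHAR_BIT] |= static_cast<unsigned char>(1u << (n % CHAR_BIT));
}
```

With CHAR_BIT equal to 8 the division and modulo compile down to a shift and a mask, but they still run on every access unless the compiler can see n.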

The iterator design for vector<bool> faces a similar awkward dilemma: it either has to branch into different code every 8 iterations or pay the kind of indexing cost described above. If the former is done, the logic becomes asymmetrical across iterations, and iterator designs tend to take a performance hit in those rare cases.
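One way to write the "branch every 8 iterations" option is sketched below, assuming a hypothetical BitIterator over a byte array. It is an illustration of the trade-off, not how any particular standard library implements vector<bool>::iterator.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical forward iterator over the bits of a byte array.
class BitIterator {
public:
    BitIterator(const unsigned char* byte, unsigned bit) : byte_(byte), bit_(bit) {
        assert(bit < CHAR_BIT);
    }

    // Read the current bit from the byte we are already positioned on.
    bool operator*() const { return ((*byte_ >> bit_) & 1u) != 0; }

    BitIterator& operator++() {
        if (++bit_ == CHAR_BIT) {  // the asymmetric case, taken once every CHAR_BIT steps
            bit_ = 0;
            ++byte_;               // move on to the next byte
        }
        return *this;
    }

    bool operator!=(const BitIterator& other) const {
        return byte_ != other.byte_ || bit_ != other.bit_;
    }

private:
    const unsigned char* byte_;
    unsigned bit_;
};
```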

Optimizers can't seem to optimize away this phase-1 overhead of figuring out which byte to access (perhaps it's a bit too runtime-dependent), and you tend to see significant performance gains with more manual code that processes bits sequentially with advance knowledge of which byte it's working on. It's somewhat of an unfair comparison, but the difficulty with std::bitset is that there's no way to make a fair comparison in cases where the code knows in advance which byte it wants to access, and more often than not, you do have this information in advance.
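The contrast can be sketched as two ways of counting set bits: one indexes a std::bitset bit by bit, the other walks a byte array and only shifts a mask within the byte it already holds. Both function names are hypothetical, and in real code std::bitset::count() is the idiomatic way to count; the loops exist only to show the two access patterns. Whether a given compiler closes the gap between them depends on the compiler and flags, so measure rather than assume.

```cpp
#include <bitset>
#include <cassert>
#include <climits>
#include <cstddef>

// Per-index access: each bits[i] has to locate the storage unit holding bit i.
template <std::size_t N>
std::size_t count_via_index(const std::bitset<N>& bits) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < N; ++i)
        count += bits[i];  // const operator[] returns bool
    return count;
}

// Manual sequential processing: the outer loop already knows which byte it
// is on, so the inner loop only shifts a mask within that byte.
std::size_t count_manual(const unsigned char* bytes, std::size_t num_bytes) {
    assert(bytes != nullptr || num_bytes == 0);
    std::size_t count = 0;
    for (std::size_t b = 0; b < num_bytes; ++b) {
        const unsigned char byte = bytes[b];  // byte known up front
        for (unsigned mask = 1; mask <= UCHAR_MAX; mask <<= 1)
            count += (byte & mask) != 0;
    }
    return count;
}
```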

Perhaps that wouldn't be the case if the interface design involved a bitset where operator[] returned a byte proxy, requiring a two-index access pattern. For example, in such a case, you would access the eighth bit (index 7) by writing bitset[0][7] = true. A good optimizer may be able to take such a design and make it rival the manual, old-school way of doing the bit manipulation by hand.
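A minimal sketch of that two-index design follows, assuming a hypothetical ByteIndexedBits class (nothing like it exists in the standard library). The first operator[] picks the byte and returns a proxy, and the proxy's operator[] picks the bit within it. Whether optimizers actually make it rival hand-written code is the conjecture above, not something this sketch demonstrates.

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical bitset addressed as bits[byte][bit].
template <std::size_t NumBytes>
class ByteIndexedBits {
public:
    // Reference to a single bit inside an already-selected byte.
    class BitRef {
    public:
        BitRef(unsigned char& byte, unsigned bit)
            : byte_(byte), mask_(static_cast<unsigned char>(1u << bit)) {}
        BitRef& operator=(bool value) {
            if (value) byte_ |= mask_;
            else       byte_ &= static_cast<unsigned char>(~mask_);
            return *this;
        }
        operator bool() const { return (byte_ & mask_) != 0; }
    private:
        unsigned char& byte_;
        unsigned char mask_;
    };

    // Proxy for one byte; its operator[] selects the bit within that byte.
    class ByteProxy {
    public:
        explicit ByteProxy(unsigned char& byte) : byte_(byte) {}
        BitRef operator[](unsigned bit) {
            assert(bit < CHAR_BIT);
            return BitRef(byte_, bit);
        }
    private:
        unsigned char& byte_;
    };

    ByteProxy operator[](std::size_t byte_index) {
        assert(byte_index < NumBytes);
        return ByteProxy(bytes_[byte_index]);
    }

private:
    std::array<unsigned char, NumBytes> bytes_{};  // zero-initialized
};
```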

Another design that might help would be for bitsets to provide a for_each_bit kind of method, passing a bit proxy to the functor you provide. That might actually be able to rival the manual method.
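Here is a minimal sketch of such a method on a hypothetical BitArray class (std::bitset has no for_each_bit member). The loop looks up each byte once and hands the functor a proxy that already knows its byte and mask; the BitRef proxy and the test helper are likewise illustrative names.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical dynamic bit array offering for_each_bit.
class BitArray {
public:
    // Proxy passed to the caller's functor: it already knows its byte.
    class BitRef {
    public:
        BitRef(unsigned char& byte, unsigned char mask) : byte_(byte), mask_(mask) {}
        bool get() const { return (byte_ & mask_) != 0; }
        void set(bool value) {
            if (value) byte_ |= mask_;
            else       byte_ &= static_cast<unsigned char>(~mask_);
        }
    private:
        unsigned char& byte_;
        unsigned char mask_;
    };

    explicit BitArray(std::size_t num_bits)
        : num_bits_(num_bits), bytes_((num_bits + CHAR_BIT - 1) / CHAR_BIT) {}

    // Visits bits 0..size()-1 in order; the byte lookup happens once per
    // byte rather than once per bit.
    template <typename F>
    void for_each_bit(F f) {
        std::size_t index = 0;
        for (unsigned char& byte : bytes_) {
            for (unsigned bit = 0; bit < CHAR_BIT && index < num_bits_; ++bit, ++index)
                f(index, BitRef(byte, static_cast<unsigned char>(1u << bit)));
        }
    }

    // Ordinary random access, for comparison and checking.
    bool test(std::size_t i) const {
        assert(i < num_bits_);
        return ((bytes_[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1u) != 0;
    }

    std::size_t size() const { return num_bits_; }

private:
    std::size_t num_bits_;
    std::vector<unsigned char> bytes_;
};
```

Usage might look like arr.for_each_bit([](std::size_t i, BitArray::BitRef bit) { bit.set(i % 3 == 0); }); which sets every third bit without the caller ever computing a byte index.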

std::deque has a similar interface problem. Its performance shouldn't be that much slower than std::vector for sequential access. Yet unfortunately we access it sequentially either through operator[], which is designed for random access, or through an iterator, and the internal representation of a deque simply doesn't map very efficiently to an iterator-based design. If deque provided a for_each kind of method of its own, it could potentially get a lot closer to std::vector's sequential access performance. These are some of the rare cases where the Sequence interface design comes with some efficiency overhead that optimizers often can't obliterate. Often, good optimizers can make convenience free of runtime cost in a production build, but unfortunately not in all cases.
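The same idea can be sketched for a deque-like layout. ChunkedSeq below is a hypothetical, simplified stand-in (real std::deque implementations differ in detail, and std::deque has no such member function): operator[] pays a division and an extra indirection on every call, while for_each looks up each block once and then runs a plain inner loop. T must be default-constructible here because blocks are std::arrays.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical deque-like container: fixed-size blocks reached through a
// vector of pointers.
template <typename T, std::size_t BlockSize = 64>
class ChunkedSeq {
public:
    void push_back(const T& value) {
        if (size_ % BlockSize == 0)
            blocks_.push_back(std::make_unique<std::array<T, BlockSize>>());
        (*blocks_.back())[size_ % BlockSize] = value;
        ++size_;
    }

    // Random access: locate the block, then the slot within it, on every call.
    T& operator[](std::size_t i) {
        assert(i < size_);
        return (*blocks_[i / BlockSize])[i % BlockSize];
    }

    // Sequential visit: one block lookup per block, then a contiguous loop.
    template <typename F>
    void for_each(F f) {
        std::size_t remaining = size_;
        for (auto& block : blocks_) {
            const std::size_t n = remaining < BlockSize ? remaining : BlockSize;
            for (std::size_t j = 0; j < n; ++j) f((*block)[j]);
            remaining -= n;
        }
    }

    std::size_t size() const { return size_; }

private:
    std::vector<std::unique_ptr<std::array<T, BlockSize>>> blocks_;
    std::size_t size_ = 0;
};
```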
