C++ - Why is boost::hash_combine the best way to combine hash-values?


Problem description

I've read in other posts that this seems to be the best way to combine hash-values. Could somebody please break this down and explain why this is the best way to do it?

#include <cstddef>     // std::size_t
#include <functional>  // std::hash

template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
    std::hash<T> hasher;
    seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
}

Edit: The other question is only asking for the magic number, but I'd like to know about the whole function, not only this part.
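For context, here is a minimal, self-contained sketch of how a hash_combine like this is typically used to hash a composite type. The Person struct and hash_value function are purely illustrative and not part of the question:

#include <cstddef>
#include <functional>
#include <string>

template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
    std::hash<T> hasher;
    seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Illustrative composite type: its hash is built by folding the
// hashes of its members into a single seed, one member at a time.
struct Person {
    std::string name;
    int age;
};

std::size_t hash_value(const Person& p)
{
    std::size_t seed = 0;
    hash_combine(seed, p.name);  // mix in the first member
    hash_combine(seed, p.age);   // then the next; order matters
    return seed;
}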

Solution

It being the "best" is arguable.

It being "good", or even "very good", at least superficially, is easy.

seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);

We'll presume seed is a previous result of hasher or this algorithm.

^= means that the bits on the left and bits on the right all change the bits of the result.

hasher(v) is presumed to be a decent hash on v. But the rest is defence in case it isn't a decent hash.

0x9e3779b9 is a 32-bit value (arguably it could be extended to 64 bits if size_t is 64 bits) that contains roughly half 0s and half 1s. It is basically a pseudo-random series of 0s and 1s, obtained by writing a particular irrational constant (the reciprocal of the golden ratio) as a base-2 fixed-point value. This helps ensure that if the hasher returns bad values, we still get a smear of 1s and 0s in our output.
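As a quick sanity check (my own addition, not part of the original answer), the constant can be reproduced from the golden ratio:

#include <cmath>
#include <cstdio>

int main()
{
    const double phi = (1.0 + std::sqrt(5.0)) / 2.0;          // golden ratio, ~1.6180339887
    const unsigned k = static_cast<unsigned>((1ULL << 32) / phi);
    std::printf("0x%08x\n", k);                                // prints 0x9e3779b9
}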

(seed<<6) + (seed>>2) is a bit shuffle of the incoming seed.

Imagine the 0x constant was missing. Imagine the hasher returns the constant 0x01000 for almost every v passed in. Now, each bit of the seed is spread out over the next iteration of the hash, during which it is again spread out.

Under seed ^= (seed<<6) + (seed>>2), the value 0x00001000 becomes 0x00041400 after one iteration, then 0x01021100 after the next. As you repeat the operation, any set bits are "smeared out" over the output bits. Eventually the right-shifted and left-shifted copies collide, and carries move set bits from "even locations" to "odd locations".
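A small sketch (again my own, not from the answer) that runs just this shuffle step on a lone bit and prints how it smears:

#include <cstdio>

int main()
{
    unsigned seed = 0x00001000u;               // a single "boring" bit pattern
    for (int i = 1; i <= 6; ++i) {
        seed ^= (seed << 6) + (seed >> 2);     // the shuffle only; hasher and 0x constant omitted
        std::printf("after iteration %d: 0x%08x\n", i, seed);
    }
    // the first two lines print 0x00041400 and 0x01021100 -- the set bit spreads out quickly
}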

The set of bits that depend on the value of an input seed grows relatively fast and in complex ways as the combine operation is applied recursively to the seed. Addition causes carries, which smear things out even more. The 0x constant adds a bunch of pseudo-random bits that make boring hash values occupy more than a few bits of the hash space after being combined.

It is asymmetric thanks to the addition (combining the hashes of "dog" and "god" gives different results), it handles boring hash values (such as a hash that maps characters to their ASCII values, which only involves twiddling a handful of bits), and it is reasonably fast.
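To see the order-dependence concretely, here is a small sketch (mine, using std::hash<char>) that combines the characters of "dog" and "god":

#include <cstddef>
#include <cstdio>
#include <functional>
#include <string>

template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
    std::hash<T> hasher;
    seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

int main()
{
    std::size_t dog = 0, god = 0;
    for (char c : std::string("dog")) hash_combine(dog, c);
    for (char c : std::string("god")) hash_combine(god, c);
    std::printf("dog: %zu\ngod: %zu\n", dog, god);   // same characters, different order, different hashes
}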

Slower hash combines that are cryptographically strong can be better in other situations. I, naively, would presume that making the shifts a combination of even and odd shifts might be a good idea (but maybe the addition, whose carries move bits from even positions to odd positions, makes that less of a problem: after 3 iterations, incoming lone seed bits will collide, add, and cause a carry).

The downside to this kind of analysis is that it only takes one mistake to make a hash function really bad. Pointing out all the good things doesn't help that much. So another thing that makes it good now is that it is reasonably famous and in an open-source repository, and I haven't heard anyone point out why it is bad.
