C++ - 为什么 boost::hash_combine 是组合散列值的最佳方式? [英] C++ - Why is boost::hash_combine the best way to combine hash-values?

查看:114
本文介绍了C++ - 为什么 boost::hash_combine 是组合散列值的最佳方式?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在其他帖子中读到这似乎是组合散列值的最佳方式.有人可以分解一下并解释为什么这是最好的方法吗?

I've read in other posts that this seems to be the best way to combine hash-values. Could somebody please break this down and explain why this is the best way to do it?

template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
    std::hash<T> hasher;
    seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
}

另一个问题只是要求幻数,但我想了解整个功能,而不仅仅是这部分.

The other question is only asking for the magic number, but I'd like to get know about the whole function, not only this part.

推荐答案

最好"是有争议的.

好",甚至非常好",至少在表面上,很容易.

It being "good", or even "very good", at least superficially, is easy.

seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);

我们假设 seedhasher 或这个算法的先前结果.

We'll presume seed is a previous result of hasher or this algorithm.

^= 表示左边的位和右边的位都改变了结果的位.

^= means that the bits on the left and bits on the right all change the bits of the result.

hasher(v) 被认为是 v 上的一个不错的哈希值.但剩下的就是防御,以防它不是一个像样的哈希.

hasher(v) is presumed to be a decent hash on v. But the rest is defence in case it isn't a decent hash.

0x9e3779b9 是一个包含半个 0 和半个 1 的 32 位值(如果 size_t 可以说是 64 位,则可以扩展到 64 位).它基本上是通过将特定的无理常数近似为基数为 2 的定点值来完成的 0 和 1 的随机序列.这有助于确保如果哈希器返回错误值,我们的输出中仍然会出现 1 和 0 的污点.

0x9e3779b9 is a 32 bit value (it could be extended to 64 bit if size_t was 64 bit arguably) that contains half 0s and half 1s. It is basically a random series of 0s and 1s done by approximating particular irrational constant as a base-2 fixed point value. This helps ensure that if the hasher returns bad values, we still get a smear of 1s and 0s in our output.

(seed<<6) + (seed>>2) 是对传入种子的一点洗牌.

(seed<<6) + (seed>>2) is a bit shuffle of the incoming seed.

想象一下 0x 常量丢失了.想象一下,哈希器几乎为传入的每个 v 返回常量 0x01000.现在,种子的每一位都分布在哈希的下一次迭代中,在此期间,再次散开.

Imagine the 0x constant was missing. Imagine the hasher returns the constant 0x01000 for almost every v passed in. Now, each bit of the seed is spread out over the next iteration of the hash, during which it is again spread out.

seed ^= (seed<<6) + (seed>>2) 0x00001000 在一次迭代后变为 0x00041400.然后0x00859500.当您重复操作时,任何设置位都会在输出位上涂抹".最终左右位发生冲突,进位将设置位从偶数位置"移动到奇数位置".

The seed ^= (seed<<6) + (seed>>2) 0x00001000 becomes 0x00041400 after one iteration. Then 0x00859500. As you repeat the operation, any set bits are "smeared out" over the output bits. Eventually the right and left bits collide, and carry moves the set bit from "even locations" to "odd locations".

当合并操作在种子操作上递归时,依赖于输入种子值的位增长相对较快且以复杂的方式增长.添加原因进行,这将更多地抹黑.0x 常量添加了一堆伪随机位,使得无聊的哈希值组合后占用的哈希空间不止几位.

The bits dependent on the value of an input seed grows relatively fast and in complex ways as the combine operation recurses on the seed operation. Adding causes carries, which smear things even more. The 0x constant adds a bunch of pseudo-random bits that make boring hash values occupy more than a few bits of the hash space after being combined.

由于加法,它是不对称的(结合 "dog""god" 的散列给出不同的结果),它处理无聊的散列值(将字符映射到它们的ascii 值,只涉及处理少量位).而且,它相当快.

It is asymmetric thanks to addition (combining the hashes of "dog" and "god" gives different results), it handles boring hash values (mapping characters to their ascii value, which only involves twiddling a handful of bits). And, it is reasonably fast.

加密强度较慢的哈希组合在其他情况下可能会更好.我天真地认为,使移位成为偶数和奇数移位的组合可能是一个好主意(但也许加法,从奇数位移动偶数位,使问题不那么严重:3 次迭代后,传入的孤种子位将发生冲突并添加并导致进位).

Slower hash combines that are cryptographically strong can be better in other situations. I, naively, would presume that making the shifts be a combination of even and odd shifts might be a good idea (but maybe addition, which moves even bits from odd bits, makes that less of a problem: after 3 iterations, incoming lone seed bits will collide and add and cause a carry).

这种分析的缺点是,只要犯一个错误就会使散列函数变得非常糟糕.指出所有的好东西并没有多大帮助.所以现在使它变得更好的另一件事是它相当有名并且在一个开源存储库中,而且我还没有听到任何人指出它为什么不好.

The downside to this kind of analysis is that it only takes one mistake to make a hash function really bad. Pointing out all the good things doesn't help that much. So another thing that makes it good now is that it is reasonably famous and in an open-source repository, and I haven't heard anyone point out why it is bad.

这篇关于C++ - 为什么 boost::hash_combine 是组合散列值的最佳方式?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆