覆盖 GetHashCode 的最佳算法是什么? [英] What is the best algorithm for overriding GetHashCode?

查看:24
本文介绍了覆盖 GetHashCode 的最佳算法是什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在 .NET 中,GetHashCode 方法 用于整个 .NET 基类库的很多地方.正确实施它对于在集合中快速查找项目或在确定相等性时尤为重要.

In .NET, the GetHashCode method is used in a lot of places throughout the .NET base class libraries. Implementing it properly is especially important to find items quickly in a collection or when determining equality.

是否有关于如何为我的自定义类实现 GetHashCode 从而不降低性能的标准算法或最佳实践?

Is there a standard algorithm or best practice on how to implement GetHashCode for my custom classes so I don't degrade performance?

推荐答案

我通常采用类似于 Josh Bloch 的 fabulous 有效的 Java.它很快,并创建了一个不太可能导致冲突的非常好的散列.选择两个不同的素数,例如17 和 23,然后执行:

I usually go with something like the implementation given in Josh Bloch's fabulous Effective Java. It's fast and creates a pretty good hash which is unlikely to cause collisions. Pick two different prime numbers, e.g. 17 and 23, and do:

public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = 17;
        // Suitable nullity checks etc, of course :)
        hash = hash * 23 + field1.GetHashCode();
        hash = hash * 23 + field2.GetHashCode();
        hash = hash * 23 + field3.GetHashCode();
        return hash;
    }
}

正如评论中所指出的,您可能会发现最好选择一个大素数进行乘法.显然 486187739 很好......尽管我见过的大多数小数例子都倾向于使用素数,但至少有类似的算法经常使用非素数.在后面的不完全FNV 示例中,例如,我使用了显然效果很好的数字 - 但初始值不是素数.(不过,乘法常数质数.我不知道它有多重要.)

As noted in comments, you may find it's better to pick a large prime to multiply by instead. Apparently 486187739 is good... and although most examples I've seen with small numbers tend to use primes, there are at least similar algorithms where non-prime numbers are often used. In the not-quite-FNV example later, for example, I've used numbers which apparently work well - but the initial value isn't a prime. (The multiplication constant is prime though. I don't know quite how important that is.)

这比 XOR 哈希码的常见做法要好,主要有两个原因.假设我们有一个带有两个 int 字段的类型:

This is better than the common practice of XORing hashcodes for two main reasons. Suppose we have a type with two int fields:

XorHash(x, x) == XorHash(y, y) == 0 for all x, y
XorHash(x, y) == XorHash(y, x) for all x, y

顺便说一下,较早的算法是 C# 编译器当前用于匿名类型的算法.

By the way, the earlier algorithm is the one currently used by the C# compiler for anonymous types.

此页面提供了很多选项.我认为在大多数情况下,上述情况足够好"了.而且它非常容易记住和正确.FNV 替代方案同样简单,但使用不同的常量和 XOR 而不是 ADD 作为组合操作.它看起来有点类似于下面的代码,但是正常的 FNV 算法对单个字节进行操作,因此这需要修改以对每个字节执行一次迭代,而不是每个 32 位哈希值.FNV 还设计用于可变长度的数据,而我们在这里使用它的方式始终是针对相同数量的字段值.对此答案的评论表明,此处的代码实际上并不像上面的加法方法那样有效(在测试的示例案例中).

This page gives quite a few options. I think for most cases the above is "good enough" and it's incredibly easy to remember and get right. The FNV alternative is similarly simple, but uses different constants and XOR instead of ADD as a combining operation. It looks something like the code below, but the normal FNV algorithm operates on individual bytes, so this would require modifying to perform one iteration per byte, instead of per 32-bit hash value. FNV is also designed for variable lengths of data, whereas the way we're using it here is always for the same number of field values. Comments on this answer suggest that the code here doesn't actually work as well (in the sample case tested) as the addition approach above.

// Note: Not quite FNV!
public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = (int) 2166136261;
        // Suitable nullity checks etc, of course :)
        hash = (hash * 16777619) ^ field1.GetHashCode();
        hash = (hash * 16777619) ^ field2.GetHashCode();
        hash = (hash * 16777619) ^ field3.GetHashCode();
        return hash;
    }
}

请注意,需要注意的一件事是,理想情况下,您应该在将等式敏感(以及哈希码敏感)状态添加到依赖于哈希码的集合中后防止其更改.

Note that one thing to be aware of is that ideally you should prevent your equality-sensitive (and thus hashcode-sensitive) state from changing after adding it to a collection that depends on the hash code.

根据文档:

您可以为不可变引用类型覆盖 GetHashCode.通常,对于可变引用类型,您应该仅在以下情况下覆盖 GetHashCode:

You can override GetHashCode for immutable reference types. In general, for mutable reference types, you should override GetHashCode only if:

  • 您可以从不可变的字段计算哈希码;或
  • 当可变对象包含在依赖于其哈希码的集合中时,您可以确保该对象的哈希码不会改变.

FNV 文章的链接已损坏,但这里有 Internet 上的副本存档:永远困惑 - 散列的艺术

The link to the FNV article is broken but here is a copy in the Internet Archive: Eternally Confuzzled - The Art of Hashing

这篇关于覆盖 GetHashCode 的最佳算法是什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆