Why is ValueType.GetHashCode() implemented like it is?

Question

ValueType.cs


**Action: Our algorithm for returning the hashcode is a little bit complex. We look 
**        for the first non-static field and get it's hashcode.  If the type has no 
**        non-static fields, we return the hashcode of the type. We can't take the
**        hashcode of a static member because if that member is of the same type as 
**        the original type, we'll end up in an infinite loop.

I got bitten by this today when I was using a KeyValuePair as a key in a Dictionary (it stored an XML attribute name (enum) and its value (string)). I expected its hash code to be computed from all of its fields, but according to the implementation it only considered the Key part.

Example (copied and pasted from LINQPad):

void Main()
{
    var kvp1 = new KeyValuePair<string, string>("foo", "bar");
    var kvp2 = new KeyValuePair<string, string>("foo", "baz");

    // true
    (kvp1.GetHashCode() == kvp2.GetHashCode()).Dump();
}

I guess "the first non-static field" means the first field in declaration order. That could also cause trouble when someone reorders the fields in source for whatever reason, believing that the change does not affect the code semantically.

Recommended answer

I didn't implement it and I haven't talked to the people who did. But I can point out a few things.

(Before I go on, note that here I am specifically talking about hash codes for the purposes of balancing hash tables where the contents of the table are chosen by non-hostile users. The problems of hash codes for digital signing, redundancy checking, or ensuring good performance of a hash table when some of the users are mounting denial-of-service attacks against the table provider are beyond the scope of this discussion.)

First, as Jon correctly notes, the given algorithm does implement the required contract of GetHashCode. It might be sub-optimal for your purposes, but it is legal. All that is required is that things that compare equal have equal hash codes.
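To make the "legal but sub-optimal" point concrete, here is a minimal sketch, assuming a made-up ConstantHashPoint type that is not part of the framework. Its GetHashCode always returns 0, which still satisfies the contract because equal values trivially share a hash code. A Dictionary keyed on it stays correct; every key simply lands in the same bucket, so lookups degrade toward a linear scan.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration only.
public readonly struct ConstantHashPoint : IEquatable<ConstantHashPoint>
{
    public int X { get; }
    public int Y { get; }

    public ConstantHashPoint(int x, int y)
    {
        X = x;
        Y = y;
    }

    public bool Equals(ConstantHashPoint other) => X == other.X && Y == other.Y;

    public override bool Equals(object obj) => obj is ConstantHashPoint p && Equals(p);

    // Legal: values that compare equal always get the same hash code.
    // Poor: values that differ get the same hash code too.
    public override int GetHashCode() => 0;
}

public static class ContractDemo
{
    // Builds a small dictionary and returns the value stored under (3, 4).
    public static string Lookup()
    {
        var map = new Dictionary<ConstantHashPoint, string>
        {
            [new ConstantHashPoint(1, 2)] = "first",
            [new ConstantHashPoint(3, 4)] = "second",
        };
        return map[new ConstantHashPoint(3, 4)];
    }
}
```

Calling ContractDemo.Lookup() still finds the right entry, because Equals resolves the collisions that the hash code fails to separate.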

So what are the "nice to haves" in addition to that contract? A good hash code implementation should be:

1) Fast. Very fast! Remember, the whole point of the hash code in the first place is to rapidly find a relatively empty slot in a hash table. If the O(1) computation of the hash code is in practice slower than the O(n) time taken to do the lookup naively then the hash code solution is a net loss.

2) Well distributed across the space of 32 bit integers for the given distribution of inputs. The worse the distribution across the ints, the more like a naive linear lookup the hash table is going to be.

So, how would you make a hash algorithm for arbitrary value types given those two conflicting goals? Any time you spend on a complex hash algorithm that guarantees good distribution is time poorly spent.

A common suggestion is "hash all of the fields and then XOR together the resulting hash codes". But that is begging the question; XORing two 32 bit ints only gives good distribution when the inputs themselves are extremely well-distributed and not related to each other, and that is an unlikely scenario:

// (Updated example based on good comment!)
struct Control
{
    string name;
    int x;
    int y;
}

What is the likelihood that x and y are well-distributed over the entire range of 32 bit integers? Very low. Odds are much better that they are both small and close to each other, in which case XORing their hash codes together makes things worse, not better. XORing together integers that are close to each other zeros out most of the bits.
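A small sketch of that effect, using invented names (XorHashDemo, XorHash). It XORs the raw coordinate values; this assumes, as current .NET implementations do, that an int's hash code is the integer itself, so XORing the values models XORing the hash codes.

```csharp
using System;

public static class XorHashDemo
{
    // Naive combiner under discussion: XOR the two coordinates.
    // Assumption: int.GetHashCode() returns the value itself,
    // so this is equivalent to XORing the two hash codes.
    public static int XorHash(int x, int y) => x ^ y;

    public static void Run()
    {
        // Neighbouring values differ only in their low bits,
        // so almost everything cancels out.
        Console.WriteLine(XorHash(100, 101));

        // XOR is symmetric, so swapping the coordinates collides.
        Console.WriteLine(XorHash(3, 5) == XorHash(5, 3));

        // Any point on the diagonal x == y hashes to zero.
        Console.WriteLine(XorHash(42, 42));
    }
}
```

With this combiner, (100, 101) hashes to 1, (3, 5) and (5, 3) collide, and every point with x == y hashes to 0, which is the kind of clustering the paragraph above warns about.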

Furthermore, this is O(n) in the number of fields! A value type with a lot of small fields would take a comparatively long time to compute the hash code.

Basically the situation we're in here is that the user didn't provide a hash code implementation themselves; either they don't care, or they don't expect this type to ever be used as a key in a hash table. Given that you have no semantic information whatsoever about the type, what's the best thing to do? The best thing to do is whatever is fast and gives good results most of the time.
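If you do have semantic information about your type, you can supply the hash code yourself instead of relying on the default. The following is only a sketch under assumptions: AttributeKey is a hypothetical replacement for the KeyValuePair key in the question (with a string name rather than the enum used there), and the multiply-and-add combiner with 17 and 31 is one common convention, not the framework's own algorithm.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: both fields take part in equality and hashing.
public readonly struct AttributeKey : IEquatable<AttributeKey>
{
    public string Name { get; }
    public string Value { get; }

    public AttributeKey(string name, string value)
    {
        Name = name;
        Value = value;
    }

    public bool Equals(AttributeKey other) =>
        string.Equals(Name, other.Name) && string.Equals(Value, other.Value);

    public override bool Equals(object obj) => obj is AttributeKey other && Equals(other);

    public override int GetHashCode()
    {
        // unchecked: integer overflow is expected and harmless here.
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (Name?.GetHashCode() ?? 0);
            hash = hash * 31 + (Value?.GetHashCode() ?? 0);
            return hash;
        }
    }
}

public static class AttributeKeyDemo
{
    // Stores two keys that share a name but differ in value.
    public static int CountDistinctKeys()
    {
        var map = new Dictionary<AttributeKey, int>
        {
            [new AttributeKey("foo", "bar")] = 1,
            [new AttributeKey("foo", "baz")] = 2,
        };
        return map.Count;
    }
}
```

Unlike XOR, the multiply-and-add step is order-sensitive, so swapping Name and Value generally produces a different hash code. The cost is one hash computation per field, which is the trade-off the answer describes.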

Most of the time, two struct instances that differ will differ in most of their fields, not just one of their fields, so just picking one of them and hoping that it's the one that differs seems reasonable.

Most of the time, two struct instances that differ will have some redundancy in their fields, so combining the hash values of many fields together is likely to decrease, not increase, the entropy in the hash value, even as it consumes the time that the hash algorithm is designed to save.

Compare this with the design of anonymous types in C#. With anonymous types we do know that it is highly likely that the type is being used as a key to a table. We do know that it is highly likely that there will be redundancy across instances of anonymous types (because they are results of a cartesian product or other join). And therefore we do combine the hash codes of all of the fields into one hash code. If that gives you bad performance due to the excess number of hash codes being computed, you are free to use a custom nominal type rather than the anonymous type.
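The anonymous-type behavior can be observed with a short sketch; AnonymousTypeDemo and its method names are invented for illustration. Two anonymous objects with the same property names, types, and order in the same assembly share one compiler-generated type, whose Equals and GetHashCode take every property into account.

```csharp
using System;

public static class AnonymousTypeDemo
{
    // Same property values: equal, and therefore equal hash codes.
    public static bool SameValuesMatch()
    {
        var a = new { Name = "foo", Value = "bar" };
        var b = new { Name = "foo", Value = "bar" };
        return a.Equals(b) && a.GetHashCode() == b.GetHashCode();
    }

    // A difference in the second property alone makes the objects unequal.
    public static bool SecondPropertyCounts()
    {
        var a = new { Name = "foo", Value = "bar" };
        var c = new { Name = "foo", Value = "baz" };
        return !a.Equals(c);
    }
}
```

The sketch deliberately does not claim that the unequal pair gets different hash codes: that is very likely, since both properties feed the hash, but no hash function can guarantee it.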
