如何本地实现ValueType.GetHash code的工作? [英] How does native implementation of ValueType.GetHashCode work?

查看:109
本文介绍了如何本地实现ValueType.GetHash code的工作?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我创建 TheKey两种结构键入K1 = {} 17,1375984和k2 = {} 17,1593144。
Obviosly在第二场中的指针是不同的。但都得到相同的散列code = 346948941。
希望看到不同的hash codeS。请参阅下面的code。

I created two structures of TheKey type k1={17,1375984} and k2={17,1593144}. Obviosly the pointers in the second fields are different. But both get same hash code=346948941. Expected to see different hash codes. See the code below.

struct TheKey
{
    public int id;
    public string Name;

    public TheKey(int id, string name)
    {
       this.id = id;
       Name = name;
   }
}

static void Main() {
    // assign two different strings to avoid interning
    var k1 = new TheKey(17, "abc");
    var k2 = new TheKey(17, new string(new[] { 'a', 'b', 'c' }));

    Dump(k1); // prints the layout of a structure
    Dump(k2);

    Console.WriteLine("hash1={0}", k1.GetHashCode());
    Console.WriteLine("hash2={0}", k2.GetHashCode());
}

unsafe static void Dump<T>(T s) where T : struct
{
    byte[] b = new byte[8];
    fixed (byte* pb = &b[0])
    {
        IntPtr ptr = new IntPtr(pb);
        Marshal.StructureToPtr(s, ptr, true);

        int* p1 = (int*)(&pb[0]); // first 32 bits
        int* p2 = (int*)(&pb[4]);

        Console.WriteLine("{0}", *p1);
        Console.WriteLine("{0}", *p2);
    }
}

输出:

17

1375984

17

1593144

HASH1 = 346948941

HASH2 = 346948941

Output:
17
1375984
17
1593144
hash1=346948941
hash2=346948941

推荐答案

这是一个复杂得多比满足眼睛。对于初学者来说,给KEY2值完全不同的字符串。注意哈希code如何仍然是相同的:

It is a lot more complicated than meets the eye. For starters, give the key2 value a completely different string. Notice how the hash code is still the same:

    var k1 = new TheKey(17, "abc");
    var k2 = new TheKey(17, "def");
    System.Diagnostics.Debug.Assert(k1.GetHashCode() == k2.GetHashCode());

这是相当有效的,对于一个散列code中的唯一的要求是相同的值产生相同的散列code。的不同的的值没有产生不同的hash codeS。这是不是身体可能因为.NET哈希code只能重新present 4个十亿不同的值。

Which is quite valid, the only requirement for a hash code is that the same value produces the same hash code. Different values don't have to produce different hash codes. That's not physically possible since a .NET hash code can only represent 4 billion distinct values.

计算散列code的结构是件棘手的事。在CLR做的第一件事是检查,如果该结构包含任何引用类型的引用,或有领域之间的差距。参考需要特殊对待,因为参考值是随机的。这是一个指针,它的值时,垃圾收集器压缩堆变化。在结构布局的差距,因为定位的创建。用一个字节和一个int类型的结构有两个字段之间的3个字节的空白。

Calculating the hash code for a struct is tricky business. The first thing the CLR does is check if the structure contains any reference type references or has gaps between the fields. A reference requires special treatment because the reference value is random. It is a pointer whose value changes when the garbage collector compacts the heap. Gaps in the structure layout are created because of alignment. A struct with a byte and an int has a 3 byte gap between the two fields.

如果既不是这种情况,那么在结构值的所有的比特是显著。在CLR很快被异或位,32在时间来计算哈希值。这是一个好的哈希,在结构中的所有字段参与散列code。

If neither is the case then all the bits in the structure value are significant. The CLR quickly calculates the hash by xor-ing the bits, 32 at a time. This is a 'good' hash, all the fields in the struct participate in the hash code.

如果该结构具有引用类型的字段或有缺口则需要另一种方法。在CLR遍历该结构的领域,去寻找一个可用于生成一个散列。对可以使用的一个是值类型或对象引用不是空的领域。只要找到一个,它需要该字段的哈希,该方法表指针的异或,并退出

If the struct has fields of a reference type or has gaps then another approach is needed. The CLR iterates the fields of the struct and goes looking for one that is usable to generate a hash. A usable one is a field of a value type or an object reference that isn't null. As soon as it finds one, it takes the hash of that field, xors it with the method table pointer and quits.

在换句话说,只有在结构中的一个的字段参加散列code的计算。哪个是你的情况下,只有的标识的使用领域。这就是为什么字符串成员值并不重要。

In other words, only one field in the structure participates in the hash code calculation. Which is your case, only the id field is used. Which is why the string member value doesn't matter.

这是一个不起眼的factoid要知道,如果你离开它的CLR生成散列codeS的结构这显然是非常重要的。到目前为止做的最好的事情是从来没有做到这一点。如果你有,那么一定要责令领域的结构,使得第一场为您提供最佳的散列code。在你的情况,只是交换的标识名称的领域。

This is an obscure factoid that's obviously important to be aware of if you ever leave it up to the CLR to generate hash codes for a struct. By far the best thing to do is to just never do this. If you have to, then be sure to order the fields in the struct so that the first field gives you the best hash code. In your case, just swap the id and Name fields.

另一个有趣的花絮,好哈希计算code有一个bug。当该结构中包含一个System.Decimal它将使用快速算法。问题是,一个十进制的比特不重新presentative其数值。试试这个:

Another interesting tidbit, the 'good' hash calculation code has a bug. It will use the fast algorithm when the structure contains a System.Decimal. Problem is, the bits of a Decimal are not representative for its numeric value. Try this:

struct Test { public decimal value; }

static void Main() {
    var t1 = new Test() { value = 1.0m };
    var t2 = new Test() { value = 1.00m };
    if (t1.GetHashCode() != t2.GetHashCode())
        Console.WriteLine("gack!");
}

这篇关于如何本地实现ValueType.GetHash code的工作?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆