ValueType.GetHashCode 的本机实现如何工作? [英] How does native implementation of ValueType.GetHashCode work?

查看:13
本文介绍了ValueType.GetHashCode 的本机实现如何工作?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我创建了两个 TheKey 类型的结构 k1={17,1375984} 和 k2={17,1593144}.显然,第二个字段中的指针是不同的.但两者都得到相同的哈希码=346948941.预计会看到不同的哈希码.请参阅下面的代码.

I created two structures of TheKey type k1={17,1375984} and k2={17,1593144}. Obviosly the pointers in the second fields are different. But both get same hash code=346948941. Expected to see different hash codes. See the code below.

struct TheKey
{
    public int id;
    public string Name;

    public TheKey(int id, string name)
    {
       this.id = id;
       Name = name;
   }
}

static void Main() {
    // assign two different strings to avoid interning
    var k1 = new TheKey(17, "abc");
    var k2 = new TheKey(17, new string(new[] { 'a', 'b', 'c' }));

    Dump(k1); // prints the layout of a structure
    Dump(k2);

    Console.WriteLine("hash1={0}", k1.GetHashCode());
    Console.WriteLine("hash2={0}", k2.GetHashCode());
}

unsafe static void Dump<T>(T s) where T : struct
{
    byte[] b = new byte[8];
    fixed (byte* pb = &b[0])
    {
        IntPtr ptr = new IntPtr(pb);
        Marshal.StructureToPtr(s, ptr, true);

        int* p1 = (int*)(&pb[0]); // first 32 bits
        int* p2 = (int*)(&pb[4]);

        Console.WriteLine("{0}", *p1);
        Console.WriteLine("{0}", *p2);
    }
}

输出:
17
1375984
17
1593144
hash1=346948941
hash2=346948941

Output:
17
1375984
17
1593144
hash1=346948941
hash2=346948941

推荐答案

它比我们看到的要复杂得多.首先,给 key2 值一个完全不同的字符串.注意哈希码是如何保持不变的:

It is a lot more complicated than meets the eye. For starters, give the key2 value a completely different string. Notice how the hash code is still the same:

    var k1 = new TheKey(17, "abc");
    var k2 = new TheKey(17, "def");
    System.Diagnostics.Debug.Assert(k1.GetHashCode() == k2.GetHashCode());

这是非常有效的,对哈希码的唯一要求是相同的值产生相同的哈希码.不同的 值不必产生不同的哈希码.这在物理上是不可能的,因为 .NET 哈希码只能表示 40 亿个不同的值.

Which is quite valid, the only requirement for a hash code is that the same value produces the same hash code. Different values don't have to produce different hash codes. That's not physically possible since a .NET hash code can only represent 4 billion distinct values.

计算结构体的哈希码是一件棘手的事情.CLR 做的第一件事是检查结构是否包含任何引用类型引用或字段之间是否存在间隙.参考值需要特殊处理,因为参考值是随机的.它是一个指针,当垃圾收集器压缩堆时,其值会发生变化.由于对齐,在结构布局中产生了间隙.带有一个字节和一个 int 的结构体在两个字段之间有 3 个字节的间隙.

Calculating the hash code for a struct is tricky business. The first thing the CLR does is check if the structure contains any reference type references or has gaps between the fields. A reference requires special treatment because the reference value is random. It is a pointer whose value changes when the garbage collector compacts the heap. Gaps in the structure layout are created because of alignment. A struct with a byte and an int has a 3 byte gap between the two fields.

如果两者都不是,则结构值中的所有位都是有效的.CLR 通过一次异或 32 个位来快速计算散列.这是一个好"的哈希,结构体中的所有字段都参与哈希码.

If neither is the case then all the bits in the structure value are significant. The CLR quickly calculates the hash by xor-ing the bits, 32 at a time. This is a 'good' hash, all the fields in the struct participate in the hash code.

如果结构具有引用类型的字段或有间隙,则需要另一种方法.CLR 迭代结构体的字段并寻找可用于生成散列的字段.一个可用的字段是值类型的字段或非空的对象引用.一旦找到,它就会获取该字段的散列值,并使用方法表指针对它进行异或运算并退出.

If the struct has fields of a reference type or has gaps then another approach is needed. The CLR iterates the fields of the struct and goes looking for one that is usable to generate a hash. A usable one is a field of a value type or an object reference that isn't null. As soon as it finds one, it takes the hash of that field, xors it with the method table pointer and quits.

换句话说,结构中只有一个字段参与哈希码计算.这是您的情况,仅使用 id 字段.这就是字符串成员值无关紧要的原因.

In other words, only one field in the structure participates in the hash code calculation. Which is your case, only the id field is used. Which is why the string member value doesn't matter.

这是一个晦涩的事实,如果您将其留给 CLR 来为结构生成哈希码,则显然需要注意这一点.到目前为止,最好的办法就是永远不要这样做.如果必须,请确保对结构中的字段进行排序,以便第一个字段为您提供最佳哈希码.在您的情况下,只需交换 idName 字段.

This is an obscure factoid that's obviously important to be aware of if you ever leave it up to the CLR to generate hash codes for a struct. By far the best thing to do is to just never do this. If you have to, then be sure to order the fields in the struct so that the first field gives you the best hash code. In your case, just swap the id and Name fields.

另一个有趣的花絮,好"的哈希计算代码有一个错误.当结构包含 System.Decimal 时,它将使用快速算法.问题是,十进制的位并不代表其数值.试试这个:

Another interesting tidbit, the 'good' hash calculation code has a bug. It will use the fast algorithm when the structure contains a System.Decimal. Problem is, the bits of a Decimal are not representative for its numeric value. Try this:

struct Test { public decimal value; }

static void Main() {
    var t1 = new Test() { value = 1.0m };
    var t2 = new Test() { value = 1.00m };
    if (t1.GetHashCode() != t2.GetHashCode())
        Console.WriteLine("gack!");
}

这篇关于ValueType.GetHashCode 的本机实现如何工作?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆