Can object.GetHashCode() produce different results for the same objects (strings) on different machines?


Question

Is it possible for one and the same object, particularly a string or any primitive or very simple type (like a struct), to produce different values from the .GetHashCode() method when invoked on different machines?

For instance, is it possible for the expression "Hello World".GetHashCode() to produce a different value on a different machine? I am primarily asking about C#/.NET, but I suppose this might apply to Java or even other languages.

Edit:

As pointed out in the answers and comments below, I am aware that .GetHashCode() can be overridden, and that there is no guarantee about the result it produces across different versions of the framework. So it is important to clarify that I have simple types in mind (which cannot be inherited, so GetHashCode() cannot be overridden), and that I am using the same version of the framework on all machines.

Recommended answer

Short answer: Yes.

But short answers are no fun, are they?

When you implement GetHashCode(), you have to make the following guarantee:

When GetHashCode() is called on another object that should be considered equal to this one, in this App Domain, the same value will be returned.

That's it. There are some things you really need to try to do (spread the bits around as much as possible for non-equal objects, but don't take so long about it that it outweighs all the benefits of hashing in the first place), and your code will suck if you don't, but it won't actually break. It will break if you don't go as far as that guarantee, because then, e.g.:

dict[myObj] = 3;
int x = dict[myObj];//KeyNotFoundException
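
To make that failure concrete, here is a minimal sketch. BrokenKey and BrokenKeyDemo are hypothetical names invented for this illustration; the only point is that a GetHashCode() which does not return the same value for equal objects makes a Dictionary lose its own keys.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: equality is by Name, but the hash code ignores that rule.
public sealed class BrokenKey
{
    private static int nextHash;
    public string Name { get; }

    public BrokenKey(string name) { Name = name; }

    public override bool Equals(object obj) =>
        obj is BrokenKey other && other.Name == Name;

    // Bug: every call returns a new value, so equal objects (even the very
    // same object) do not hash alike.
    public override int GetHashCode() => nextHash++;
}

public static class BrokenKeyDemo
{
    // Returns true when the lookup of a key that was just stored fails.
    public static bool LookupFails()
    {
        var dict = new Dictionary<BrokenKey, int>();
        var key = new BrokenKey("a");
        dict[key] = 3;
        try
        {
            int x = dict[key];
            return false;
        }
        catch (KeyNotFoundException)
        {
            return true;
        }
    }
}
```

The dictionary stores the entry under the first hash value and searches under the second, so the entry is never found even though Equals would report a match.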

Okay. If I'm implementing GetHashCode(), why might I go further than that, and why might I not?

First, why might I not?

Maybe it's a slightly different version of the assembly, and I improved the algorithm (or at least attempted to) in between builds.

Maybe one is 32-bit and one is 64-bit, and I was going nuts for efficiency and chose a different algorithm for each to make use of the different word sizes (this is not unheard of, especially when hashing objects like collections or strings).

Maybe some element I've decided to consider in determining what constitutes "equal" objects itself varies from system to system in this sort of way.

Maybe I deliberately introduce a different seed with different builds, to catch any case where a colleague is mistakenly depending upon my hash code! (I've heard MS do this with their implementation of string.GetHashCode(), but can't remember whether I heard that from a credible or a credulous source.)

Mainly, though, it'll be one of the first two reasons.

Now, why might I give such a guarantee?

Most likely, if I do, it'll be by chance. If an element can be compared for equality on the basis of a single integer id alone, then that's what I'm going to use as my hash code. Anything else would be more work for a less good hash. I'm not likely to change this, so I might give the guarantee.
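
As a sketch of that "by chance" case, consider a type whose equality rests on a single integer id. Customer and its members are hypothetical names chosen for this illustration.

```csharp
using System;

// Hypothetical entity whose equality is defined by a single integer id.
public sealed class Customer : IEquatable<Customer>
{
    public int Id { get; }
    public string DisplayName { get; }

    public Customer(int id, string displayName)
    {
        Id = id;
        DisplayName = displayName;
    }

    // Two customers are equal when their ids match; DisplayName is ignored.
    public bool Equals(Customer other) => other != null && other.Id == Id;

    public override bool Equals(object obj) => Equals(obj as Customer);

    // The id itself is the hash code: cheap, well spread if ids are, and
    // identical on every machine because nothing else feeds into it.
    public override int GetHashCode() => Id;
}
```

Nothing here sets out to be machine-independent; the guarantee falls out of the fact that the hash is just the id.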

The other reason why I might is that I want that guarantee myself. There's nothing to say I can't provide it, just that I don't have to.

Okay, let's get to something practical. There are cases where you may want a machine-independent guarantee. There are cases where you may want the opposite, which I'll come to in a bit.

First, check your logic. Can you handle collisions? Good, then we'll begin.

If it's your own class, then implement GetHashCode() so as to provide such a guarantee, document it, and you're done.

If it's not your class, then implement IEqualityComparer<T> in such a way as to provide it. For example:

public class ConsistentGuaranteedComparer : IEqualityComparer<string>
{
  public bool Equals(string x, string y)
  {
    return x == y;
  }
  public int GetHashCode(string obj)
  {
    if(obj == null)
      return 0;
    // Depends only on the characters, so the result is the same on every machine.
    unchecked
    {
      int hash = obj.Length;
      for(int i = 0; i != obj.Length; ++i)
        hash = (hash << 5) - hash + obj[i]; // parentheses required: << binds looser than - and +
      return hash;
    }
  }
}

Then use this instead of the built-in hash code.
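
A usage sketch, assuming a comparer of this kind: the comparer is handed to the collection once, at construction. StableComparer and StableComparerDemo are hypothetical names written for this example so that it compiles on its own; they are a compact stand-in, not a recommendation of this particular hash.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a comparer whose hash depends only on the
// characters of the string, never on the runtime or the process.
public sealed class StableComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => x == y;

    public int GetHashCode(string obj)
    {
        if (obj == null)
            return 0;
        unchecked
        {
            int hash = obj.Length;
            foreach (char c in obj)
                hash = (hash << 5) - hash + c;
            return hash;
        }
    }
}

public static class StableComparerDemo
{
    // Counts occurrences of each word using the custom comparer.
    public static Dictionary<string, int> BuildIndex(IEnumerable<string> words)
    {
        // Supplied once here; every later lookup uses it instead of
        // string.GetHashCode().
        var index = new Dictionary<string, int>(new StableComparer());
        foreach (string word in words)
        {
            index.TryGetValue(word, out int count); // count is 0 when absent
            index[word] = count + 1;
        }
        return index;
    }
}
```

HashSet<string> accepts a comparer through its constructor in the same way.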

There's an interesting case where we may want the opposite. If I can control the set of strings you are hashing, then I can pick a bunch of strings with the same hash code. Your hash-based collection's performance will hit the worst case and be pretty atrocious. Chances are I can keep doing this faster than you can deal with it, so it can be a denial-of-service attack. There aren't many cases where this happens, but an important one is when you're handling XML documents I send and you can't just rule out some elements (a lot of formats allow for freedom of elements within them). Then the NameTable inside your parser will be hurt. In this case we create a new hash mechanism each time:

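public class RandomComparer : IEqualityComparer<string>
{
  // Seeded once per comparer instance, so hash codes differ from use to use.
  private readonly int hashSeed = Environment.TickCount;
  public bool Equals(string x, string y)
  {
    return x == y;
  }
  public int GetHashCode(string obj)
  {
    if(obj == null)
      return 0;
    // uint gives logical (zero-filling) right shifts and keeps the
    // 0xffffcd7d constant in range; unchecked lets the arithmetic wrap.
    unchecked
    {
      uint hash = (uint)(hashSeed + obj.Length);
      for(int i = 0; i != obj.Length; ++i)
        hash = (hash << 5) - hash + obj[i]; // parentheses required: << binds looser than - and +
      hash += (hash << 15) ^ 0xffffcd7d;
      hash ^= (hash >> 10);
      hash += (hash << 3);
      hash ^= (hash >> 6);
      hash += (hash << 2) + (hash << 14);
      return (int)(hash ^ (hash >> 16));
    }
  }
}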

This will be consistent within a given use, but not consistent from use to use, so an attacker can't construct input to force it to be DoSsed. Incidentally, NameTable doesn't use an IEqualityComparer<T>, because it wants to deal with char arrays with indices and lengths without constructing a string unless necessary, but it does do something similar.
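
The "consistent within a use, different between uses" property can be sketched with a comparer that takes its seed explicitly. SeededComparer is a hypothetical name for this illustration, and the explicit-seed constructor exists only so the effect can be observed. The sketch shows seeding only: with a simple polynomial hash like this, same-length strings that collide under one seed collide under every seed, so it should not be read as real protection against crafted input.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical seeded comparer; the seed is fixed for the lifetime of one
// instance and may differ between instances.
public sealed class SeededComparer : IEqualityComparer<string>
{
    private readonly int seed;

    // Explicit seed, used here to make the behaviour observable.
    public SeededComparer(int seed) { this.seed = seed; }

    // Convenience: a fresh, time-based seed per instance.
    public SeededComparer() : this(Environment.TickCount) { }

    public bool Equals(string x, string y) => x == y;

    public int GetHashCode(string obj)
    {
        if (obj == null)
            return 0;
        unchecked
        {
            int hash = seed + obj.Length;
            foreach (char c in obj)
                hash = (hash << 5) - hash + c;
            return hash;
        }
    }
}
```

A collection built with one instance always sees the same hash for the same string, while a second instance with a different seed places that string elsewhere.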

Incidentally, in Java the hash code for String is specified and won't change, but this may not be the case for other classes.
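
For comparison, the formula Java documents for String.hashCode() is s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed in wrapping 32-bit arithmetic over UTF-16 code units. A C# port, offered as a sketch, looks like this; JavaStyleHash is a hypothetical name for this illustration.

```csharp
using System;

public static class JavaStyleHash
{
    // Evaluates s[0]*31^(n-1) + ... + s[n-1] with wrapping int arithmetic,
    // one UTF-16 code unit at a time.
    public static int Of(string s)
    {
        if (s == null)
            throw new ArgumentNullException(nameof(s));
        unchecked
        {
            int h = 0;
            foreach (char c in s)
                h = 31 * h + c;
            return h;
        }
    }
}
```

For example, JavaStyleHash.Of("ab") is 97 * 31 + 98 = 3105. Because the formula is fixed, any implementation of it yields the same value on any machine.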

Edit: Having done some research into the overall quality of the approach taken in ConsistentGuaranteedComparer above, I'm no longer happy with having such algorithms in my answers; while it serves to describe the concept, it doesn't have as good a distribution as one might like. Of course, if one has already implemented such a thing, then one can't change it without breaking the guarantee, but I'd now recommend using this library of mine, written after said research, as follows:

public class ConsistentGuaranteedComparer : IEqualityComparer<string>
{
  public bool Equals(string x, string y)
  {
    return x == y;
  }
  public int GetHashCode(string obj)
  {
    return obj.SpookyHash32();
  }
}

The algorithm in RandomComparer above isn't as bad, but can also be improved:

public class RandomComparer : IEqualityComparer<string>
{
  private int hashSeed = Environment.TickCount;
  public bool Equals(string x, string y)
  {
    return x == y;
  }
  public int GetHashCode(string obj)
  {
    return obj.SpookyHash32(hashSeed);
  }
}

Or, to make it even harder to predict:

public class RandomComparer : IEqualityComparer<string>
{
  private long seed0 = Environment.TickCount;
  private long seed1 = DateTime.Now.Ticks;
  public bool Equals(string x, string y)
  {
    return x == y;
  }
  public int GetHashCode(string obj)
  {
    return obj.SpookyHash128(seed0, seed1).GetHashCode();
  }
}
