Create Hash Value on a List?

Question

I have a List<MyRichObject> with 50 instances in it. Each of the instances has 1 or 2 unique properties, but in a way they are all unique because there is only one at each position in the list, etc.

I would like to come up with a unique way to "hash" this List so that it is distinguishable from all of the other Lists. Is there a smart way to do that in .NET 4?

The purpose is to create a kind of "moniker" for the Lists so they can be dumped into a queue and found later based on their unique value.

Thanks.

Solution

TL;DR

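using System.Collections.Generic;
using System.Linq;

// Extension methods must live in a non-nested static class; the class name is arbitrary.
public static class SequenceHashExtensions
{
    public static int GetSequenceHashCode<T>(this IList<T> sequence)
    {
        const int seed = 487;
        const int modifier = 31;

        // unchecked: let the multiplication wrap around even if overflow checking is enabled.
        unchecked
        {
            // Multiplying the running hash by 31 before adding each item's hash code
            // makes every item's contribution depend on its position in the list.
            // Null items contribute 0 instead of throwing a NullReferenceException.
            return sequence.Aggregate(seed, (current, item) =>
                (current * modifier) + (item == null ? 0 : item.GetHashCode()));
        }
    }
}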
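Unrolling the Aggregate call shows why this version respects order. For a three-item list whose items have hash codes x1, x2 and x3, with the arithmetic wrapping modulo 2^32 inside the unchecked block:

```latex
\begin{aligned}
h &= ((487 \cdot 31 + x_1) \cdot 31 + x_2) \cdot 31 + x_3 \\
  &= 487 \cdot 31^3 + 31^2 x_1 + 31 x_2 + x_3 \pmod{2^{32}}
\end{aligned}
```

Each position gets its own weight (961, 31 and 1). Using the simplified hash codes that the walkthrough below assigns to foo, bar and spam (8, 16 and 32), the list [8, 16, 32] gives 14508217 + 7688 + 496 + 32 = 14516433, while [32, 16, 8] gives 14508217 + 30752 + 496 + 8 = 14539473. Swapping the first and last items shifts the result by (31^2 - 1) * (32 - 8) = 960 * 24 = 23040.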

Why bother with another answer?

The accepted answer combines the items' hash codes with XOR, which can give dangerously inaccurate results: it ignores the order of the items, and items with the same hash code cancel each other out. For example, consider these inputs:

var a = new []{ "foo" };
var b = new []{ "foo", "bar" };
var c = new []{ "foo", "bar", "spam" };
var d = new []{ "seenoevil", "hearnoevil", "speaknoevil" };

These all produce different results, suggesting they are all unique collections. Great! Now let's try a list that duplicates c:

var e = new []{ "foo", "bar", "spam" };

The XOR-based GetSequenceHashCode should produce the same result for both c and e, and it does. So far so good. Now let's try the same items in a different order:

var f = new []{ "spam", "bar", "foo" };

Uh oh... GetSequenceHashCode indicates that f is equal to both c and e, which it is not. Why is this happening? Break it down into the actual hash code values first, using c as an example:

int hashC = "foo".GetHashCode() ^ 
            "bar".GetHashCode() ^ 
            "spam".GetHashCode();

The exact numbers aren't important here, so for a clearer demonstration let's pretend the hash codes of the three strings are foo = 8, bar = 16 and spam = 32. So:

int hashC = 8 ^ 16 ^ 32;

or to break it down into binary representation:

8 ^ 16 ^ 32 == 56;

//  8 = 00001000
//  ^
// 16 = 00010000
//  ^
// 32 = 00100000
//  =
// 56 = 00111000

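Now you should see why this implementation ignores the order of the items in the list: XOR is commutative and associative, so the same hash codes produce the same result in any order, i.e. 8 ^ 16 ^ 32 == 16 ^ 8 ^ 32 == 32 ^ 16 ^ 8, and so on.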
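The accepted answer's code isn't reproduced on this page, so the following is only a hypothetical sketch of the XOR-based combining that this walkthrough describes (XorHashDemo and GetXorSequenceHashCode are made-up names). It uses int items because Int32.GetHashCode() returns the value itself, which lets 8, 16 and 32 stand in for the simplified hash codes:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical reconstruction of XOR-based sequence hashing, for comparison only.
public static class XorHashDemo
{
    public static int GetXorSequenceHashCode<T>(IEnumerable<T> sequence)
    {
        // XOR every item's hash code together; 0 is the XOR identity, so it is a neutral seed.
        // Because XOR is commutative and associative, item order has no effect on the result.
        return sequence.Aggregate(0, (current, item) =>
            current ^ (item == null ? 0 : item.GetHashCode()));
    }
}
```

Calling it on new[] { 8, 16, 32 }, new[] { 16, 8, 32 } or new[] { 32, 16, 8 } returns 56 every time, matching the binary breakdown above.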

Secondly, there's an issue with duplicates. Even if you assume that having the same contents in a different sequence is OK (which is not an approach I would encourage), I don't think anyone will argue the below behaviour is desirable. Let's try variations with duplicates within each list.

var a = new []{ "foo", "bar", "spam" };
var b = new []{ "foo", "bar", "spam", "foo" };
var c = new []{ "foo", "bar", "spam", "foo", "foo" };
var d = new []{ "foo", "bar", "spam", "foo", "foo", "spam", "foo", "spam", "foo" };

While a and b generate different sequence hashes, GetSequenceHashCode suggests that a, c and d are all the same. Why?

If you XOR a number with itself, you cancel it out completely (the result is 0), i.e.

8 ^ 8 == 0;

//  8 = 00001000
//  ^
//  8 = 00001000
//  =
//  0 = 00000000

XOR-ing with the same number once more gives back the original value, i.e.

8 ^ 8 ^ 8 == 8;

//  8 = 00001000
//  ^
//  8 = 00001000
//  ^
//  8 = 00001000
//  =
//  8 = 00001000
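The rule generalises: XOR-ing the same value an even number of times gives 0, and an odd number of times gives the value back. A small sketch to check this (XorParityDemo and XorRepeated are hypothetical names):

```csharp
using System.Linq;

public static class XorParityDemo
{
    // XOR the same value into an accumulator 'count' times, starting from 0.
    public static int XorRepeated(int value, int count)
    {
        return Enumerable.Repeat(value, count).Aggregate(0, (acc, x) => acc ^ x);
    }
}
```

XorRepeated(8, 2) returns 0 and XorRepeated(8, 3) returns 8, so in an XOR-based hash only the parity of how often a hash code appears survives.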

So if we look at a and c again, substituting the simplified hash codes:

var a = new []{ 8, 16, 32 };
var c = new []{ 8, 16, 32, 8, 8 };

the hash codes are calculated as:

int hashA = 8 ^ 16 ^ 32;         // = 56
int hashC = 8 ^ 16 ^ 32 ^ 8 ^ 8; // = 56
                       // ↑   ↑ 
                       // these two cancel each other out

and likewise with d: it contains five foo and three spam entries, so the pairs cancel out and leave one of each, giving 8 ^ 16 ^ 32 = 56 again.
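To see both effects side by side, here is a sketch that runs the four duplicate-laden lists through XOR combining and through the multiply-and-add combining from the TL;DR. The class and method names are hypothetical, and int values again stand in for foo = 8, bar = 16 and spam = 32:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparison helpers; int items stand in for the simplified hash codes
// foo = 8, bar = 16, spam = 32 (Int32.GetHashCode() returns the value itself).
public static class DuplicateHashDemo
{
    // XOR-based combining: duplicate pairs cancel, order is ignored.
    public static int XorHash(IList<int> sequence)
    {
        return sequence.Aggregate(0, (current, item) => current ^ item.GetHashCode());
    }

    // Multiply-and-add combining with the TL;DR's seed (487) and modifier (31).
    public static int SequenceHash(IList<int> sequence)
    {
        unchecked
        {
            return sequence.Aggregate(487, (current, item) => (current * 31) + item.GetHashCode());
        }
    }
}
```

For a = { 8, 16, 32 }, b = { 8, 16, 32, 8 }, c = { 8, 16, 32, 8, 8 } and d = { 8, 16, 32, 8, 8, 32, 8, 32, 8 }, XorHash returns 56 for a, c and d and 48 for b, while SequenceHash returns four different values: each extra item multiplies the running hash by 31 before being added, instead of toggling bits that an earlier copy already set.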
