How do I generate a hashcode from a byte array in C#?

Question

Say I have an object that stores a byte array and I want to be able to efficiently generate a hashcode for it. I've used the cryptographic hash functions for this in the past because they are easy to implement, but they do a lot more work than I need in order to be cryptographically one-way, and I don't care about that (I'm just using the hashcode as a key into a hashtable).

This is what I have today:

using System;
using System.Security.Cryptography;

struct SomeData : IEquatable<SomeData>
{
    private readonly byte[] data;
    public SomeData(byte[] data)
    {
        if (null == data || data.Length <= 0)
        {
            throw new ArgumentException("data");
        }
        this.data = new byte[data.Length];
        Array.Copy(data, this.data, data.Length);
    }

    public override bool Equals(object obj)
    {
        return obj is SomeData && Equals((SomeData)obj);
    }

    public bool Equals(SomeData other)
    {
        if (other.data.Length != data.Length)
        {
            return false;
        }
        for (int i = 0; i < data.Length; ++i)
        {
            if (data[i] != other.data[i])
            {
                return false;
            }
        }
        return true;
    }
    public override int GetHashCode()
    {
        return BitConverter.ToInt32(new MD5CryptoServiceProvider().ComputeHash(data), 0);
    }
}

Any thoughts?

dp: You are right that I missed a check in Equals; I have updated it. Using the existing hashcode from the byte array will result in reference equality (or at least that same concept translated to hashcodes). For example:

byte[] b1 = new byte[] { 1 };
byte[] b2 = new byte[] { 1 };
int h1 = b1.GetHashCode();
int h2 = b2.GetHashCode();

With that code, despite the two byte arrays having the same values within them, they are referring to different parts of memory and will result in (probably) different hash codes. I need the hash codes for two byte arrays with the same contents to be equal.

Recommended answer

The hash code of an object doesn't need to be unique.

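The checking rule is:

  • Are the hash codes equal? Then call the full (slow) Equals method.

  • Are the hash codes not equal? Then the two items are definitely not equal.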
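The rule above can be sketched roughly as follows. This is only an illustration: the HashLookupSketch class and its AreEqual method are hypothetical names, and real collections such as Dictionary<> perform an equivalent check internally in their own way.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating the two-step check described above.
static class HashLookupSketch
{
    public static bool AreEqual<T>(T a, T b, IEqualityComparer<T> comparer)
    {
        // Different hash codes: the items cannot be equal,
        // so the slow comparison is skipped entirely.
        if (comparer.GetHashCode(a) != comparer.GetHashCode(b))
        {
            return false;
        }

        // Same hash code: this may still be a collision,
        // so confirm with the full (slow) Equals.
        return comparer.Equals(a, b);
    }
}
```

The shortcut is only valid when equal items always produce equal hash codes, which is why that half of the contract is not optional.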

All you want is a GetHashCode algorithm that splits your collection into roughly even groups. It shouldn't form the key itself, because the Hashtable or Dictionary<> only uses the hash to optimise retrieval.

How long do you expect the data to be? How random? If lengths vary greatly (say for files) then just return the length. If lengths are likely to be similar look at a subset of the bytes that varies.
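One way to combine both suggestions is to start from the length and mix in a few evenly spaced bytes. The sketch below is an assumption about how that might look, not code from the answer: SampledByteHash, Compute, the maxSamples parameter and the multiplier 31 are all illustrative choices.

```csharp
using System;

// Hypothetical helper: a cheap hash built from the array length
// plus a small, evenly spaced sample of its bytes.
static class SampledByteHash
{
    // maxSamples is an assumed tuning knob; it caps how many bytes are read.
    public static int Compute(byte[] data, int maxSamples = 8)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (maxSamples <= 0) throw new ArgumentOutOfRangeException(nameof(maxSamples));

        unchecked
        {
            // Start from the length, as suggested for data whose length varies a lot.
            int hash = data.Length;
            if (data.Length == 0) return hash;

            // Ceiling division, so the loop reads at most maxSamples bytes.
            int step = (int)(((long)data.Length + maxSamples - 1) / maxSamples);
            for (long i = 0; i < data.Length; i += step)
            {
                hash = (hash * 31) + data[i];
            }
            return hash;
        }
    }
}
```

Bytes that are not sampled do not affect the result, so arrays that differ only in those positions collide. That is still correct, because Equals settles collisions, but it is only fast if the sampled positions are ones that actually vary in your data.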

GetHashCode should be a lot quicker than Equals, but doesn't need to be unique.

Two identical things must never have different hash codes. Two different objects should not have the same hash code, but some collisions are to be expected (after all, there are more permutations than possible 32 bit integers).
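If sampling turns out to collide too often, a cheap non-cryptographic hash over every byte still satisfies this contract. The sketch below uses 32-bit FNV-1a, which is a technique swapped in here for illustration and not something the answer itself proposes. ByteArrayContentComparer is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: content-based equality and hashing for byte arrays.
sealed class ByteArrayContentComparer : IEqualityComparer<byte[]>
{
    public bool Equals(byte[] x, byte[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
        {
            if (x[i] != y[i]) return false;
        }
        return true;
    }

    public int GetHashCode(byte[] data)
    {
        if (data == null) return 0;
        unchecked
        {
            // 32-bit FNV-1a: XOR each byte in, then multiply by the FNV prime.
            uint hash = 2166136261u;
            foreach (byte b in data)
            {
                hash = (hash ^ b) * 16777619u;
            }
            return (int)hash;
        }
    }
}
```

Passing it to a dictionary, for example new Dictionary<byte[], string>(new ByteArrayContentComparer()), makes two separately allocated arrays with the same contents resolve to the same entry, which the default reference-based hash code does not.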
