How do I generate a hashcode from a byte array in C#?

Question

Say I have an object that stores a byte array and I want to be able to efficiently generate a hashcode for it. I've used the cryptographic hash functions for this in the past because they are easy to implement, but they do a lot more work than necessary in order to be cryptographically one-way, and I don't care about that (I'm just using the hashcode as a key into a hashtable).

Here's what I have today:

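using System;
using System.Security.Cryptography;

struct SomeData : IEquatable<SomeData>
{
    private readonly byte[] data;

    public SomeData(byte[] data)
    {
        if (data == null || data.Length == 0)
        {
            throw new ArgumentException("Data must not be null or empty.", "data");
        }
        // Defensive copy so later changes to the caller's array cannot affect this value.
        this.data = new byte[data.Length];
        Array.Copy(data, this.data, data.Length);
    }

    public override bool Equals(object obj)
    {
        return obj is SomeData && Equals((SomeData)obj);
    }

    // Note: default(SomeData) bypasses the constructor and leaves data null;
    // this code assumes instances are always created through the constructor.
    public bool Equals(SomeData other)
    {
        if (other.data.Length != data.Length)
        {
            return false;
        }
        for (int i = 0; i < data.Length; ++i)
        {
            if (data[i] != other.data[i])
            {
                return false;
            }
        }
        return true;
    }

    public override int GetHashCode()
    {
        // MD5 processes every byte, yet only the first 4 bytes of the digest are kept.
        using (var md5 = new MD5CryptoServiceProvider())
        {
            return BitConverter.ToInt32(md5.ComputeHash(data), 0);
        }
    }
}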

Any thoughts?

dp: You are right that I missed a check in Equals; I have updated it. Using the existing hashcode from the byte array will result in reference equality (or at least that same concept translated to hashcodes). For example:

byte[] b1 = new byte[] { 1 };
byte[] b2 = new byte[] { 1 };
int h1 = b1.GetHashCode();
int h2 = b2.GetHashCode();

With that code, despite the two byte arrays having the same values within them, they are referring to different parts of memory and will result in (probably) different hash codes. I need the hash codes for two byte arrays with the same contents to be equal.

Recommended answer

The hash code of an object does not need to be unique.

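The rules for the check are:

  • Are the hash codes equal? Then call the full (slow) Equals method.
  • Are the hash codes not equal? Then the two items are definitely not equal.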
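
The two rules above can be observed with a small experiment. This is only a sketch: CountingByteArrayComparer and HashRuleDemo are hypothetical names made up for illustration, and the comparer uses the array length as a deliberately weak hash so that collisions are easy to trigger.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer, for illustration only. It uses the array length as a
// deliberately weak hash and counts how often the slow Equals path runs.
sealed class CountingByteArrayComparer : IEqualityComparer<byte[]>
{
    public int EqualsCalls;

    public bool Equals(byte[] x, byte[] y)
    {
        EqualsCalls++;
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
        {
            if (x[i] != y[i]) return false;
        }
        return true;
    }

    public int GetHashCode(byte[] obj)
    {
        return obj == null ? 0 : obj.Length;
    }
}

static class HashRuleDemo
{
    public static void Run()
    {
        var comparer = new CountingByteArrayComparer();
        var set = new HashSet<byte[]>(comparer) { new byte[] { 1, 2, 3 } };

        // Different length, so a different hash code: the set can reject this
        // lookup without needing a byte-by-byte comparison.
        bool differentHash = set.Contains(new byte[] { 9 });
        int callsAfterMiss = comparer.EqualsCalls;

        // Same length, so the same hash code: Equals has to decide.
        bool sameContent = set.Contains(new byte[] { 1, 2, 3 });
        bool collision = set.Contains(new byte[] { 3, 2, 1 });

        Console.WriteLine("{0} {1} {2}", differentHash, sameContent, collision);
        Console.WriteLine("Equals calls: {0} after the miss, {1} in total",
            callsAfterMiss, comparer.EqualsCalls);
    }
}
```

Calling HashRuleDemo.Run() prints the three lookup results and the Equals call counts, which shows where the slow comparison is actually needed.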

All you want is a GetHashCode algorithm that splits up your collection into roughly even groups. It shouldn't form the key itself, because the Hashtable or Dictionary<> only needs the hash to optimise retrieval.

How long do you expect the data to be? How random is it? If lengths vary greatly (say, for files), then just return the length. If lengths are likely to be similar, look at a subset of the bytes that varies.
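
One way to combine both suggestions is to hash the length together with a handful of bytes. The following is a minimal sketch, not a definitive implementation: SampledByteHash, the default of 8 samples, and the 17/31 multipliers are all assumptions chosen for illustration, and which bytes actually vary depends on your data.

```csharp
using System;

// Hypothetical helper, for illustration only.
static class SampledByteHash
{
    // Hashes the array length plus at most `samples` evenly spaced bytes,
    // so the cost stays roughly constant however long the array is.
    public static int Compute(byte[] data, int samples = 8)
    {
        if (data == null) throw new ArgumentNullException("data");
        if (samples <= 0) throw new ArgumentOutOfRangeException("samples");

        unchecked // integer overflow is expected and harmless here
        {
            int hash = 17;
            hash = hash * 31 + data.Length;

            int count = Math.Min(samples, data.Length);
            for (int k = 0; k < count; k++)
            {
                // Spread the sampled positions across the whole array.
                int index = (int)((long)k * data.Length / count);
                hash = hash * 31 + data[index];
            }
            return hash;
        }
    }
}
```

Arrays with identical contents always get the same value. Arrays that differ only at a position that is not sampled will collide, which is acceptable because Equals still performs the full comparison.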

GetHashCode should be a lot quicker than Equals, but doesn't need to be unique.

Two identical things must never have different hash codes. Two different objects should not have the same hash code, but some collisions are to be expected (after all, there are more possible byte arrays than possible 32-bit integers).
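
To make that contract concrete, here is a sketch of a key type whose hash depends only on the bytes. ContentKey is a hypothetical class, not the SomeData struct from the question, and the FNV-1a hash used here is a technique swapped in for illustration; the answer itself does not name a specific algorithm.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type, for illustration only.
sealed class ContentKey : IEquatable<ContentKey>
{
    private readonly byte[] data;

    public ContentKey(byte[] data)
    {
        if (data == null) throw new ArgumentNullException("data");
        this.data = (byte[])data.Clone(); // defensive copy so the key cannot change
    }

    public bool Equals(ContentKey other)
    {
        if (ReferenceEquals(other, null) || other.data.Length != data.Length) return false;
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] != other.data[i]) return false;
        }
        return true;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as ContentKey);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            uint hash = 2166136261;   // FNV-1a 32-bit offset basis
            foreach (byte b in data)
            {
                hash ^= b;
                hash *= 16777619;     // FNV-1a 32-bit prime
            }
            return (int)hash;
        }
    }
}

static class ContentKeyDemo
{
    public static bool LookupByContent()
    {
        var table = new Dictionary<ContentKey, string>();
        table[new ContentKey(new byte[] { 1 })] = "first";

        // A separate array with the same contents finds the same entry.
        return table.ContainsKey(new ContentKey(new byte[] { 1 }));
    }
}
```

Unlike the sampled approach, this reads every byte, so it costs more for long arrays but collides less often. Either way, collisions remain possible and Equals has the final say.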
