Creating a hash code for use in a database (i.e. not using GetHashCode)

Question

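I have recently been instructed in the ways of GetHashCode(), and in particular that "Consumers of GetHashCode cannot rely upon it being stable over time or across appdomains" (from an Eric Lippert blog article, http://blogs.msdn.com/b/ericlippert/archive/2011/02/28/guidelines-and-rules-for-gethashcode.aspx).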

Unfortunately, I have been using this in a database to try to speed up lookups (by storing the result of GetHashCode rather than searching on text strings). I am now aware that this is a very bad thing to do.
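Whatever replaces GetHashCode, a stored hash can only narrow a lookup, never settle it, because two different strings can share the same hash. A minimal in-memory sketch of the usual pattern, assuming a Dictionary stands in for an indexed hash column; HashIndexedStore, Add and Contains are hypothetical names, and the hash function is passed in so that any stable hash can be plugged in:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a table with an indexed hash column.
// The hash only narrows the candidates; the full string is always compared,
// so a hash collision costs an extra comparison instead of a wrong match.
public sealed class HashIndexedStore
{
    private readonly Func<string, long> _hash;
    private readonly Dictionary<long, List<string>> _buckets = new Dictionary<long, List<string>>();

    public HashIndexedStore(Func<string, long> hash)
    {
        _hash = hash;
    }

    public void Add(string text)
    {
        long key = _hash(text);
        if (!_buckets.TryGetValue(key, out List<string> bucket))
        {
            bucket = new List<string>();
            _buckets[key] = bucket;
        }
        if (!bucket.Contains(text))
        {
            bucket.Add(text);
        }
    }

    public bool Contains(string text)
    {
        // Step 1: cheap lookup by hash (the "indexed column").
        // Step 2: exact string comparison among the entries sharing that hash.
        return _buckets.TryGetValue(_hash(text), out List<string> bucket)
            && bucket.Contains(text);
    }
}
```

In a database the equivalent is a query that filters on the indexed hash column and also compares the text column in the same WHERE clause.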

So I'm left wondering what I can do instead. Is there anything that, given a string, is guaranteed to return a sensibly collision-resistant integer that I can use for lookups?

I could write something myself, but I was hoping there would be something built in that I could use without resorting to the cryptographic libraries, which feel a bit heavyweight.
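For comparison, the built-in cryptographic route can be quite short in current .NET. A sketch, assuming .NET 5 or later (for SHA256.HashData); CryptoHash and Hash64 are hypothetical names. It truncates a SHA-256 digest to 64 bits, which gives a value that is stable across processes and machines; the usual birthday bound for a 64-bit range still applies to collisions.

```csharp
using System.Buffers.Binary;
using System.Security.Cryptography;
using System.Text;

public static class CryptoHash
{
    // Stable 64-bit key: SHA-256 of the UTF-8 bytes, truncated to the first 8 bytes.
    // SHA256.HashData(byte[]) requires .NET 5 or later.
    public static long Hash64(string text)
    {
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(text));
        // Read the first 8 bytes in a fixed (little-endian) order so the result
        // does not depend on the machine's endianness.
        return BinaryPrimitives.ReadInt64LittleEndian(digest);
    }
}
```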

Answer

I would encourage you to consider what the others have said: let the database do what it's good at. Creating a hash code in order to optimize lookups is an indication that the indexes on your table aren't what they should be.

That said, if you really need a hash code:

You don't say whether you want a 32-bit or 64-bit hash code. This one creates a 64-bit hash code for a string, and it's reasonably collision-resistant.

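public static class StableHash
{
    // Hypothetical sentinel: the original snippet uses UNKNOWN_RECORD_HASH without
    // defining it. Assumed to be a value reserved in the database for "unknown record";
    // replace it with your own.
    private const long UNKNOWN_RECORD_HASH = 0;

    public static long ComputeHashCode(string url)
    {
        // 64-bit FNV-1a prime and offset basis, applied per UTF-16 char.
        const ulong p = 1099511628211;
        ulong hash = 14695981039346656037;

        // unchecked: the ulong arithmetic is meant to wrap modulo 2^64,
        // even if the project is compiled with overflow checking enabled.
        unchecked
        {
            for (int i = 0; i < url.Length; ++i)
            {
                hash = (hash ^ url[i]) * p;
            }

            // Thomas Wang's 64-bit integer mixer
            hash = (~hash) + (hash << 21);
            hash = hash ^ (hash >> 24);
            hash = (hash + (hash << 3)) + (hash << 8);
            hash = hash ^ (hash >> 14);
            hash = (hash + (hash << 2)) + (hash << 4);
            hash = hash ^ (hash >> 28);
            hash = hash + (hash << 31);

            // Never return the reserved sentinel value.
            if (hash == (ulong)UNKNOWN_RECORD_HASH)
            {
                ++hash;
            }
            return (long)hash;
        }
    }
}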
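The first loop in ComputeHashCode is the 64-bit FNV-1a scheme (offset basis 14695981039346656037, prime 1099511628211), except that it folds in whole UTF-16 chars rather than bytes, and the Wang mixer then scrambles the result further. If you want values that can be checked against published FNV-1a test vectors or reproduced in another language, a byte-oriented variant over UTF-8 is one option. A sketch; Fnv1a and Hash64 are hypothetical names, and it produces different values from ComputeHashCode, so don't mix the two in one column.

```csharp
using System.Text;

public static class Fnv1a
{
    private const ulong OffsetBasis = 14695981039346656037; // 0xcbf29ce484222325
    private const ulong Prime = 1099511628211;              // 0x100000001b3

    // Standard 64-bit FNV-1a over the UTF-8 bytes of the string.
    public static ulong Hash64(string text)
    {
        ulong hash = OffsetBasis;
        unchecked
        {
            foreach (byte b in Encoding.UTF8.GetBytes(text))
            {
                hash = (hash ^ b) * Prime; // XOR first, then multiply (the "1a" order)
            }
        }
        return hash;
    }
}
```

Whichever function you choose, its exact output becomes part of your stored data, so it must never change once rows have been written.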

Note that this is a hash code, so collisions are possible. Rule of thumb: the chance of at least one collision becomes substantial (roughly 40 to 50%) once the number of items reaches about the square root of the hash code's range. This hash code has a range of 2^64, so with 2^32 (about 4.3 billion) items your chance of a collision is roughly 40%; with a billion records it is still only a few percent.
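The rule of thumb comes from the standard birthday-problem approximation, where n is the number of items and N the number of possible hash values (here N = 2^64). Plugging in the numbers:

```latex
p(n) \approx 1 - \exp\!\left(-\frac{n^2}{2N}\right)

n = 2^{32}:\quad \frac{n^2}{2N} = \frac{2^{64}}{2^{65}} = \frac{1}{2},
\qquad p \approx 1 - e^{-1/2} \approx 0.39

p(n) = \tfrac{1}{2} \quad\text{when}\quad n \approx \sqrt{2N\ln 2} \approx 1.18\sqrt{N} \approx 5.1 \times 10^{9}

n = 10^{9}:\quad \frac{n^2}{2N} = \frac{10^{18}}{2^{65}} \approx 0.027,
\qquad p \approx 0.027
```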

See http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=792 and http://en.wikipedia.org/wiki/Birthday_paradox#Probability_table for more information.
