What is a good 64-bit hash function in Java for textual strings?

Question

I'm looking for a hash function that:

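  1. Hashes textual strings well (e.g. few collisions)
  2. Is written in Java and widely used
  3. Bonus: works on several fields (instead of me concatenating them and applying the hash to the concatenated string)
  4. Bonus: has a 128-bit variant.
  5. Bonus: is not CPU intensive.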

Recommended answer

Why not use a long variant of the default String.hashCode()? Some really smart people certainly put effort into making it efficient, not to mention the thousands of developers who have already looked at that code.

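// adapted from String.hashCode()
public final class MyHash {
  // Same polynomial scheme as String.hashCode(), but accumulated in a 64-bit long.
  public static long hash(String string) {
    long h = 1125899906842597L; // prime
    int len = string.length();

    for (int i = 0; i < len; i++) {
      h = 31 * h + string.charAt(i); // overflow wraps around, as in String.hashCode()
    }
    return h;
  }
}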

If you are looking for more bits, you could use a BigInteger.
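
The BigInteger idea could be sketched roughly as follows. This is only an illustration, not code from the answer: the class name WideHash, the method hash128, the reuse of the 64-bit seed, and the choice to truncate to 128 bits with a mask are all assumptions made here.

```java
import java.math.BigInteger;

// Hypothetical helper: a 128-bit analogue of the long-based hash above.
final class WideHash {
  private static final BigInteger MULTIPLIER = BigInteger.valueOf(31);
  // 2^128 - 1, used to truncate the running value to 128 bits.
  private static final BigInteger MASK_128 =
      BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE);
  // Same seed as the 64-bit version; reusing it is an arbitrary choice.
  private static final BigInteger SEED = BigInteger.valueOf(1125899906842597L);

  public static BigInteger hash128(String string) {
    BigInteger h = SEED;
    for (int i = 0; i < string.length(); i++) {
      // h = (31 * h + c) mod 2^128
      h = h.multiply(MULTIPLIER)
           .add(BigInteger.valueOf(string.charAt(i)))
           .and(MASK_128);
    }
    return h;
  }
}
```

Note that with this seed and multiplier, short inputs (roughly fewer than 16 characters) never reach the upper bits, so the extra width only pays off for longer strings. BigInteger arithmetic is also noticeably slower than plain long arithmetic.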

As I mentioned in a comment on @brianegge's answer, there are not many use cases for hashes with more than 32 bits, and most likely not a single one for hashes with more than 64 bits:

I could imagine a huge hash table distributed across dozens of servers, maybe storing tens of billions of mappings. For such a scenario, @brianegge still has a valid point: 32 bits allow for 2^32 (ca. 4.3 billion) different hash keys, so even with a strong algorithm you would still get quite a few collisions. With 64 bits (18,446,744,073 billion different keys) you're certainly safe, regardless of whatever crazy scenario you need it for. Thinking of use cases for 128-bit keys (340,282,366,920,938,463,463,374,607,431 billion possible keys) is pretty much impossible, though.
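
To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes an ideal, uniformly distributed hash (which the polynomial hash above is not guaranteed to be) and uses the standard birthday-style estimate: among n keys there are n(n-1)/2 pairs, each colliding with probability 2^-bits. The class name CollisionEstimate and the sample size of 10 billion keys are illustrative choices, not taken from the answer.

```java
// Hypothetical helper for estimating collisions under an ideal uniform hash.
final class CollisionEstimate {
  // Expected number of colliding pairs among n keys hashed into 2^bits values.
  public static double expectedCollidingPairs(double n, int bits) {
    return n * (n - 1) / 2.0 / Math.pow(2.0, bits);
  }

  public static void main(String[] args) {
    double n = 1e10; // illustrative: ten billion mappings
    for (int bits : new int[] {32, 64, 128}) {
      System.out.printf("%3d bits: %.3g expected colliding pairs%n",
          bits, expectedCollidingPairs(n, bits));
    }
  }
}
```

Under these assumptions, 32 bits gives billions of colliding pairs at this scale, 64 bits gives only a handful, and 128 bits gives effectively none.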

To combine the hashes of several fields, do not XOR them; instead, multiply one by a prime and add the other:

long hash = MyHash.hash(string1) * 31 + MyHash.hash(string2);

The small prime is there to avoid equal hash codes for swapped values: {'foo','bar'} and {'bar','foo'} aren't equal and should have different hash codes. XOR is bad because it returns 0 whenever both values are equal, so {'foo','foo'} and {'bar','bar'} would have the same hash code.
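
The difference between the two combination strategies can be checked with a small sketch. The hash method is copied from the answer; the class name CombineDemo and the helpers combine and xorCombine are hypothetical names introduced here. The combine helper generalizes the two-field formula to any number of fields by folding from left to right, which for two fields reduces to hash(a) * 31 + hash(b).

```java
// Hypothetical demo class comparing multiply-and-add with XOR.
final class CombineDemo {
  // Copied from the answer: String.hashCode() scheme with a 64-bit accumulator.
  public static long hash(String string) {
    long h = 1125899906842597L;
    for (int i = 0; i < string.length(); i++) {
      h = 31 * h + string.charAt(i);
    }
    return h;
  }

  // Multiply-and-add: order-sensitive, so swapped fields hash differently.
  public static long combine(String... fields) {
    long result = 0;
    for (String field : fields) {
      result = 31 * result + hash(field);
    }
    return result;
  }

  // XOR: order-insensitive, and equal fields cancel each other out.
  public static long xorCombine(String a, String b) {
    return hash(a) ^ hash(b);
  }

  public static void main(String[] args) {
    System.out.println(combine("foo", "bar") == combine("bar", "foo"));       // false
    System.out.println(xorCombine("foo", "bar") == xorCombine("bar", "foo")); // true
    System.out.println(xorCombine("foo", "foo") == xorCombine("bar", "bar")); // true, both are 0
  }
}
```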
