什么是哈希码计算的合理素数? [英] What is a sensible prime for hashcode calculation?

查看:131
本文介绍了什么是哈希码计算的合理素数?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Eclipse 3.5有一个非常好的功能来生成Java hashCode()函数。它会生成例如(略微缩短:):

Eclipse 3.5 has a very nice feature to generate Java hashCode() functions. It would generate for example (slightly shortened:)

class HashTest {
    int i;
    int j;        
    public int hashCode() {
        final int prime = 31;
        int result = prime + i;
        result = prime * result + j;
        return result;
    }
}

(如果课程中有更多属性, result = prime * result + attribute.hashCode(); 对每个附加属性重复。对于ints .hashCode()可以省略。)

(If you have more attributes in the class, result = prime * result + attribute.hashCode(); is repeated for each additional attribute. For ints .hashCode() can be omitted.)

这似乎很好,但是对于素数的选择31。它可能来自 Java的hashCode实现字符串,用于性能原因,在引入硬件乘法器之后很久就消失了。对于i和j的小值,这里有许多哈希码冲突:例如(0,0)和(-1,31)具有相同的值。我认为这是一件坏事(TM),因为经常会出现小值。对于String.hashCode,您还会发现许多具有相同哈希码的短字符串,例如Ca和DB。如果你选择了一个大素数,如果你选择了素数,这个问题就会消失。

This seems fine but for the choice 31 for the prime. It is probably taken from the hashCode implementation of Java String, which was used for performance reasons that are long gone after the introduction of hardware multipliers. Here you have many hashcode collisions for small values of i and j: for example (0,0) and (-1,31) have the same value. I think that is a Bad Thing(TM), since small values occur often. For String.hashCode you'll also find many short strings with the same hashcode, for instance "Ca" and "DB". If you take a large prime, this problem disappears if you choose the prime right.

所以我的问题是:选择什么是好的素数?你用什么标准来找到它?

So my question: what is a good prime to choose? What criteria do you apply to find it?

这是一个普遍的问题 - 所以我不想给i和j一个范围。但我认为在大多数应用中,相对较小的值比较大的值更常出现。 (如果你有大的值,素数的选择可能不重要。)它可能没有多大区别,但更好的选择是一种简单明了的方法来改善这一点 - 那么为什么不这样做呢? Commons lang HashCodeBuilder 还建议奇怪的小值。

This is meant as a general question - so I do not want to give a range for i and j. But I suppose in most applications relatively small values occur more often than large values. (If you have large values the choice of the prime is probably unimportant.) It might not make much of a difference, but a better choice is an easy and obvious way to improve this - so why not do it? Commons lang HashCodeBuilder also suggests curiously small values.

澄清:这是为什么String中的Java的hashCode()使用31作为乘数?因为我的问题并不关心JDK中31的历史,而是关于使用相同基本模板的新代码中更好的价值。那里没有任何答案可以回答这个问题。)

(Clarification: this is not a duplicate of Why does Java's hashCode() in String use 31 as a multiplier? since my question is not concerned with the history of the 31 in the JDK, but on what would be a better value in new code using the same basic template. None of the answers there try to answer that.)

推荐答案

我建议使用 92821 。这就是原因。

I recommend using 92821. Here's why.

要给出一个有意义的答案,你必须知道 i 的可能值。 Ĵ。我唯一能想到的是,在很多情况下,小值比大值更常见。 (在你的程序中出现的值为15的几率要比438281923好得多。)因此,通过选择合适的素数来使最小的哈希码冲突尽可能大,这似乎是个好主意。对于31这个相当糟糕 - 已经为 i = -1 j = 31 你的哈希值与 i = 0 j = 0

To give a meaningful answer to this you have to know something about the possible values of i and j. The only thing I can think of in general is, that in many cases small values will be more common than large values. (The odds of 15 appearing as a value in your program are much better than, say, 438281923.) So it seems a good idea to make the smallest hashcode collision as large as possible by choosing an appropriate prime. For 31 this rather bad - already for i=-1 and j=31 you have the same hash value as for i=0 and j=0.

自这很有意思,我写了一个小程序,在这个意义上搜索整个int范围内的最佳素数。也就是说,对于每个素数,我在 i的所有值上搜索 Math.abs(i)+ Math.abs(j)的最小值,j 0,0 具有相同的哈希码,然后取最小值尽可能大的素数。

Since this is interesting, I've written a little program that searched the whole int range for the best prime in this sense. That is, for each prime I searched for the minimum value of Math.abs(i) + Math.abs(j) over all values of i,j that have the same hashcode as 0,0, and then took the prime where this minimum value is as large as possible.

Drumroll :在这个意义上最好的素数是486187739(最小的碰撞是 i = -25486,j = 67194 )。几乎同样好,更容易记住的是92821,最小的碰撞是 i = -46272和j = 46016

Drumroll: the best prime in this sense is 486187739 (with the smallest collision being i=-25486, j=67194). Nearly as good and much easier to remember is 92821 with the smallest collision being i=-46272 and j=46016.

如果你给小另一个含义,并希望最小的 Math.sqrt(i * i + j * j)尽可能大的碰撞,结果有点不同:最好的是1322837333, i = -6815和j = 70091 ,但我最喜欢的92821(最小碰撞 -46272 ,46016 )几乎和最佳值一样好。

If you give "small" another meaning and want to be the minimum of Math.sqrt(i*i+j*j) for the collision as large as possible, the results are a little different: the best would be 1322837333 with i=-6815 and j=70091, but my favourite 92821 (smallest collision -46272,46016) is again almost as good as the best value.

我确实认为这些计算是否有意义是值得商榷的实践。但我确实认为将92821作为素数比31更有意义,除非你有充分的理由不这样做。

I do acknowledge that it is quite debatable whether these calculation make much sense in practice. But I do think that taking 92821 as prime makes much more sense than 31, unless you have good reasons not to.

这篇关于什么是哈希码计算的合理素数?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆