Understanding strange Java hash function

Question

The following is the source code for a hash function in java.util.HashMap. The comments explain well enough what it accomplishes, but how? What are the ^ and >>> operators doing? Can someone explain how the code actually does what the comments say?

/**
 * Applies a supplemental hash function to a given hashCode, which
 * defends against poor quality hash functions.  This is critical
 * because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).

    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

Recommended answer

Here is some code and the sample output:

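public class HashSteps {

    // Walks through HashMap's supplemental hash one operation at a time.
    public static void main ( String[] args ) {
        int h = 0xffffffff;
        int h1 = h >>> 20;   // unsigned shift: zeros fill in from the left
        int h2 = h >>> 12;
        int h3 = h1 ^ h2;    // first mask, confined to the bottom 20 bits
        int h4 = h ^ h3;     // h after the first line of hash()
        int h5 = h4 >>> 7;
        int h6 = h4 >>> 4;
        int h7 = h5 ^ h6;    // second mask, confined to the bottom 28 bits
        int h8 = h4 ^ h7;    // the value hash() returns

        printBin ( h );
        printBin ( h1 );
        printBin ( h2 );
        printBin ( h3 );
        printBin ( h4 );
        printBin ( h5 );
        printBin ( h6 );
        printBin ( h7 );
        printBin ( h8 );

    }

    // Prints h as a 32-character binary string, left-padded with zeros.
    static void printBin ( int h ) {
        System.out.println ( String.format ( "%32s", 
            Integer.toBinaryString ( h ) ).replace ( ' ', '0' ) );
    }
}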

Prints:

11111111111111111111111111111111
00000000000000000000111111111111
00000000000011111111111111111111
00000000000011111111000000000000
11111111111100000000111111111111
00000001111111111110000000011111
00001111111111110000000011111111
00001110000000001110000011100000
11110001111100001110111100011111

So, the code above breaks the hash function down into steps so that you can see what is happening. The first shift of 20 positions, XORed with the second shift of 12 positions, creates a mask that can flip 0 or more of the bottom 20 bits of the int. This inserts some randomness into the bottom bits by making use of the potentially better distributed higher bits. The mask is then applied via XOR to the original value to add that randomness to the lower bits. The shift of 7 positions XORed with the shift of 4 positions creates a second mask that can flip 0 or more of the bottom 28 bits. This again brings some randomness to the lower bits and to some of the more significant ones, capitalizing on the prior XOR, which already improved the distribution at the lower bits. The end result is a smoother distribution of bits through the hash value.
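
As a rough check of the two claims above, here is a minimal sketch that builds each mask separately and shows which bits it can touch. The class name MaskDemo and the helpers firstMask and secondMask are made up for illustration; only hash is taken from the question. It also shows how >>> differs from >> on a negative int.

```java
class MaskDemo {
    // Mask built by the first step; only bits 0..19 can ever be set.
    static int firstMask(int h) {
        return (h >>> 20) ^ (h >>> 12);
    }

    // Mask built by the second step; only bits 0..27 can ever be set.
    static int secondMask(int h) {
        return (h >>> 7) ^ (h >>> 4);
    }

    // The supplemental hash from the question, unchanged.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int h = 0xffffffff;

        // >>> shifts zeros in from the left; >> copies the sign bit instead.
        System.out.println(Integer.toHexString(h >>> 20)); // fff
        System.out.println(Integer.toHexString(h >> 20));  // ffffffff

        int mask1 = firstMask(h);
        int afterFirst = h ^ mask1;
        int mask2 = secondMask(afterFirst);

        System.out.println(Integer.toHexString(mask1));      // ff000
        System.out.println(Integer.toHexString(afterFirst)); // fff00fff
        System.out.println(Integer.toHexString(mask2));      // e00e0e0
        System.out.println(Integer.toHexString(hash(h)));    // f1f0ef1f
    }
}
```

The hex values correspond to lines 4, 5, 8 and 9 of the binary output above.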

Since HashMap in Java computes the bucket index by combining the hash with the number of buckets, and the table length is a power of two, only the lower bits of the hash end up selecting the bucket. You therefore need an even distribution of the lower bits of the hash value to spread the entries evenly across the buckets.
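
To make that concrete, here is a small sketch under one assumption: that the bucket index is the hash ANDed with length - 1, which is the usual way to index a power-of-two table. The class BucketDemo and the helpers highBitCodes and distinctBuckets are hypothetical names for this illustration. It feeds in 16 hash codes that differ only in bits 16 to 19 and counts how many of 16 buckets they occupy, with and without the supplemental hash.

```java
import java.util.HashSet;
import java.util.Set;

class BucketDemo {
    // The supplemental hash from the question, unchanged.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Assumed index computation for a power-of-two table: keep the low bits.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Sixteen hash codes whose low 16 bits are all zero.
    static int[] highBitCodes() {
        int[] codes = new int[16];
        for (int i = 0; i < codes.length; i++) {
            codes[i] = i << 16;
        }
        return codes;
    }

    // Counts the distinct buckets the given hash codes land in.
    static int distinctBuckets(int[] hashCodes, int length, boolean spread) {
        Set<Integer> buckets = new HashSet<>();
        for (int code : hashCodes) {
            int h = spread ? hash(code) : code;
            buckets.add(indexFor(h, length));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        int[] codes = highBitCodes();
        // Raw hash codes: every one maps to bucket 0.
        System.out.println(distinctBuckets(codes, 16, false)); // 1
        // After the supplemental hash: each lands in its own bucket.
        System.out.println(distinctBuckets(codes, 16, true));  // 16
    }
}
```

This input is deliberately the worst case for the raw hash codes. It says nothing about the general collision bound mentioned in the source comment.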

As for proving the statement that this bounds the number of collisions, I don't have any input on that. Also, see here for some good information on building hash functions and a few details on why the xor of two numbers tends towards a random distribution of bits in the result.
