Understanding strange Java hash function


Problem description


Following is the source code for the hash function in java.util.HashMap. The comments explain well enough what it accomplishes. But how? What are the ^ and >>> operators doing? Can someone explain how the code actually does what the comments say?

/**
 * Applies a supplemental hash function to a given hashCode, which
 * defends against poor quality hash functions.  This is critical
 * because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).

    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

Solution

Here is some code that breaks the function into steps, and the sample output:

public static void main ( String[] args ) {
    int h = 0xffffffff;
    int h1 = h >>> 20;
    int h2 = h >>> 12;
    int h3 = h1 ^ h2;
    int h4 = h ^ h3;
    int h5 = h4 >>> 7;
    int h6 = h4 >>> 4;
    int h7 = h5 ^ h6;
    int h8 = h4 ^ h7;

    printBin ( h );
    printBin ( h1 );
    printBin ( h2 );
    printBin ( h3 );
    printBin ( h4 );
    printBin ( h5 );
    printBin ( h6 );
    printBin ( h7 );
    printBin ( h8 );

}

static void printBin ( int h ) {
    System.out.println ( String.format ( "%32s", 
        Integer.toBinaryString ( h ) ).replace ( ' ', '0' ) );
}

Which prints:

11111111111111111111111111111111
00000000000000000000111111111111
00000000000011111111111111111111
00000000000011111111000000000000
11111111111100000000111111111111
00000001111111111110000000011111
00001111111111110000000011111111
00001110000000001110000011100000
11110001111100001110111100011111

So, the code breaks the hash function down into steps so that you can see what is happening. The first shift of 20 positions, XORed with the second shift of 12 positions, creates a mask that can flip 0 or more of the bottom 20 bits of the int. That inserts some randomness into the bottom bits, drawn from the potentially better-distributed higher bits. This mask is then applied via XOR to the original value, adding that randomness to the lower bits. The second shift of 7 positions, XORed with the shift of 4 positions, creates a mask that can flip 0 or more of the bottom 28 bits, which again brings randomness to the lower bits and to some of the more significant ones, capitalizing on the prior XOR, which already addressed some of the distribution in the lower bits. The end result is a smoother distribution of bits through the hash value.
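If the operators themselves are unfamiliar, here is a small standalone sketch (my own, not from HashMap) showing what >>> and ^ do bit by bit:

```java
public class OperatorDemo {
    public static void main(String[] args) {
        int h = 0x80000010;  // high bit set, plus bit 4

        // >>> is the unsigned (logical) right shift: it always fills the
        // vacated high bits with zeros, even when the int is negative.
        System.out.println(Integer.toBinaryString(h >>> 4));
        // prints a 1, then 26 zeros, then a 1 (0x08000001)

        // >> is the arithmetic right shift: it copies the sign bit down,
        // so a negative int stays negative.
        System.out.println(Integer.toBinaryString(h >> 4));
        // prints five 1s, then 26 zeros, then a 1 (0xF8000001)

        // ^ is bitwise XOR: each result bit is 1 exactly when the two
        // corresponding input bits differ.
        System.out.println(Integer.toBinaryString(0b1100 ^ 0b1010)); // 110
    }
}
```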

Since the HashMap in Java computes the bucket index by combining the hash with the number of buckets, you need an even distribution of the lower bits of the hash value to spread the entries evenly across the buckets.
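To make that concrete, here is a sketch of the idea (the indexFor helper is my reconstruction of the masking step, not a quote of HashMap's full source): with a power-of-two table, the bucket is just hash & (length - 1), so only the low bits matter, and two hashCodes that differ only in their high bits would otherwise land in the same bucket.

```java
public class BucketDemo {
    // The supplemental hash, copied from java.util.HashMap (Java 6).
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Power-of-two table: masking keeps only the low bits.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10, b = 0x20;  // differ only above bit 3

        // Raw hashCodes: both collapse to bucket 0 of a 16-slot table.
        System.out.println(indexFor(a, 16));        // 0
        System.out.println(indexFor(b, 16));        // 0

        // After the supplemental hash, the high-bit difference has been
        // folded into the low bits, so they land in different buckets.
        System.out.println(indexFor(hash(a), 16));  // 1
        System.out.println(indexFor(hash(b), 16));  // 2
    }
}
```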

As for proving the statement that this bounds the number of collisions, I don't have any input on that. Also, see here for some good information on building hash functions and a few details on why the XOR of two numbers tends towards a random distribution of bits in the result.
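That last property is easy to sanity-check yourself. The sketch below (my own, not from the linked material) XORs one fixed byte with every value 0..255 and counts how often each result bit is 1: every bit position comes out set exactly half the time, so XOR-ing in well-distributed bits never biases a position.

```java
public class XorSpread {
    public static void main(String[] args) {
        int fixed = 0xA7;         // arbitrary fixed operand
        int[] ones = new int[8];  // count of 1s per bit position

        for (int v = 0; v < 256; v++) {
            int x = fixed ^ v;
            for (int b = 0; b < 8; b++) {
                ones[b] += (x >>> b) & 1;  // extract bit b of the result
            }
        }

        // Each position prints 128/256: a perfectly even split,
        // regardless of which fixed operand was chosen.
        for (int b = 0; b < 8; b++) {
            System.out.println("bit " + b + ": " + ones[b] + "/256");
        }
    }
}
```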
