为什么要在 HashMap 中使用哈希方法 [英] Why hash method in HashMap

查看:29
本文介绍了为什么要在 HashMap 中使用哈希方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

hash 方法状态的Java doc,

Java doc of hash method states,

检索对象散列码并将补充散列函数应用于结果散列,以防止劣质散列函数.这很关键,因为 HashMap 使用长度为二的幂的哈希表,否则会遇到低位没有差异的 hashCode 的冲突.

Retrieve object hash code and applies a supplemental hash function to the result hash, which defends against poor quality hash functions. This is critical because HashMap uses power-of-two length hash tables, that otherwise encounter collisions for hashCodes that do not differ in lower bits.

我无法理解的是,

1) 为什么 HashMap 使用长度为二的幂的哈希表?

1) Why HashMap uses power-of-two length hash tables?

在声明表时也有说明:

/**
 * The table, resized as necessary. Length MUST Always be a power of two.
 */
transient Entry<K,V>[] table;

为什么会有这个限制?

2) 否则会遇到低位相同的 hashCode 冲突. 是什么意思?

推荐答案

hashmap 的目的是快速缩小在搜索特定键时需要查看的对象数量(理想情况下为 0 或 1).

The purpose of a hashmap is to very quickly narrow down how many objects you need to look at (ideally 0 or 1) when searching for a specific key.

HashMap.get(key)的一般方法如下:

  1. 调用 key.hashCode() 以获取表示对象的单个整数.

  1. Call key.hashCode() to get a single integer that represents the object.

查看基于该哈希码的哈希桶",其中可以包含零个或多个条目.

Look in a hash "bucket" based on that hashcode, which can contain zero or more entries.

遍历bucket中的每一个entry,看看是否有entry的key是.equals(key).如果是这样,请将其退回.如果存储桶中没有条目与搜索的条目具有相同的键,则返回 null.

Go through each entry in the bucket and find if any entry's key is .equals(key). If so, return it. If no entry in the bucket has an equal key to the one searched for, return null.

good hashmap 和 bad hashmap 的区别在于速度.您必须平衡所有这三个问题:

The difference between a good hashmap and a bad hashmap is speed. You have to balance all three of these concerns:

  1. 您可以多快将密钥转换为哈希码?

  1. How quickly can you transform the key into a hashcode?

两个不同的键映射到同一个哈希码的频率如何?

How often do two different keys map to the same hashcode?

您多久会将具有不同哈希码的两个密钥放入同一个桶"中?

How often will you put two keys with different hashcodes into the same "bucket"?

Java 的设计者选择了一组他们认为最平衡的折衷方案.没有正确的答案,但您必须选择一种特定的方法,并在文档中写入该方法是什么.

Java's designers have chosen a set of tradeoffs they think balances best. There is no right answer, but you do have to choose a specific approach, and write into the documentation what that approach is.

Java 的设计者可能有一些基于添加到哈希图中的典型数据的统计证据.

Java's designers likely have some statistic evidence based on typical data added to hashmaps.

他们选择通过提取哈希码的最低 n 位来将哈希码转换为桶,因为这些位的变化比高位更频繁.他们选择提取位而不是另一种将哈希码转换为桶的典型方法(除以素数后的整数余数),因为在 Java 最常部署到的平台上,这通常是一种更快的操作.

They chose to convert hashcode to bucket by extracting the lowest n bits of the hashcode, because those vary more often than the upper bits. They chose extracting bits over another typical method of converting hashcode to bucket (integer remainder after dividing by a prime number) because it's typically a faster operation on the platforms Java is most commonly deployed to.

Java 的设计者可能发现,第 1 步,hashCode() 的实现,是由 Java 用户编写的,而且通常很糟糕,为他们想要的许多对象返回相同的 hashCode存储在同一个哈希图中.想象一下,如果 hashCode 是这样的:

What Java's designers may have found is that step 1, the implementation of hashCode(), is written by Java users, and can often be terrible, returning the same hashCode for lots of objects they want to store in the same hashmap. Imagine if the hashCode was this:

public class MyInteger {
    final int i;
    public MyInteger(int i) {
        this.i = i;
    }
    public int hashCode() {
        return i << 24; // will return 0x00000000, 0x01000000, etc.
    }
    public boolean equals(Object o) {
        return (o != null) && (o instanceof MyInteger) && ((MyInteger)o).i == i;
    }
}

这就是他们所说的劣质";哈希码的低位变化不大.在这种病态的实现中,低 24 位根本没有变化!

This is what they call "poor quality"; the lower bits of the hashcode don't vary very much. In this pathological implementation, the lower 24 bits don't vary at all!

在这种情况下,对于任何小于 16,777,216 个桶的 hashmap,可以进入 hashmap 的每个键都将转到桶 0.其他 16,777,215 个桶将为空.

In this case, for hashmaps any smaller than 16,777,216 buckets, every single key that could go in the hashmap will go to bucket 0. The other 16,777,215 buckets will be empty.

其他人的哈希码可能没有这么糟糕,但它们已经够糟糕了,以至于 Java 的设计者选择添加第二个哈希码来帮助提高两个不同键进入两个不同存储桶的机会,从而减少对象的数量每次检索给定键时都需要检查是否相等.

Other people's hashcodes may not be as bad as this, but they're bad enough that Java's designers chose to add a second hashcode to help improve the chance that two different keys will go into two different buckets, thus reducing how many objects need to be checked for equality each time a given key is retrieved.

这篇关于为什么要在 HashMap 中使用哈希方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆