Clarifying facts behind Java's implementation of HashSet/HashMap


Problem description

1. I understand the different hash map mechanisms and the ways in which key collisions are handled (open addressing with linear/quadratic probing, chaining, extendible hashing, etc.). Which one does HashSet/HashMap make use of?

2. I realise that a good HashMap relies on a good hash function. How does Java's HashSet/HashMap hash the objects? I know that there is a hash function, but so far, for strings, I have not needed to implement it. What if I now want to hash a Java object that I create: do I need to implement the hash function? Or does Java have a built-in way of creating a hash code?

I know that the default implementation cannot be relied on, as it bases the hash function on the memory address, which is not constant.

Solution

You could answer many of these questions yourself, by reading the source code for HashMap.

(Hint: you can usually find the source code for Java SE classes using Google; e.g. search for "java.util.HashMap source".)

I understand the different hash map mechanisms and the ways in which key collisions are handled (open addressing with linear/quadratic probing, chaining, extendible hashing, etc.). Which one does HashSet/HashMap make use of?

Chaining. See the source code. (Line 154 in the version I linked to).
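To make "chaining" concrete, here is a minimal sketch of a chained hash set. It is a teaching toy under stated assumptions, not the java.util.HashMap code: the class name ChainedSet, the fixed bucket count, and the use of LinkedList are all invented for illustration, and it does not resize or accept null.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical teaching class; NOT the java.util.HashMap implementation.
final class ChainedSet<E> {
    // One chain (linked list) per bucket. The bucket count is fixed here;
    // a real hash table would grow the table as it fills up.
    private final List<LinkedList<E>> buckets = new ArrayList<>();

    ChainedSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Map a hash code to a bucket; floorMod keeps the index non-negative.
    // Null elements are not supported in this sketch.
    private LinkedList<E> bucketFor(Object o) {
        return buckets.get(Math.floorMod(o.hashCode(), buckets.size()));
    }

    boolean add(E e) {
        LinkedList<E> chain = bucketFor(e);
        if (chain.contains(e)) {   // walks the chain, comparing with equals()
            return false;
        }
        chain.add(e);              // a collision simply lengthens the chain
        return true;
    }

    boolean contains(Object o) {
        return bucketFor(o).contains(o);
    }

    public static void main(String[] args) {
        ChainedSet<Integer> set = new ChainedSet<>(4);
        set.add(1);
        set.add(5);   // 1 and 5 share bucket 1 when there are 4 buckets
        System.out.println(set.contains(5));
        System.out.println(set.contains(9));
    }
}
```

Both 1 and 5 end up in the same chain, yet each is still found, because the lookup walks the chain and compares elements with equals().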

How does Java's HashSet/HashMap hash the objects?

It doesn't. The object's hashCode() method is called to do this. See the source code. (Line 360.)

If you look at the code you will see some interesting wrinkles:

  • The code (in the version I linked to) is hashing Strings using a special method. (It appears that this is to allow hashing of strings to be "tuned" at the platform level. I didn't dig into this ...)

  • The hash code returned by the Object.hashCode() call is "scrambled" further to reduce the chance of collisions. (Read the comment!)
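The "scrambling" step can be sketched as follows. The mixing shown here (XOR-ing the upper 16 bits into the lower 16) is the form I believe OpenJDK 8 and later use; the version linked above may use a different sequence of shifts. The class and method names HashSpread, spread and indexFor are made up for this example.

```java
// Hypothetical helper class illustrating hash "scrambling"; names are invented.
final class HashSpread {
    // Fold the high 16 bits into the low 16 bits, so that hash codes which
    // differ only in their upper bits still select different buckets.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table whose length is a power of two:
    // only the low bits of the hash are used.
    static int indexFor(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000; // differs from a only in the upper bits

        // Without spreading, both hash codes map to bucket 0 of a 16-slot table.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));

        // After spreading, they map to buckets 1 and 2.
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16));
    }
}
```

Because the table index only looks at the low bits, mixing the high bits downwards is a cheap way to make poorly distributed hashCode() values collide less often.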

What if I now want to hash a Java object that I create: do I need to implement the hash function?

You can do that.

Whether you need to do this depends on how you have defined equals for the class. Specifically, Java's HashMap, HashSet and related classes place the following requirements on hashCode() and equals(Object):

  1. If a.equals(b) then a.hashCode() == b.hashCode().
  2. While a is in a HashSet or is a key in a HashMap, the value returned by a.hashCode() must not change.
  3. If !a.equals(b), then the probability that a.hashCode() == b.hashCode() should be low, especially if a and b are likely to be hash keys for the application.

(The last requirement is there for performance reasons. If you have a "poor" hash function that results in a high probability that different keys hash to the same hash code, you get lots of collisions. The hash chains will become unbalanced, and you won't get the average O(1) performance that is normally expected of hash table operations. In the worst case, performance will be O(N); i.e. equivalent to a linear search of a linked list.)
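As an illustration of the three requirements, here is one way a value class might override equals(Object) and hashCode() together. The Point class is hypothetical, and Objects.hash is just one convenient way to combine fields; it is not the only reasonable choice.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class used only to illustrate the contract.
final class Point {
    // Final fields: the hash code cannot change while the object is
    // stored in a HashSet or used as a HashMap key (requirement 2).
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Point)) {
            return false;
        }
        Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    // Uses exactly the fields compared in equals(), so equal points
    // always produce equal hash codes (requirement 1).
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 2));
        // A distinct but equal instance is found in the set.
        System.out.println(points.contains(new Point(1, 2)));
        // An unequal point is not.
        System.out.println(points.contains(new Point(2, 1)));
    }
}
```

If hashCode() were left out here, the first lookup would most likely fail, because the two equal Point instances would fall back to different identity hash codes.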

Or does Java have a built-in way of creating a hash code?

Every class inherits a default hashCode() method from Object (unless this is overridden). It uses what is known as an "identity hash code"; i.e. a hash value that is based on the object's identity (its reference). This matches the default implementation of equals(Object), which simply uses == to compare references.

I know that the default implementation cannot be relied on, as it bases the hash function on the memory address, which is not constant.

This is incorrect.

The default hashCode() method returns the "identity hash code". This is typically based on the object's memory address at some point in time, but it is NOT the object's memory address.

In particular, if an object is moved by the garbage collector, its "identity hash code" is guaranteed not to change. Yes. That's right, it DOES NOT CHANGE ... even though the object was moved!

(How they implement this efficiently is rather clever. See https://stackoverflow.com/a/3796963/139985 for details.)

The bottom line is that the default Object.hashCode() method satisfies all of the requirements that I listed above. It can be relied on.
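A small sketch of what relying on the default looks like in practice. The Token class is hypothetical and deliberately overrides neither equals(Object) nor hashCode(). Note that System.gc() is only a request, so this program cannot prove the object was actually moved; it only shows that the value stays the same across the call.

```java
import java.util.HashSet;
import java.util.Set;

final class IdentityHashDemo {
    // Hypothetical class that overrides neither equals() nor hashCode().
    static final class Token {
        final String name;

        Token(String name) {
            this.name = name;
        }
    }

    public static void main(String[] args) {
        Token t = new Token("a");

        int before = t.hashCode();
        System.gc(); // a hint to the JVM; the object may or may not be moved
        System.out.println(before == t.hashCode());

        // With no override, hashCode() is the identity hash code.
        System.out.println(t.hashCode() == System.identityHashCode(t));

        Set<Token> set = new HashSet<>();
        set.add(t);
        // The same object is found again ...
        System.out.println(set.contains(t));
        // ... but a different object with the same contents is not,
        // because default equals() compares references.
        System.out.println(set.contains(new Token("a")));
    }
}
```

So the default is perfectly safe as long as identity is the notion of equality you want; you only need to override both methods when two distinct objects should count as the same key.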
