hashCode, implementation, and relation to HashMap


Question



So I asked another related question here: java string hash function with avalanche effect, but I have a different, related question now.

What I established in that question was that the hashCode() function for String does not have an avalanche effect. This means, for example, that if I have strings "k1", "k2", "k3", and I call hashCode() on each, the values returned will be consecutive.
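To make that concrete, here is a minimal sketch (not part of the original question) that prints the hash codes of a few similar keys. String.hashCode() is defined as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], so keys that differ only in the last character hash to adjacent values:

public class HashCodeDemo {
    public static void main(String[] args) {
        // Keys differing only in the final character produce adjacent hash codes.
        for (String key : new String[] {"k1", "k2", "k3"}) {
            System.out.println(key + " -> " + key.hashCode());
        }
        // Prints 3366, 3367, 3368 -- consecutive values, no avalanche effect.
    }
}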

Now, based on my recollection of data structures 101, I was under the impression that this is a bad thing. Because assuming that HashMap chooses buckets by an algorithm something like:

class HashMap {
    private int capacity;

    // Hypothetical bucket selection: reduce the key's hash code modulo the table size.
    private int chooseBucket(String key) {
        return key.hashCode() % capacity;
    }
}

It would mean that similar keys are stored in contiguous buckets, leading to a higher rate of collisions, degrading big-O lookup time from O(1) to be...who knows how bad...maybe worse than O(log n).

The types of answers I got to my first question were along the lines of 'avalanche effect isn't needed here', 'it's only for cryptographic hash functions', and 'the hashCode implementation for strings is fast and works well for small hash maps'.

Which confuses me. All data structures are fast when they're small. Wouldn't Sun provide a default hashCode function that will work well for large data sets? That's when the performance of HashMap really matters anyway, isn't it?

Or, am I missing something? Please enlighten me.

Solution

Storing keys in contiguous buckets does not cause performance degradation. Storing multiple keys in the same bucket does. When chaining is used to resolve hash collisions:

  • Worst case: every hash value is the same, so all elements end up in the same bucket, and you get O(n) performance (assuming the chains are linked lists).
  • Best case: every hash value is different, so each element ends up in its own bucket, and you get the expected O(1) performance.

Hash codes for use in hash tables (and the like) do not need an avalanche effect.
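As a rough illustration of why contiguous hash codes are harmless here, the sketch below reuses the hypothetical chooseBucket logic from the question (with an assumed capacity of 16, not the real HashMap's indexing scheme) to show that consecutive hash codes land in distinct, merely adjacent buckets, so no chains form and lookups stay O(1):

public class BucketDemo {
    private static final int CAPACITY = 16; // assumed table size for illustration

    // Same hypothetical bucket selection as in the question: hash code modulo capacity.
    private static int chooseBucket(String key) {
        return key.hashCode() % CAPACITY;
    }

    public static void main(String[] args) {
        // Consecutive hash codes map to consecutive but distinct buckets,
        // so these keys never share a bucket and never need chaining.
        for (String key : new String[] {"k1", "k2", "k3", "k4"}) {
            System.out.println(key + " -> bucket " + chooseBucket(key));
        }
        // Prints buckets 6, 7, 8, 9 for hash codes 3366..3369.
    }
}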
