Java HashMap detect collision


Question


Is there a way to detect collisions in a Java HashMap? Can anyone point out some situations where a lot of collisions can take place? Of course, if you override an object's hashCode() and simply return a constant value, collisions are sure to occur. I'm not talking about that. I want to know in what situations, other than the one previously mentioned, a huge number of collisions occur without modifying the default hashCode() implementation.

Answer


I have created a project to benchmark these sorts of things: http://code.google.com/p/hashingbench/ (for hashtables with chaining, open addressing, and Bloom filters).


Apart from the hashCode() of the key, you need to know the "smearing" (or "scrambling", as I call it in that project) function of the hashtable. From this list, HashMap's smearing function is the equivalent of:

// HashMap's supplemental ("smearing") hash function, as in JDK 6/7's HashMap.hash(int)
public int scramble(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}


So for a collision to occur in a HashMap, the necessary and sufficient condition is the following: scramble(k1.hashCode()) == scramble(k2.hashCode()). This is always true if k1.hashCode() == k2.hashCode() (otherwise, the smearing/scrambling function wouldn't be a function), so that's a sufficient, but not necessary, condition for a collision to occur.
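For a concrete case: distinct String keys can share a hashCode() (Java's String hash is a simple polynomial over the characters), which guarantees a collision no matter what the smearing function does. A minimal sketch using the scramble function quoted above (class and method names here are illustrative):

```java
public class CollisionDemo {
    // HashMap's smearing function, as quoted above
    static int scramble(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are the classic example: 'A'*31 + 'a' == 'B'*31 + 'B' == 2112
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112
        // Equal hash codes imply equal scrambled values, hence the same bucket:
        System.out.println(scramble("Aa".hashCode()) == scramble("BB".hashCode())); // true
    }
}
```

Chaining such pairs ("AaAa", "AaBB", "BBAa", "BBBB", ...) generates arbitrarily many keys that all land in one bucket.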


Edit: Actually, the above necessary and sufficient condition should have been compress(scramble(k1.hashCode())) == compress(scramble(k2.hashCode())) - the compress function takes an integer and maps it to {0, ..., N-1}, where N is the number of buckets, so it basically selects a bucket. Usually, this is simply implemented as hash % N, or, when the hashtable size is a power of two (and that's actually a motivation for having power-of-two hashtable sizes), as hash & (N - 1) (faster). ("compress" is the name Goodrich and Tamassia use to describe this step in Data Structures and Algorithms in Java.) Thanks go to ILMTitan for spotting my sloppiness.
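Put together, bucket selection is compress applied to scramble. A sketch of the full pipeline for a power-of-two table, where the mask hash & (N - 1) is equivalent to hash % N but avoids a division (the method names are illustrative, not HashMap's actual internals):

```java
public class BucketDemo {
    static int scramble(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // compress: map the scrambled hash onto {0, ..., n-1}; for power-of-two n,
    // (h & (n - 1)) keeps exactly the low-order bits that index a bucket
    static int bucketIndex(Object key, int n) {
        return scramble(key.hashCode()) & (n - 1);
    }

    public static void main(String[] args) {
        int n = 16; // HashMap's default initial capacity
        // Keys with equal hashCode() necessarily share a bucket:
        System.out.println(bucketIndex("Aa", n) == bucketIndex("BB", n)); // true
        System.out.println(bucketIndex("hello", n)); // some value in 0..15
    }
}
```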


Other hashtable implementations (ConcurrentHashMap, IdentityHashMap, etc) have other needs and use another smearing/scrambling function, so you need to know which one you're talking about.


(For example, HashMap's smearing function was put into place because people were using HashMap with objects with the worst type of hashCode() for the old, power-of-two-table implementation of HashMap without smearing - objects that differ a little, or not at all, in their low-order bits, which were used to select a bucket - e.g. new Integer(1 * 1024), new Integer(2 * 1024), etc. As you can see, HashMap's smearing function tries its best to have all bits affect the low-order bits.)


All of them, though, are meant to work well in common cases - a particular case is objects that inherit the system's hashCode().


PS: Actually, the absolutely ugly case which prompted the implementors to insert the smearing function is the hashCode() of Floats/Doubles used as keys with the values 1.0, 2.0, 3.0, 4.0, ..., all of which have the same (zero) low-order bits. This is the related old bug report: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4669519
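You can see that pathology directly: small whole-number floats have IEEE 754 bit patterns whose low-order bits are all zero, and Float.hashCode() is just the result of floatToIntBits, so without smearing a power-of-two table would index every such key into bucket 0. A quick check:

```java
public class FloatBitsDemo {
    public static void main(String[] args) {
        for (float f : new float[] {1.0f, 2.0f, 3.0f, 4.0f}) {
            int bits = Float.floatToIntBits(f); // same value Float.hashCode() returns
            // Each of these has at least 20 trailing zero bits, so the low-order
            // bits used for bucket selection carry no information at all.
            System.out.printf("%.1f -> 0x%08X, low 16 bits = %d%n", f, bits, bits & 0xFFFF);
        }
    }
}
```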

