HashMap resize 方法实现细节 [英] HashMap resize method implementation detail

查看:28
本文介绍了HashMap resize 方法实现细节的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

正如标题所暗示的,这是一个关于 HashMap#resize 的实现细节的问题 - 即内部数组的大小加倍.这有点罗嗦,但我真的试图证明我已经尽力理解了这一点......

这发生在此特定存储桶/bin 中的条目以 Linked 方式存储时 - 因此具有确切的顺序并且在问题的上下文中这很重要.

通常,resize 也可以从其他地方调用,但我们只看这种情况.

假设您将这些字符串作为键放入 HashMap(右侧是 hashcode after HashMap#hash - 这是内部重新散列.)是的,这些是精心生成的,而不是随机的.

 DFHXR - 11111YSXFJ - 01111TUDDY - 11111AXVUH - 01111RUTWZ - 11111DEDUC - 01111WFCVW - 11111ZETCU - 01111GCVUR - 11111

这里有一个简单的模式需要注意——它们的最后 4 位都是相同的——这意味着当我们插入 8 个这样的键(总共 9 个)时,它们最终会出现在同一个桶中;并且在第 9 个 HashMap#put 上,将调用 resize.

因此,如果 HashMap 中当前有 8 个条目(具有上述键之一)-这意味着此映射中有 16 个存储桶,并且它们的键的最后 4 位决定了条目的位置结果.

我们放了第九个键.此时 TREEIFY_THRESHOLD 被命中并且 resize 被调用.bins 加倍为 32 并且密钥中的多一位决定该条目的去向(所以,现在是 5 位).

最终到达这段代码(当resize发生时):

 节点loHead = null, loTail = null;节点<K,V>hiHead = null, hiTail = null;节点<K,V>下一个;做 {下一个 = e.next;if ((e.hash & oldCap) == 0) {如果(loTail == null)loHead = e;别的loTail.next = e;loTail = e;}别的 {如果(hiTail == null)hiHead = e;别的hiTail.next = e;hiTail = e;}} while ((e = next) != null);如果(loTail != null){loTail.next = null;newTab[j] = loHead;}如果(hiTail != null){hiTail.next = null;newTab[j + oldCap] = hiHead;}

实际上并没有那么复杂...它所做的就是将当前 bin 拆分为将移动到其他 bin 的条目和不会移动到其他 bin 的条目垃圾箱-但肯定会留在这个垃圾箱中.

它实际上非常聪明——它是通过这段代码实现的:

 if ((e.hash & oldCap) == 0)

它的作用是检查下一位(在我们的例子中是第 5 位)是否真的为零——如果是,这意味着这个条目将保持原样;如果不是,它将在新 bin 中以 2 次方的偏移量移动.

现在最后的问题是:调整大小中的那段代码是精心制作的,以便它保留条目的顺序.

因此,在您将这 9 个键放入 HashMap 之后,顺序将是:

DFHXR ->TUDDY ->RUTWZ ->WFCVW ->GCVUR(一箱)YSXFJ ->AXVUH ->DEDUC ->ZETCU(另一个垃圾箱)

为什么要保留 HashMap 中某些条目的顺序.Map 中的顺序真的很糟糕此处这里.

解决方案

设计注意事项已记录在同一源文件中,位于 第211行

<块引用>

* 当 bin 列表被树化、分裂或未树化时,我们保持* 它们以相同的相对访问/遍历顺序(即字段* Node.next) 更好地保留局部性,并稍微* 简化对调用的拆分和遍历的处理* 迭代器.remove.在插入时使用比较器时,要保持* 总订购量(或尽可能接近此处的要求)* 重新平衡,我们将类和 identityHashCodes 比较为* 决胜局.

由于通过迭代器删除映射不能触发调整大小,因此在 resize 中专门保留顺序的原因是更好地保留局部性,并稍微简化拆分的处理",以及在政策方面保持一致.

As the title suggests this is a question about an implementation detail from HashMap#resize - that's when the inner array is doubled in size. It's a bit wordy, but I've really tried to prove that I did my best understanding this...

This happens at a point when entries in this particular bucket/bin are stored in a Linked fashion - thus having an exact order and in the context of the question this is important.

Generally the resize could be called from other places as well, but let's look at this case only.

Suppose you put these strings as keys in a HashMap (on the right there's the hashcode after HashMap#hash - that's the internal re-hashing.) Yes, these are carefully generated, not random.

 DFHXR - 11111
 YSXFJ - 01111 
 TUDDY - 11111 
 AXVUH - 01111 
 RUTWZ - 11111
 DEDUC - 01111
 WFCVW - 11111
 ZETCU - 01111
 GCVUR - 11111 

There's a simple pattern to notice here - the last 4 bits are the same for all of them - which means that when we insert 8 of these keys (there are 9 total), they will end-up in the same bucket; and on the 9-th HashMap#put, the resize will be called.

So if currently there are 8 entries (having one of the keys above) in the HashMap - it means there are 16 buckets in this map and the last 4 bits of they key decided where the entries end-up.

We put the nine-th key. At this point TREEIFY_THRESHOLD is hit and resize is called. The bins are doubled to 32 and one more bit from the keys decides where that entry will go (so, 5 bits now).

Ultimately this piece of code is reached (when resize happens):

 Node<K,V> loHead = null, loTail = null;
 Node<K,V> hiHead = null, hiTail = null;
 Node<K,V> next;
 do {
     next = e.next;
     if ((e.hash & oldCap) == 0) {
          if (loTail == null)
               loHead = e;
          else
               loTail.next = e;
          loTail = e;
     }
     else {
        if (hiTail == null)
            hiHead = e;
        else
            hiTail.next = e;
        hiTail = e;
     }
 } while ((e = next) != null);



 if (loTail != null) {
     loTail.next = null;
     newTab[j] = loHead;
 }
 if (hiTail != null) {
     hiTail.next = null;
     newTab[j + oldCap] = hiHead;
 }

It's actually not that complicated... what it does it splits the current bin into entries that will move to other bins and to entries that will not move to other bins - but will remain into this one for sure.

And it's actually pretty smart how it does that - it's via this piece of code:

 if ((e.hash & oldCap) == 0) 

What this does is check if the next bit (the 5-th in our case) is actually zero - if it is, it means that this entry will stay where it is; if it's not it will move with a power of two offset in the new bin.

And now finally the question: that piece of code in the resize is carefully made so that it preserves the order of the entries there was in that bin.

So after you put those 9 keys in the HashMap the order is going to be :

DFHXR -> TUDDY -> RUTWZ -> WFCVW -> GCVUR (one bin)

YSXFJ -> AXVUH -> DEDUC -> ZETCU (another bin)

Why would you want to preserve order of some entries in the HashMap. Order in a Map is really bad as detailed here or here.

解决方案

The design consideration has been documented within the same source file, in a code comment in line 211

* When bin lists are treeified, split, or untreeified, we keep 
* them in the same relative access/traversal order (i.e., field 
* Node.next) to better preserve locality, and to slightly 
* simplify handling of splits and traversals that invoke 
* iterator.remove. When using comparators on insertion, to keep a 
* total ordering (or as close as is required here) across 
* rebalancings, we compare classes and identityHashCodes as 
* tie-breakers. 

Since removing mappings via an iterator can’t trigger a resize, the reasons to retain the order specifically in resize are "to better preserve locality, and to slightly simplify handling of splits", as well as being consistent regarding the policy.

这篇关于HashMap resize 方法实现细节的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆