HashMap Java 8 implementation

Problem description

As per the following link document: Java HashMap Implementation

I'm confused with the implementation of HashMap (or rather, an enhancement in HashMap). My queries are:

Firstly

static final int TREEIFY_THRESHOLD = 8;
static final int UNTREEIFY_THRESHOLD = 6;
static final int MIN_TREEIFY_CAPACITY = 64;

Why and how are these constants used? I would like some clear examples. How do they achieve a performance gain?

Secondly

If you see the source code of HashMap in JDK, you will find the following static inner class:

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;

    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }

    /**
     * Returns root of tree containing this node.
     */
    final TreeNode<K,V> root() {
        for (TreeNode<K,V> r = this, p;;) {
            if ((p = r.parent) == null)
                return r;
            r = p;
        }
    }
    //...
}

How is it used? I just want an explanation of the algorithm.

Solution

HashMap contains a certain number of buckets. It uses hashCode to determine which bucket to put each entry into. For simplicity's sake, imagine the mapping as a modulus.

If our hashCode is 123456 and we have 4 buckets, then 123456 % 4 = 0, so the item goes in the first bucket (index 0).
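The modulus picture can be sketched in code. The real HashMap actually uses the bit trick `(n - 1) & hash` (after a hash-spreading step), which for a power-of-two table size is equivalent to a modulus for non-negative hashes; `bucketIndex` here is an illustrative name, not a JDK method:

```java
public class BucketDemo {
    // Illustrative helper (not a JDK method): picks a bucket for a hash.
    // HashMap uses (n - 1) & hash, which equals hash % n when n is a
    // power of two and the hash is non-negative.
    static int bucketIndex(int hash, int buckets) {
        return (buckets - 1) & hash;
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex(123456, 4)); // 0 -> the first bucket
    }
}
```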

If our hashCode function is good, it should provide an even distribution so that all the buckets will be used somewhat equally. In this case, the bucket uses a linked list to store the values.

But you can't rely on people to implement good hash functions. People will often write poor hash functions which will result in a non-even distribution. It's also possible that we could just get unlucky with our inputs.

The less even this distribution is, the further we're moving from O(1) operations and the closer we're moving towards O(n) operations.

The implementation of HashMap tries to mitigate this by organising some buckets into trees rather than linked lists if the buckets become too large. This is what TREEIFY_THRESHOLD = 8 is for. If a bucket contains more than eight items, it should become a tree.

This tree is a Red-Black tree, presumably chosen because it offers some worst-case guarantees. It is first sorted by hash code. If the hash codes are the same, it uses the compareTo method of Comparable if the objects implement that interface, else the identity hash code.
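As a hedged sketch of both points in action: a key type whose hashCode always returns the same value forces every entry into one bucket, and because it also implements Comparable, the tree bin can order entries by compareTo when the hash codes tie. The class name and numbers below are ours, not from the JDK:

```java
import java.util.HashMap;

// Hypothetical key whose hashCode deliberately collides, so all entries land
// in one bucket. Past TREEIFY_THRESHOLD (8) entries, a JDK 8+ HashMap turns
// that bucket's list into a red-black tree; since this key implements
// Comparable, the tree can order entries by compareTo when hash codes tie.
public class CollidingKey implements Comparable<CollidingKey> {
    final int id;

    CollidingKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; } // every instance collides

    @Override public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).id == id;
    }

    @Override public int compareTo(CollidingKey other) {
        return Integer.compare(id, other.id);
    }

    public static void main(String[] args) {
        HashMap<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 1_000; i++) {
            map.put(new CollidingKey(i), i);
        }
        // All 1000 entries share one bucket, yet lookups still succeed
        // (and, thanks to the tree bin, in O(log n) rather than O(n)).
        System.out.println(map.get(new CollidingKey(500))); // 500
    }
}
```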

If entries are removed from the map, the number of entries in the bucket might drop so low that this tree structure is no longer necessary. That's what UNTREEIFY_THRESHOLD = 6 is for. If the number of elements in a bucket falls to six or fewer, we might as well go back to using a linked list.

Finally, there is the MIN_TREEIFY_CAPACITY = 64.

When a hash map grows in size, it automatically resizes itself to have more buckets. If we have a small HashMap, the likelihood of us getting very full buckets is quite high, because we don't have that many different buckets to put stuff into. It's much better to have a bigger HashMap, with more buckets that are less full. This constant basically says not to start making buckets into trees if our HashMap is very small - it should resize to be larger first instead.
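A minimal sketch of that decision, paraphrasing the first check in HashMap's treeifyBin (the helper name `actionFor` and the returned strings are ours):

```java
public class TreeifyPolicy {
    // Constants as defined in java.util.HashMap (JDK 8+).
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    // Paraphrase of the choice HashMap makes when a bucket grows: if the
    // table is still small, grow the table instead of treeifying the bucket.
    static String actionFor(int tableLength, int bucketSize) {
        if (bucketSize < TREEIFY_THRESHOLD) return "keep linked list";
        if (tableLength < MIN_TREEIFY_CAPACITY) return "resize table";
        return "treeify bucket";
    }

    public static void main(String[] args) {
        System.out.println(actionFor(16, 8)); // resize table
        System.out.println(actionFor(64, 8)); // treeify bucket
    }
}
```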


To answer your question about the performance gain, these optimisations were added to improve the worst case. You would probably only see a noticeable performance improvement because of these optimisations if your hashCode function was not very good.

It is designed to protect against bad hashCode implementations and also provides basic protection against collision attacks, where a bad actor may attempt to slow down a system by deliberately selecting inputs which occupy the same buckets.
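Such inputs are easy to construct: "Aa" and "BB" have the same String.hashCode (2112), and because the hash is a positional polynomial, any concatenation mixing the two at the same positions collides as well, giving 2^n colliding keys of length 2n. A small generator sketch (the helper name is ours; String implements Comparable, so tree bins handle these keys well):

```java
import java.util.ArrayList;
import java.util.List;

// "Aa" and "BB" hash identically under String.hashCode, and so does every
// string built by concatenating them: 2^n colliding keys of length 2n.
// These are exactly the inputs a collision attack would feed a map;
// tree bins cap the per-bucket lookup cost at O(log n).
public class CollidingStrings {
    static List<String> collisions(int pairs) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < pairs; i++) {
            List<String> next = new ArrayList<>();
            for (String s : result) {
                next.add(s + "Aa");
                next.add(s + "BB");
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = collisions(10); // 2^10 = 1024 keys
        long distinctHashes =
            keys.stream().map(String::hashCode).distinct().count();
        System.out.println(keys.size() + " keys, "
                + distinctHashes + " distinct hash(es)");
    }
}
```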
