在HashMap中使用String键的错误想法? [英] Bad idea to use String key in HashMap?

查看:114
本文介绍了在HashMap中使用String键的错误想法?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我知道String类' hashCode()方法是保证为不同的String-s生成唯一的哈希码。我看到很多使用将String键放入HashMap-s(使用默认的String hashCode()方法)。如果地图 put 将以前放置到地图上的HashMap条目与真正不同的字符串关键字置换,很多此类用法可能会导致重大的应用程序问题。

I understand that the String class' hashCode() method is not guarantied to generate unique hash codes for distinct String-s. I see a lot of usage of putting String keys into HashMap-s (using the default String hashCode() method). A lot of this usage could result in significant application issues if a map put displaced a HashMap entry that was previously put onto the map with a truely distinct String key.

在String.hashCode()返回不同的String-s的相同值的情况下,您将遇到什么可能性?密钥是String时,开发人员如何处理这个问题?

What are the odds that you will run into the scenario where String.hashCode() returns the same value for distinct String-s? How do developers work around this issue when the key is a String?

推荐答案

开发人员不必在在HashMap中发布哈希冲突以实现程序正确性。

这里有几个关键的事情要理解:

There are a couple of key things to understand here:


  1. 碰撞是散列的固有特征,它们必须是。可能的值的数量(您的情况下的字符串,但它也适用于其他类型)远大于整数范围。


  2. 哈希的每个用法都有一种处理冲突的方法,而Java Collections(包括HashMap)也不例外。


  3. 散列不涉及平等测试。的确,相等的对象必须具有相等的哈希码,但反之亦然:许多值将具有相同的哈希码。所以不要尝试使用哈希码比较作为相等的替代品。收藏没有。他们使用散列来选择一个子集合(在Java集合世界中称为一个bucket),但是他们使用.equals()来实际检查是否相等。


  4. 您不仅不必担心在集合中导致错误结果的冲突,而且对于大多数应用程序,您也通常不必担心性能 - Java散列集合做一个很好的管理哈希码的工作。


  5. 更好的是,对于您询问的(Strings作为键)的情况,您甚至不必担心哈希码本身,因为Java的String类生成了一个很好的哈希码。所以大部分提供的Java类。
  1. Collisions are an inherent feature of hashing, and they have to be. The number of possible values (Strings in your case, but it applies to other types as well) is vastly bigger than the range of integers.

  2. Every usage of hashing has a way to handle collisions, and the Java Collections (including HashMap) is no exception.

  3. Hashing is not involved in equality testing. It is true that equal objects must have equal hashcodes, but the reverse is not true: many values will have the same hashcode. So don't try using a hashcode comparison as a substitute for equality. Collections don't. They use hashing to select a sub-collection (called a bucket in the Java Collections world), but they use .equals() to actually check for equality.

  4. Not only do you not have to worry about collisions causing incorrect results in a collection, but for most applications, you also *usually* don't have to worry about performance - Java hashed Collections do a pretty good job of managing hashcodes.

  5. Better yet, for the case you asked about (Strings as keys), you don't even have to worry about the hashcodes themselves, because Java's String class generates a pretty good hashcode. So do most of the supplied Java classes.

有些更多的细节,如果你想要的话:

Some more detail, if you want it:

散列的方式(特别是在哈希集合的情况下,如Java的HashMap,这是您所问的):

The way hashing works (in particular, in the case of hashed collections like Java's HashMap, which is what you asked about) is this:


  • HashMap存储值在您收集的子集合中称为桶。这些实际上被实现为链表。这些数量有限:iirc,16默认启动,当您将更多的项目放入地图时,数字会增加。应该总是比值更多的桶。要提供一个示例,使用默认值,如果将100个条目添加到HashMap,则将有256个桶。

  • The HashMap stores the values you give it in a collection of sub-collections, called buckets. These are actually implemented as linked lists. There are a limited number of these: iirc, 16 to start by default, and the number increases as you put more items into the map. There should always be more buckets than values. To provide one example, using the defaults, if you add 100 entries to a HashMap, there will be 256 buckets.

可用作地图中的键必须能够生成一个称为哈希码的整数值。

Every value which can be used as a key in a map must be able to generate an integer value, called the hashcode.

HashMap使用这个哈希码来选择一个存储桶。最终,这意味着取整数值 modulo 桶数,但在此之前,Java的HashMap有一个内部方法(称为 hash()),它调整哈希码以减少一些已知的聚集源。

The HashMap uses this hashcode to select a bucket. Ultimately, this means taking the integer value modulo the number of buckets, but before that, Java's HashMap has an internal method (called hash()), which tweaks the hashcode to reduce some known sources of clumping.

查找一个值时,HashMap会选择存储桶,然后使用 .equals()通过线性搜索链接搜索单个元素。

When looking up a value, the HashMap selects the bucket, and then searches for the individual element by a linear search of the linked list, using .equals().

所以,你不必为了正确性而纠正碰撞,你通常不用担心关于他们的性能,如果你使用本地Java类(如String),你不必担心生成哈希码值。

So: you don't have to work around collisions for correctness, and you usually don't have to worry about them for performance, and if you're using native Java classes (like String), you don't have to worry about generating the hashcode values either.

在您必须编写自己的哈希码方法(这意味着您已经编写了一个具有复合值的类,例如名字/姓氏对)的情况,事情会稍微复杂一些。这是很有可能在这里错了,但它不是火箭科学。首先,知道这一点:为了确保正确性,唯一必须做的是确保相等的对象产生相等的哈希码。所以如果你为你的类写一个hashcode()方法,你还必须写一个equals()方法,你必须检查每个相同的值。

In the case where you do have to write your own hashcode method (which means you've written a class with a compound value, like a first name/last name pair), things get slightly more complicated. It's quite possible to get it wrong here, but it's not rocket science. First, know this: the only thing you must do in order to assure correctness is to assure that equal objects yield equal hashcodes. So if you write a hashcode() method for your class, you must also write an equals() method, and you must examine the same values in each.

可以写一个不好但正确的hashcode()方法,我的意思是它将满足等对象必须产生相等的哈希码约束,但是通过发生大量的冲突仍然表现得非常差。

It is possible to write a hashcode() method which is bad but correct, by which I mean that it would satisfy the "equal objects must yield equal hashcodes" constraint, but still perform very poorly, by having a lot of collisions.

这种经典的退化最坏的情况是编写一个简单返回一个常量值的方法(例如, 3)所有情况。这意味着每个值都将被分配到同一个桶中。

The canonical degenerate worst case of this would be to write a method which simply returns a constant value (e.g., 3) for all cases. This would mean that every value would be hashed into the same bucket.

它仍然会工作,但性能会降低到链表的性能。

It would still work, but performance would degrade to that of a linked list.

显然,你不会写这样一个可怕的hashcode()方法。如果您使用的是一个体面的IDE,它可以为您生成一个。由于StackOverflow喜欢代码,这里是上面的firstname / lastname类的代码。

Obviously, you won't write such a terrible hashcode() method. If you're using a decent IDE, it's capable of generating one for you. Since StackOverflow loves code, here's the code for the firstname/lastname class above.


public class SimpleName {
    private String firstName;
    private String lastName;
    public SimpleName(String firstName, String lastName) {
        super();
        this.firstName = firstName;
        this.lastName = lastName;
    }
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result
                + ((firstName == null) ? 0 : firstName.hashCode());
        result = prime * result
                + ((lastName == null) ? 0 : lastName.hashCode());
        return result;
    }
    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        SimpleName other = (SimpleName) obj;
        if (firstName == null) {
            if (other.firstName != null)
                return false;
        } else if (!firstName.equals(other.firstName))
            return false;
        if (lastName == null) {
            if (other.lastName != null)
                return false;
        } else if (!lastName.equals(other.lastName))
            return false;
        return true;
    }
}

这篇关于在HashMap中使用String键的错误想法?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆