如果哈希码不同,为什么HashSet允许相等的项? [英] Why does HashSet allow equal items if hashcodes are different?

查看:110
本文介绍了如果哈希码不同,为什么HashSet允许相等的项?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

HashSet 类有一个< a href =http://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html#add%28E%29 =nofollow> add(Object o)方法,不是从另一个类继承的。该方法的Javadoc说明如下:

The HashSet class has an add(Object o) method, which is not inherited from another class. The Javadoc for that method says the following:


如果指定的元素尚不存在,则将其添加到此集合中。更正式地说,如果此集合不包含元素 e2 ,那么<$ c $>将指定的元素 e 添加到此集合中c>(e == null?e2 == null:e.equals(e2))。如果此集合已包含该元素,则调用将保持集不变并返回 false

Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.

换句话说,如果两个对象相等,则不会添加第二个对象,并且HashSet将保持不变。但是,我发现如果对象 e e2 具有不同的哈希码,则不是这样,尽管事实是 e.equals(E2)。这是一个简单的例子:

In other words, if two objects are equal, then the second object will not be added and the HashSet will remain the same. However, I've discovered that this is not true if objects e and e2 have different hashcodes, despite the fact that e.equals(e2). Here is a simple example:

import java.util.HashSet;
import java.util.Iterator;
import java.util.Random;

public class BadHashCodeClass {

    /**
     * A hashcode that will randomly return an integer, so it is unlikely to be the same
     */
    @Override
    public int hashCode(){
        return new Random().nextInt();
    }

    /**
     * An equal method that will always return true
     */
    @Override
    public boolean equals(Object o){
        return true;
    }

    public static void main(String... args){
        HashSet<BadHashCodeClass> hashSet = new HashSet<>();
        BadHashCodeClass instance = new BadHashCodeClass();
        System.out.println("Instance was added: " + hashSet.add(instance));
        System.out.println("Instance was added: " + hashSet.add(instance));
        System.out.println("Elements in hashSet: " + hashSet.size());

        Iterator<BadHashCodeClass> iterator = hashSet.iterator();
        BadHashCodeClass e = iterator.next();
        BadHashCodeClass e2 = iterator.next();
        System.out.println("Element contains e and e2 such that (e==null ? e2==null : e.equals(e2)): " + (e==null ? e2==null : e.equals(e2)));
    }

主要方法的结果是:

Instance was added: true
Instance was added: true
Elements in hashSet: 2
Element contains e and e2 such that (e==null ? e2==null : e.equals(e2)): true

As上面的例子清楚地表明,HashSet能够添加两个元素,其中 e.equals(e2)

As the example above clearly shows, HashSet was able to add two elements where e.equals(e2).

I我将假设这是而不是 Java中的一个错误,并且实际上有一些完全合理的解释为什么会这样。但我无法弄清楚到底是什么。我错过了什么?

I'm going to assume that this is not a bug in Java and that there is in fact some perfectly rational explanation for why this is. But I can't figure out what exactly. What am I missing?

推荐答案

我认为你真正要问的是:

I think what you're really trying to ask is:

为什么HashSet添加具有不等哈希码的对象,即使它们声称相等?

我的问题和你发布的问题之间的区别在于你假设这种行为是一个错误,因此从这个角度来看,你会感到悲伤。我认为其他海报已经完全解释了为什么这不是一个错误,但是他们没有解决潜在的问题。

The distinction between my question and the question you posted is that you're assuming this behavior is a bug, and therefore you're getting grief for coming at it from that perspective. I think the other posters have done a thoroughly sufficient job of explaining why this is not a bug, however they have not addressed the underlying question.

我会尝试这样做这里;我建议改写你的问题,以消除Java中糟糕的文档/错误的指责,这样你就可以更直接地探索为什么你正在遇到你正在看到的行为。

I will try to do so here; I would suggest rephrasing your question to remove the accusations of poor documentation / bugs in Java so you can more directly explore why you're running into the behavior you're seeing.

equals() 文档状态(强调添加):

The equals() documentations states (emphasis added):


注意,只要重写此方法,通常需要覆盖 hashCode 方法,以便维护 hashCode 方法的一般合同,该方法声明相等对象必须具有相同的哈希码

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

等于() hashCode() 不仅仅是烦人的Java sp中的怪癖ecification。它在算法优化方面提供了一些非常有价值的好处。通过假设 a.equals(b)暗示 a.hashCode()== b.hashCode()我们可以做一些基本的等价测试而无需直接调用 equals()。特别是,上面的不变量可以被转换 - a.hashCode ()!= b.hashCode()暗示 a.equals(b)将为false。

The contract between equals() and hashCode() isn't just an annoying quirk in the Java specification. It provides some very valuable benefits in terms of algorithm optimization. By being able to assume that a.equals(b) implies a.hashCode() == b.hashCode() we can do some basic equivalence tests without needing to call equals() directly. In particular, the invariant above can be turned around - a.hashCode() != b.hashCode() implies a.equals(b) will be false.

如果你看一下 HashMap HashSet 在内部使用),你' ll注意一个内部静态类条目,定义如下:

If you look at the code for HashMap (which HashSet uses internally), you'll notice an inner static class Entry, defined like so:

static class Entry<K,V> implements Map.Entry<K,V> {
  final K key;
  V value;
  Entry<K,V> next;
  int hash;
  ...
}

HashMap 存储密钥的哈希码以及密钥和值。由于哈希码在密钥存储在映射中的时间内不会发生变化(请参阅 Map 的文档,如果对象的值是,则不指定地图的行为当对象是地图中的键时,以影响等于比较的方式改变。 HashMap 可以安全地缓存此值。通过这样做,它只需要为地图中的每个键调用 hashCode()一次,而不是每次检查密钥时。

HashMap stores the key's hash code along with the key and value. Because a hash code is expected to not change over the time a key is stored in the map (see Map's documentation, "The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.") it is safe for HashMap to cache this value. By doing so, it only needs to call hashCode() once for each key in the map, as opposed to every time the key is inspected.

现在让我们看一下 put()的实现,我们看到这些缓存的哈希值被利用,以及上面的不变量: / p>

Now lets look at the implementation of put(), where we see these cached hashes being taken advantage of, along with the invariant above:

public V put(K key, V value) {
  ...
  int hash = hash(key);
  int i = indexFor(hash, table.length);
  for (Entry<K,V> e = table[i]; e != null; e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
      // Replace existing element and return
    }
  }
  // Insert new element
}

特别注意条件只调用 key.equals(k)如果哈希码相等并且键不是完全相同的对象,由于短路评估。通过这些方法的契约, HashMap 可以安全地跳过此调用。如果你的对象被错误地实现,那么由 HashMap 做出的这些假设不再成立,你将得到不可用的结果,包括你的集合中的重复。

In particular, notice that the conditional only ever calls key.equals(k) if the hash codes are equal and the key isn't the exact same object, due to short-circuit evaluation. By the contract of these methods, it should be safe for HashMap to skip this call. If your objects are incorrectly implemented, these assumptions being made by HashMap are no longer true, and you will get back unusable results, including "duplicates" in your set.

请注意,您的声明 HashSet ...有 add(对象o)方法,不是从另一个类继承的不太正确。虽然其父 AbstractSet 未实现此方法,但父接口 设置 ,确实指定了方法的合同。 Set 接口不关心哈希,只关注相等,因此它根据与的相等性来指定此方法的行为(e == null? e2 == null:e.equals(e2))。只要您遵循合同, HashSet 就会按照文档记录,但尽可能避免实际浪费的工作。但是,一旦违反规则, HashSet 就无法以任何有用的方式运行。

Note that your claim "HashSet ... has an add(Object o) method, which is not inherited from another class" is not quite correct. While its parent class, AbstractSet, does not implement this method, the parent interface, Set, does specify the method's contract. The Set interface is not concerned with hashes, only equality, therefore it specifies the behavior of this method in terms of equality with (e==null ? e2==null : e.equals(e2)). As long as you follow the contracts, HashSet works as documented, but avoids actually doing wasteful work whenever possible. As soon as you break the rules however, HashSet cannot be expected to behave in any useful way.

还要考虑如果您尝试使用错误实现的 Comparator TreeSet 中存储对象,您同样会看到无意义的结果。我在另一个问题中记录了一些 TreeSet 在使用不值得信任的 Comparator 时的行为示例:如何为StringBuffer类实现比较器在Java中用于TreeSet?

Consider also that if you attempted to store objects in a TreeSet with an incorrectly implemented Comparator, you would similarly see nonsensical results. I documented some examples of how a TreeSet behaves when using an untrustworthy Comparator in another question: how to implement a comparator for StringBuffer class in Java for use in TreeSet?

这篇关于如果哈希码不同,为什么HashSet允许相等的项?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆