HashSets don't keep the elements unique if you mutate their identity

Question

When working with HashSets in C#, I recently came across an annoying problem: HashSets don't guarantee uniqueness of their elements; they are not sets. What they do guarantee is that when Add(T item) is called, the item is not added if item.Equals(that) is true for any item already in the set. This no longer holds if you manipulate items that are already in the set. A small program that demonstrates this (copied from my LINQPad):

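void Main()
{
    HashSet<Tester> testset = new HashSet<Tester>();
    testset.Add(new Tester(1));
    testset.Add(new Tester(2));
    // Dump() is LINQPad's output helper; at this point the set holds 1 and 2.
    foreach(Tester tester in testset){
      tester.Dump();
    }
    // Mutate the field that Equals and GetHashCode depend on,
    // while the items are still stored in the set.
    foreach(Tester tester in testset){
      tester.myint = 3;
    }
    // The set now holds two items that compare equal.
    foreach(Tester tester in testset){
      tester.Dump();
    }
    // Copy the items into a second set and dump that one as well.
    HashSet<Tester> secondhashset = new HashSet<Tester>(testset);
    foreach(Tester tester in secondhashset){
      tester.Dump();
    }
}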

class Tester{
  public int myint;

  public Tester(int i){
    this.myint = i;
  }

  public override bool Equals(object o){
    if (o== null) return false;
    Tester that = o as Tester;
    if (that == null) return false;
    return (this.myint == that.myint);
  }

  public override int GetHashCode(){
    return this.myint;
  }

  public override string ToString(){
    return this.myint.ToString();
  }
}

It will happily let the items in the collection be manipulated until they are equal, only filtering them out when a new HashSet is built. What is advisable when I want to work with sets where I need to know the entries are unique? Roll my own, where Add(T item) adds a copy of the item and the enumerator enumerates over copies of the contained items? This presents the challenge that every contained element must be deep-copyable, at least in the members that influence its equality.
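The effect can also be observed outside LINQPad. The following is a minimal sketch, not the asker's code: MutableKey and MutationDemo are hypothetical names, and the rebuilt set is filled with explicit Add calls so that every element is hashed again with its current value.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Tester: equality and hash code depend on a mutable field.
class MutableKey
{
    public int Value;
    public MutableKey(int value) { Value = value; }
    public override bool Equals(object obj) => obj is MutableKey other && Value == other.Value;
    public override int GetHashCode() => Value;
}

static class MutationDemo
{
    // Returns the element count of the mutated set and of a set rebuilt from it.
    public static (int Original, int Rebuilt) Run()
    {
        var set = new HashSet<MutableKey> { new MutableKey(1), new MutableKey(2) };

        // Changing a field does not modify the collection itself,
        // so this is allowed during enumeration.
        foreach (var key in set) key.Value = 3;

        // Re-adding each element forces a fresh hash and equality check.
        var rebuilt = new HashSet<MutableKey>();
        foreach (var key in set) rebuilt.Add(key);

        return (set.Count, rebuilt.Count);
    }

    static void Main()
    {
        var result = Run();
        Console.WriteLine($"original: {result.Original}, rebuilt: {result.Rebuilt}");
    }
}
```

The original set still reports two elements because it placed them according to the hash codes they had when they were added; only the rebuilt set sees them as equal.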

Another solution would be to roll my own set that only accepts elements implementing INotifyPropertyChanged and reacts to the event by re-checking for equality, but this seems severely limiting, not to mention a whole lot of work and a performance loss under the hood.

Yet another possible solution I thought of is making sure that all fields are either readonly and set in the constructor, or const. All of these solutions seem to have very large drawbacks. Do I have any other options?

Recommended answer

You're really talking about object identity. If you're going to hash items they need to have some kind of identity so they can be compared.


  • If the identity can change, it is not a valid identity. You currently have public int myint. It really should be readonly and set only in the constructor.
  • If two objects are conceptually different (i.e. you want to treat them as different in your specific design), they should not compare equal, and their hash codes should ideally differ as well.
  • If you have two objects with the same content (i.e. two value objects that have the same field values), they should have the same hash code and should be equal.
  • If your data model says that you can have two objects with the same content that must not be equal, you should use a surrogate ID rather than hashing the contents.
  • Perhaps your objects should be immutable value types, so that an object can't change.
  • If they are mutable types, you should assign a surrogate ID (i.e. one that is introduced externally, like an increasing counter ID, or using the object's hash code) that never changes for the given object.

This is a problem with your Tester objects, not the set. You need to think hard about how you define identity. It's not an easy problem.
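For mutable objects, the surrogate ID option mentioned in the list could be sketched as follows. This assumes a simple process-wide counter as the ID source; TrackedTester and SurrogateDemo are hypothetical names, and a real data model might take its IDs from somewhere else entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical mutable type whose identity is a surrogate ID, not its content.
sealed class TrackedTester
{
    private static int nextId;   // assumed ID source: a process-wide counter

    public int Id { get; }       // assigned once, never changes
    public int MyInt;            // mutable content, deliberately not part of the identity

    public TrackedTester(int myInt)
    {
        Id = Interlocked.Increment(ref nextId);
        MyInt = myInt;
    }

    public override bool Equals(object obj) => obj is TrackedTester other && Id == other.Id;
    public override int GetHashCode() => Id;
}

static class SurrogateDemo
{
    // Mutates the content of every element, then reports the count and whether lookup still works.
    public static (int Count, bool Found) Run()
    {
        var first = new TrackedTester(1);
        var set = new HashSet<TrackedTester> { first, new TrackedTester(2) };

        foreach (var tester in set) tester.MyInt = 3;

        return (set.Count, set.Contains(first));
    }

    static void Main()
    {
        var result = Run();
        Console.WriteLine($"count: {result.Count}, found: {result.Found}");
    }
}
```

With this design, two testers holding the same MyInt are intentionally treated as different elements, so it only fits a model in which content equality is not what "unique" means.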
