实现-hash / -isEqual:/ -isEqualTo ...:用于Objective-C集合 [英] Implementing -hash / -isEqual: / -isEqualTo...: for Objective-C collections

查看:261
本文介绍了实现-hash / -isEqual:/ -isEqualTo ...:用于Objective-C集合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

注意:以下SO问题是相关的,但它们和关联资源似乎完全不能完全回答我的问题,特别是关于实现对象集合的等式测试

Note: The following SO questions are related, but neither they nor the linked resources seem to fully answer my questions, particularly in relation to implementing equality tests for collections of objects.

  • Best practices for overriding -isEqual: and -hash
  • Techniques for implementing -hash on mutable Cocoa objects

NSObject提供默认实现/ intfm / NSObject / hashrel =noreferrer> -hash (返回实例的地址,如(NSUInteger )self )和 -isEqual: (它返回 NO ,除非接收者的地址和参数是相同的)。这些方法被设计为在必要时被覆盖,但文档清楚地表明您应该提供两者或两者。此外,如果 -isEqual:为两个对象返回 YES ,则 -hash的结果对于那些对象必须是一样的。如果没有,那么当对象应该是相同的时候会出现问题 - 比如两个字符串实例,其中 -compare:返回 NSOrderedSame - 被添加到Cocoa集合或直接比较。

NSObject provides default implementations of -hash (which returns the address of the instance, like (NSUInteger)self) and -isEqual: (which returns NO unless the addresses of the receiver and the parameter are identical). These methods are designed to be overridden as necessary, but the documentation makes it clear that you should provide both or neither. Further, if -isEqual: returns YES for two objects, then the result of -hash for those objects must be the same. If not, problems can ensue when objects that should be the same — such as two string instances for which -compare: returns NSOrderedSame — are added to a Cocoa collection or compared directly.

我开发 CHDataStructures.framework ,一个Objective-C数据结构的开源库。我已经实施了一些集合,目前正在改进和增强其功能。我想添加的其中一个功能是比较集合与另一个功能的能力。

I develop CHDataStructures.framework, an open-source library of Objective-C data structures. I have implemented a number of collections, and am currently refining and enhancing their functionality. One of the features I want to add is the ability to compare collections for equality with another.

这些比较应该考虑存在于两个集合(包括订购,如果适用)。这种方法在Cocoa中有一个先例,通常使用一个单独的方法,包括以下内容:

Rather than comparing only memory addresses, these comparisons should consider the objects present in the two collections (including ordering, if applicable). This approach has quite a precedent in Cocoa, and generally uses a separate method, including the following:

  • -[NSArray isEqualToArray:]
  • -[NSDate isEqualToDate:]
  • -[NSDictionary isEqualToDictionary:]
  • -[NSNumber isEqualToNumber:]
  • -[NSSet isEqualToSet:]
  • -[NSString isEqualToString:]
  • -[NSValue isEqualToValue:]

我想让我的自定义集合对于平等的测试非常有效,所以他们可以安全(可预测)是添加到其他集合,并允许其他(如NSSet)确定两个集合是否相等/等效/重复。

I want to make my custom collections robust to tests of equality, so they may safely (and predictably) be added to other collections, and allow others (like an NSSet) to determine whether two collections are equal/equivalent/duplicates.

一个 -isEqualTo ...:方法本身很好,但定义这些方法的类通常也会覆盖 -isEqual :如果参数与接收者具有相同的类(或者可能是子类),则可以调用 [self isEqualTo ...:] code> [super isEqual:] 否则。这意味着该类还必须定义 -hash ,以便为具有相同内容的不同实例返回相同的值。

An -isEqualTo...: method works great on its own, but classes which define these methods usually also override -isEqual: to invoke [self isEqualTo...:] if the parameter is of the same class (or perhaps subclass) as the receiver, or [super isEqual:] otherwise. This means the class must also define -hash such that it will return the same value for disparate instances that have the same contents.

此外,Apple的 -hash 的文档规定了以下内容:(强调我的)

In addition, Apple's documentation for -hash stipulates the following: (emphasis mine)


如果将可变对象添加到使用哈希值确定对象在集合中的位置的集合中,则对象的散列方法返回的值不能更改,而对象位于因此,哈希方法不能依赖任何对象的内部状态信息,您必须确保对象的内部状态信息不会改变,而对象是因此,例如,一个可变字典可以放在一个哈希表中,但是在它不在的地方你不能改变它(注意一个给定的对象是否在一个集合中是很难知道的) 。 )

"If a mutable object is added to a collection that uses hash values to determine the object's position in the collection, the value returned by the hash method of the object must not change while the object is in the collection. Therefore, either the hash method must not rely on any of the object's internal state information or you must make sure the object's internal state information does not change while the object is in the collection. Thus, for example, a mutable dictionary can be put in a hash table but you must not change it while it is in there. (Note that it can be difficult to know whether or not a given object is in a collection.)"

编辑: 我明白为什么这是必要的同意推理 - 我在这里提到了提供额外的背景,并为了简洁起见,为什么是这样的话题。

我的集合是可变的,并且哈希将至少考虑一些的内容,所以这里唯一的选择是考虑一个编程错误来突变存储在另一个集合中的集合。 (我的收藏集都采用 NSCopying ,因此像NSDictionary这样的集合可以成功地将副本用作密钥等)

All of my collections are mutable, and the hash will have to consider at least some of the contents, so the only option here is to consider it a programming error to mutate a collection stored in another collection. (My collections all adopt NSCopying, so collections like NSDictionary can successfully make a copy to use as a key, etc.)

我有理由实现 -isEqual: -hash ,因为(例如)我的一个类的间接用户可能不知道具体的 -isEqualTo ...: code>方法调用,甚至关心两个对象是否是同一个类的实例。他们应该可以在类型为 id的任何变量上调用 -isEqual: -hash ,并获得预期的结果。

It makes sense for me to implement -isEqual: and -hash, since (for example) an indirect user of one of my classes may not know the specific -isEqualTo...: method to call, or even care whether two objects are instances of the same class. They should be able to call -isEqual: or -hash on any variable of type id and get the expected result.

-isEqual:(可以访问两个实例被比较), -hash 必须盲目地返回结果,只能访问特定实例中的数据。 由于不知道使用了哈希,所以结果必须一致,因为所有可能的实例应该被视为相等/相同,并且必须始终与<$ c $一致c> -isEqual: 。 另外,编写好的散列函数是不平凡的 - 保证唯一性是一个挑战,尤其是当你只有一个

Unlike -isEqual: (which has access to two instances being compared), -hash must return a result "blindly", with access only to the data within a particular instance. Since it can't know what the hash is being used for, the result must be consistent for all possible instances that should be considered equal/identical, and must always agree with -isEqual:. ( This has been debunked by the answers below, and it certainly makes life easier.) Further, writing good hash functions is non-trivial — guaranteeing uniqueness is a challenge, especially when you only have an NSUInteger (32/64 bits) in which to represent it.


  1. 对于集合实施平等比较 -hash 是否有最佳做法?

  2. 是否有任何特殊性在Objective-C和Cocoa-esque系列中的计划?

  3. 是否有任何良好的单元测试方法 -hash 具有合理的学位有信心的

  4. 有关实现 -hash 以同意 -isEqual:用于包含任意类型元素的集合?我应该知道什么陷阱? (编辑:不像我第一想到的那样有问题 - 正如 @kperryua 指出,等于 -hash 暗示 -isEqual:。)

  1. Are there best practices when implementing equality comparisons -hash for collections?
  2. Are there any peculiarities to plan for in Objective-C and Cocoa-esque collections?
  3. Are there any good approaches for unit testing -hash with a reasonable degree of confidence?
  4. Any suggestions on implementing -hash to agree with -isEqual: for collections containing elements of arbitrary types? What pitfalls should I know about? ( Not as problematic as I first thought — as @kperryua points out, "equal -hash values do not imply -isEqual:".)






编辑: 我应该澄清一点,我不会对如何实现-isEqual:或-isEqualTo ...感到困惑:收藏这很简单。我认为我的困惑主要来自于(错误地)认为 - 如果-isEqual:返回NO,则必须返回一个不同的值。过去做过密码学,我在想,不同价值观的散列必须是不一样的。然而,下面的答案让我意识到,一个好的散列函数真的是关于最小化桶碰撞和链接使用 -hash 的集合。虽然独特的散列是最好的,但它们并不是一个严格的要求。


I should have clarified that I'm not confused about how to implement -isEqual: or -isEqualTo...: for collections, that's straightforward. I think my confusion stemmed mainly from (mistakenly) thinking that -hash MUST return a different value if -isEqual: returns NO. Having done cryptography in the past, I was thinking that hashes for different values MUST be different. However, the answers below made me realize that a "good" hash function is really about minimizing bucket collisions and chaining for collections that use -hash. While unique hashes are preferable, they are not a strict requirement.

推荐答案

我想试着想出一些通常有用的哈希函数将为集合生成唯一的哈希值是无用的练习。 U62的所有内容的散列组合的建议将不会很好地扩展,因为它使哈希函数O(n)。哈希函数应该是O(1),以确保良好的性能,否则哈希的目的被击败。 (考虑普通的Cocoa构造的plists,它们是包含数组和其他字典的字典,可能会恶化。如果集合的哈希函数是O(如果集合的哈希函数是O()),尝试使用大型plist的顶级字典的散列将是非常缓慢的n)。)

I think trying to come up with some generally useful hash function that will generate unique hash values for collections is an exercise in futility. U62's suggestion of combining the hashes of all the contents will not scale well, as it makes the hash function O(n). Hash functions should really be O(1) to ensure good performance, otherwise the purpose of the hash is defeated. (Consider the common Cocoa construct of plists, which are dictionaries containing arrays and other dictionaries, potentially ad nauseum. Attempting to take the hash of the top-level dictionary of a large plist would be excruciatingly slow if the collections' hash functions were O(n).)

我的建议不用担心收藏的哈希很多。如你所说, -isEqual:意味着相等的 -hash 值。另一方面,等于 -hash 意味着 -isEqual:。这个事实给你很大的空间来创建一个简单的哈希。

My suggestion would be not to worry a great deal about a collection's hash. As you stated, -isEqual: implies equal -hash values. On the other hand, equal -hash values do not imply -isEqual:. That fact gives you a lot of leeway to create a simple hash.

如果你真的担心碰撞(而且你有具体的测量结果证实这是值得关注的现实情况),您仍然可以在一定程度上遵循U62的建议。例如,您可以使用集合中第一个和/或最后一个元素的散列,并将其与(例如)集合的 -count 组合起来。这足以提供一个体面的哈希。

If you're really worried about collisions though (and you have proof in concrete measurements of real-world situations that confirm it is something to be worried about), you could still follow U62's advice to some degree. For example, you could take the hash of, say, the first and/or last element in the collection, and combine that with, say, the -count of the collection. That be enough to provide a decent hash.

我希望答案至少有一个你的问题。

I hope that answers at least one of your questions.

至于第1号:实施 -isEqual:相当干燥。您列举的内容,并检查isEqual:在每个元素。

As for No. 1: Implementing -isEqual: is pretty cut and dry. You enumerate the contents, and check isEqual: on each of the elements.

有一件事要小心,这可能会影响你决定为您的收藏品做什么 -hash 函数。您的集合的客户还必须了解 -isEqual: -hash 的规则。如果您在集合的 -hash 中使用内容 -hash ,如果内容 isEqual: -hash 不同意。当然,这是客户的错,但另一种观点是反对将您的 -hash 根据收藏的内容。

There is one thing to be careful of that may affect what you decide to do for your collections' -hash functions. Clients of your collections must also understand the rules governing -isEqual: and -hash. If you use the contents' -hash in your collection's -hash, your collection will break if the contents' isEqual: and -hash don't agree. It's the client's fault, of course, but that's another argument against basing your -hash off of the collection's contents.

不。 2有点模糊。不知道你在想什么。

No. 2 is kind of vague. Not sure what you have in mind there.

这篇关于实现-hash / -isEqual:/ -isEqualTo ...:用于Objective-C集合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆