实现-hash / -isEqual:/ -isEqualTo ...:用于Objective-C集合 [英] Implementing -hash / -isEqual: / -isEqualTo...: for Objective-C collections

查看:281
本文介绍了实现-hash / -isEqual:/ -isEqualTo ...:用于Objective-C集合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

请注意:以下SO问题是相关的,但它们或链接的资源都不能完全回答我的问题,特别是对于实现对象集合的等式测试








背景



NSObject提供默认实现/ intfm / NSObject / hashrel =nofollow noreferrer> -hash (它返回实例的地址,例如 NSUInteger)self )和 -isEqual: (返回 NO ,除非接收器的地址和参数相同)。这些方法被设计为在必要时被覆盖,但文档清楚地表明您应该提供两者,或两者都不提供。此外,如果 -isEqual:为两个对象返回 YES ,则结果 -hash 必须。如果没有,当对象应该是相同的 - 例如两个字符串实例 -compare:返回 NSOrderedSame

上下文



我开发 CHDataStructures.framework ,一个Objective-C数据结构的开源库。我实现了一些集合,并且目前正在改进和增强其功能。



我想添加的一个特性是比较集合和另一个集合是否相等。比较只比较内存地址,这些比较应该考虑两个集合(包括排序,如果适用)。这种方法在Cocoa中有很多先例,通常使用单独的方法,包括:




  • - [NSArray isEqualToArray:]

  • - [NSDate isEqualToDate:]

  • - [NSDictionary isEqualToDictionary:]

  • - [NSNumber isEqualToNumber:]

  • [NSSet isEqualToSet:]

  • - [NSString isEqualToString:] / a>

  • - [NSValue isEqualToValue:]



    • 我想让我的自定义集合对平等测试更加稳健,所以他们可以安全地(可预见地)添加到其他集合,并允许其他集合(如NSSet)



      问题



      -isEqualTo ...:方法本身就很好用,但定义这些方法的类通常也重写 -isEqual:来调用 [self isEqualTo ...:] 如果参数是与接收者相同的类(或可能是子类),或 [super isEqual:] 否则。这意味着类还必须定义 -hash ,以便它对具有相同内容的不同实例返回相同的值。



      此外,Apple的 -hash 文档规定如下:


      如果一个可变对象被添加到使用哈希值来确定对象在集合中的位置的集合中,则对象的哈希方法返回的值不能改变,而对象在因此, 哈希方法不能依赖任何对象的内部状态信息,必须确保对象的内部状态信息不会改变,而对象因此,例如,一个可变字典可以放在一个哈希表中,但是你不能在它在那里改变它(注意,可能很难知道一个给定的对象是否在一个集合。)


      编辑:我绝对明白为什么这是必要的,完全同意推理 - 我在这里提到它提供额外的上下文,并且为了简洁起见,讨论了为什么会出现这种情况。



      的集合是可变的,并且散列将必须至少考虑一些内容,因此这里唯一的选择是认为它是一个编程错误,以突变另一个集合中存储的集合。 (我的收藏集全部采用 NSCopying ,因此像NSDictionary



      我可以实现 -isEqual: -hash ,因为(例如)我的一个类的间接用户可能不知道具体 -isEqualTo ...: / code>方法调用,甚至关心两个对象是否是同一个类的实例。他们应该能够在类型 id的任何变量上调用 -isEqual: -hash 并获得预期结果。



      -isEqual:两个实例被比较), -hash 必须返回一个结果blindly,只能访问特定实例中的数据。 由于无法知道要使用的哈希值,对于应被视为相等/相同的所有实例,结果必须一致,并且必须始终与 -isEqual: 此外,编写好的哈希函数是非常简单的 - 保证唯一性是一个挑战,特别是当你只有一个用来表示它的NSUInteger(32/64位)。



      问题




      1. 对于集合实施平等比较 -hash 时,是否有最佳实践?

      2. 是否有任何特殊性计划在Objective-C和Cocoa-esque集合?

      3. 有合理的程度有单位测试的好方法 -hash -hash 同意 -isEqual:用于包含任意类型元素的集合?我应该知道什么陷阱? (修改:不像我最初想的那样有问题,因为 @kperryua 指出,等于 -hash 暗示 -isEqual:。)






      编辑:我应该澄清,我不会对如何实现-isEqual:或-isEqualTo ...: ,这很直接。我认为我的混乱主要源于(错误地)认为 - 如果-isEqual:返回NO,则必须返回不同的值。在过去进行了密码学之后,我认为不同值的哈希必须不同。但是,下面的答案让我意识到,一个好的哈希函数真的关于最小化桶碰撞和链接使用 -hash 的集合。虽然唯一的哈希是首选,但它们不是严格的要求。


      解决方案

      一般有用的哈希函数将为集合生成唯一的哈希值是一种无用的练习。 U62的组合所有内容的哈希的建议将不能很好地缩放,因为它使得散列函数O(n)。哈希函数应该真的是O(1)以确保良好的性能,否则哈希的目的被打败。 (考虑普通的Cocoa构造的plists,它是包含数组和其他字典的字典,潜在的恶意)如果集合的哈希函数是O(),则尝试获取大型plist的顶级字典的哈希将是非常慢的n)。)



      我的建议是不要担心一个集合的哈希很大。正如你所说, -isEqual:意味着等于 -hash 值。另一方面,等于 -hash 的值意味着 -isEqual:。这个事实给你很多余地创建一个简单的哈希。



      如果你真的担心碰撞虽然(和你有证据在具体测量的现实世界的情况,确认它是一个担心),你仍然可以在一定程度上遵循U62的建议。例如,您可以使用集合中的第一个和/或最后一个元素的散列,并将其与例如集合的 -count 结合使用。



      我希望至少能回答你的一个问题。



      至于第1:实现 -isEqual:是相当干燥。你枚举的内容,并检查isEqual:对每个元素。



      有一件事要注意,可能会影响你决定为你的集合做什么, -hash 函数。您的集合的客户端还必须了解管理 -isEqual: -hash 的规则。如果您在集合的 -hash 中使用内容' -hash ,您的集合将会中断,如果内容 isEqual: -hash 不同意。这是客户的错,当然,但这是另一个反对根据你的 -hash 集合的内容。



      否。 2是模糊的。不确定您的想法。


      Note: The following SO questions are related, but neither they nor the linked resources seem to fully answer my questions, particularly in relation to implementing equality tests for collections of objects.


      Background

      NSObject provides default implementations of -hash (which returns the address of the instance, like (NSUInteger)self) and -isEqual: (which returns NO unless the addresses of the receiver and the parameter are identical). These methods are designed to be overridden as necessary, but the documentation makes it clear that you should provide both or neither. Further, if -isEqual: returns YES for two objects, then the result of -hash for those objects must be the same. If not, problems can ensue when objects that should be the same — such as two string instances for which -compare: returns NSOrderedSame — are added to a Cocoa collection or compared directly.

      Context

      I develop CHDataStructures.framework, an open-source library of Objective-C data structures. I have implemented a number of collections, and am currently refining and enhancing their functionality. One of the features I want to add is the ability to compare collections for equality with another.

      Rather than comparing only memory addresses, these comparisons should consider the objects present in the two collections (including ordering, if applicable). This approach has quite a precedent in Cocoa, and generally uses a separate method, including the following:

      I want to make my custom collections robust to tests of equality, so they may safely (and predictably) be added to other collections, and allow others (like an NSSet) to determine whether two collections are equal/equivalent/duplicates.

      Problems

      An -isEqualTo...: method works great on its own, but classes which define these methods usually also override -isEqual: to invoke [self isEqualTo...:] if the parameter is of the same class (or perhaps subclass) as the receiver, or [super isEqual:] otherwise. This means the class must also define -hash such that it will return the same value for disparate instances that have the same contents.

      In addition, Apple's documentation for -hash stipulates the following: (emphasis mine)

      "If a mutable object is added to a collection that uses hash values to determine the object's position in the collection, the value returned by the hash method of the object must not change while the object is in the collection. Therefore, either the hash method must not rely on any of the object's internal state information or you must make sure the object's internal state information does not change while the object is in the collection. Thus, for example, a mutable dictionary can be put in a hash table but you must not change it while it is in there. (Note that it can be difficult to know whether or not a given object is in a collection.)"

      Edit: I definitely understand why this is necessary and totally agree with the reasoning — I mentioned it here to provide additional context, and skirted the topic of why it's the case for the sake of brevity.

      All of my collections are mutable, and the hash will have to consider at least some of the contents, so the only option here is to consider it a programming error to mutate a collection stored in another collection. (My collections all adopt NSCopying, so collections like NSDictionary can successfully make a copy to use as a key, etc.)

      It makes sense for me to implement -isEqual: and -hash, since (for example) an indirect user of one of my classes may not know the specific -isEqualTo...: method to call, or even care whether two objects are instances of the same class. They should be able to call -isEqual: or -hash on any variable of type id and get the expected result.

      Unlike -isEqual: (which has access to two instances being compared), -hash must return a result "blindly", with access only to the data within a particular instance. Since it can't know what the hash is being used for, the result must be consistent for all possible instances that should be considered equal/identical, and must always agree with -isEqual:. (Edit: This has been debunked by the answers below, and it certainly makes life easier.) Further, writing good hash functions is non-trivial — guaranteeing uniqueness is a challenge, especially when you only have an NSUInteger (32/64 bits) in which to represent it.

      Questions

      1. Are there best practices when implementing equality comparisons -hash for collections?
      2. Are there any peculiarities to plan for in Objective-C and Cocoa-esque collections?
      3. Are there any good approaches for unit testing -hash with a reasonable degree of confidence?
      4. Any suggestions on implementing -hash to agree with -isEqual: for collections containing elements of arbitrary types? What pitfalls should I know about? (Edit: Not as problematic as I first thought — as @kperryua points out, "equal -hash values do not imply -isEqual:".)


      Edit: I should have clarified that I'm not confused about how to implement -isEqual: or -isEqualTo...: for collections, that's straightforward. I think my confusion stemmed mainly from (mistakenly) thinking that -hash MUST return a different value if -isEqual: returns NO. Having done cryptography in the past, I was thinking that hashes for different values MUST be different. However, the answers below made me realize that a "good" hash function is really about minimizing bucket collisions and chaining for collections that use -hash. While unique hashes are preferable, they are not a strict requirement.

      解决方案

      I think trying to come up with some generally useful hash function that will generate unique hash values for collections is an exercise in futility. U62's suggestion of combining the hashes of all the contents will not scale well, as it makes the hash function O(n). Hash functions should really be O(1) to ensure good performance, otherwise the purpose of the hash is defeated. (Consider the common Cocoa construct of plists, which are dictionaries containing arrays and other dictionaries, potentially ad nauseum. Attempting to take the hash of the top-level dictionary of a large plist would be excruciatingly slow if the collections' hash functions were O(n).)

      My suggestion would be not to worry a great deal about a collection's hash. As you stated, -isEqual: implies equal -hash values. On the other hand, equal -hash values do not imply -isEqual:. That fact gives you a lot of leeway to create a simple hash.

      If you're really worried about collisions though (and you have proof in concrete measurements of real-world situations that confirm it is something to be worried about), you could still follow U62's advice to some degree. For example, you could take the hash of, say, the first and/or last element in the collection, and combine that with, say, the -count of the collection. That be enough to provide a decent hash.

      I hope that answers at least one of your questions.

      As for No. 1: Implementing -isEqual: is pretty cut and dry. You enumerate the contents, and check isEqual: on each of the elements.

      There is one thing to be careful of that may affect what you decide to do for your collections' -hash functions. Clients of your collections must also understand the rules governing -isEqual: and -hash. If you use the contents' -hash in your collection's -hash, your collection will break if the contents' isEqual: and -hash don't agree. It's the client's fault, of course, but that's another argument against basing your -hash off of the collection's contents.

      No. 2 is kind of vague. Not sure what you have in mind there.

      这篇关于实现-hash / -isEqual:/ -isEqualTo ...:用于Objective-C集合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆