实现 -hash/-isEqual:/-isEqualTo...: 用于 Objective-C 集合 [英] Implementing -hash / -isEqual: / -isEqualTo...: for Objective-C collections

查看:27
本文介绍了实现 -hash/-isEqual:/-isEqualTo...: 用于 Objective-C 集合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

注意:以下 SO 问题是相关的,但它们和链接的资源似乎都没有完全回答我的问题,特别是在实施对象集合的相等性测试>.

Note: The following SO questions are related, but neither they nor the linked resources seem to fully answer my questions, particularly in relation to implementing equality tests for collections of objects.

NSObject 提供-hash(返回实例的地址,如(NSUInteger)self) 和 -isEqual:(除非接收者的地址和参数的地址相同,否则返回NO).这些方法被设计为在必要时被覆盖,但文档清楚地表明您应该同时提供或都不提供.此外,如果 -isEqual: 对两个对象返回 YES,那么这些对象的 -hash 结果必须是相同的.如果不是,则在将应该相同的对象(例如 -compare: 返回 NSOrderedSame 的两个字符串实例)添加到 Cocoa 集合或直接进行比较时,可能会出现问题.

NSObject provides default implementations of -hash (which returns the address of the instance, like (NSUInteger)self) and -isEqual: (which returns NO unless the addresses of the receiver and the parameter are identical). These methods are designed to be overridden as necessary, but the documentation makes it clear that you should provide both or neither. Further, if -isEqual: returns YES for two objects, then the result of -hash for those objects must be the same. If not, problems can ensue when objects that should be the same — such as two string instances for which -compare: returns NSOrderedSame — are added to a Cocoa collection or compared directly.

我开发了 CHDataStructures.framework,这是一个开源的 Objective-C 数据结构库.我已经实现了许多集合,目前正在改进和增强它们的功能.我想添加的功能之一是能够比较集合与另一个集合的相等性.

I develop CHDataStructures.framework, an open-source library of Objective-C data structures. I have implemented a number of collections, and am currently refining and enhancing their functionality. One of the features I want to add is the ability to compare collections for equality with another.

这些比较应该考虑两个集合中存在的对象(包括排序,如果适用),而不是只比较内存地址.这种方式在Cocoa中颇有先例,一般采用单独的方式,包括以下几种:

Rather than comparing only memory addresses, these comparisons should consider the objects present in the two collections (including ordering, if applicable). This approach has quite a precedent in Cocoa, and generally uses a separate method, including the following:

我想让我的自定义集合对相等性测试具有健壮性,因此它们可以安全地(并且可预测地)添加到其他集合中,并允许其他集合(如 NSSet)确定两个集合是否相等/等价/重复.

I want to make my custom collections robust to tests of equality, so they may safely (and predictably) be added to other collections, and allow others (like an NSSet) to determine whether two collections are equal/equivalent/duplicates.

-isEqualTo...: 方法本身就很好用,但是定义这些方法的类通常也会覆盖 -isEqual: 来调用 [selfisEqualTo...:] 如果参数与接收者属于同一类(或子类),否则 [super isEqual:] .这意味着该类还必须定义 -hash,以便它为具有相同内容的不同实例返回相同的值.

An -isEqualTo...: method works great on its own, but classes which define these methods usually also override -isEqual: to invoke [self isEqualTo...:] if the parameter is of the same class (or perhaps subclass) as the receiver, or [super isEqual:] otherwise. This means the class must also define -hash such that it will return the same value for disparate instances that have the same contents.

此外,Apple 的 -hash 文档规定如下:(强调我的)

In addition, Apple's documentation for -hash stipulates the following: (emphasis mine)

"如果将可变对象添加到使用散列值来确定对象在集合中的位置的集合中,则对象在集合中时,对象的散列方法返回的值不得更改.因此,要么哈希方法不能依赖于任何对象的内部状态信息或者你必须确保当对象处于收藏.因此,例如,一个可变字典可以放在一个哈希表中,但是当它在那里时你不能改变它.(请注意,可能很难知道给定的对象是否在集合中.)"

"If a mutable object is added to a collection that uses hash values to determine the object's position in the collection, the value returned by the hash method of the object must not change while the object is in the collection. Therefore, either the hash method must not rely on any of the object's internal state information or you must make sure the object's internal state information does not change while the object is in the collection. Thus, for example, a mutable dictionary can be put in a hash table but you must not change it while it is in there. (Note that it can be difficult to know whether or not a given object is in a collection.)"

我完全理解为什么这是必要的并且完全同意推理 - 我在这里提到它是为了提供额外的背景,并回避了为什么会这样的话题为简洁起见.

我所有的集合都是可变的,散列必须至少考虑一些的内容,所以这里唯一的选择是将存储在另一个集合中的集合变异视为编程错误收藏.(我的合集都采用NSCopying,所以像NSDictionary这样的合集可以成功制作副本以用作密钥等)

All of my collections are mutable, and the hash will have to consider at least some of the contents, so the only option here is to consider it a programming error to mutate a collection stored in another collection. (My collections all adopt NSCopying, so collections like NSDictionary can successfully make a copy to use as a key, etc.)

实现 -isEqual:-hash 对我来说是有意义的,因为(例如)我的一个类的间接用户可能不知道特定的 -isEqualTo...: 方法调用,甚至关心两个对象是否是同一个类的实例.他们应该能够对 id 类型的任何变量调用 -isEqual:-hash 并获得预期的结果.

It makes sense for me to implement -isEqual: and -hash, since (for example) an indirect user of one of my classes may not know the specific -isEqualTo...: method to call, or even care whether two objects are instances of the same class. They should be able to call -isEqual: or -hash on any variable of type id and get the expected result.

-isEqual:(可以访问被比较的两个实例)不同,-hash 必须盲目地"返回结果,只能访问其中的数据一个特定的实例.由于它不知道散列的用途,结果必须对所有可能被视为相等/相同的实例保持一致,并且必须始终与一致-isEqual:.(这已经被下面的答案揭穿了,它确实让生活更轻松.)此外,编写好的散列函数并非易事——保证唯一性是一个挑战,尤其是当你只有一个用于表示它的 NSUInteger(32/64 位).

Unlike -isEqual: (which has access to two instances being compared), -hash must return a result "blindly", with access only to the data within a particular instance. Since it can't know what the hash is being used for, the result must be consistent for all possible instances that should be considered equal/identical, and must always agree with -isEqual:. ( This has been debunked by the answers below, and it certainly makes life easier.) Further, writing good hash functions is non-trivial — guaranteeing uniqueness is a challenge, especially when you only have an NSUInteger (32/64 bits) in which to represent it.

  1. 在为集合实施 相等比较 -hash 时是否有最佳实践?
  2. 在 Objective-C 和 Cocoa 风格的集合中是否有任何特殊之处需要规划?
  3. 对于-hash 单元测试,有没有什么好的方法可以以合理的置信度进行单元测试?
  4. 对于包含任意类型元素的集合,关于实现 -hash 以同意 -isEqual: 的任何建议?我应该知道哪些陷阱?(不像我最初想的那样有问题——正如@kperryua指出的那样,相等的-hash不会 表示 -isEqual:".)
  1. Are there best practices when implementing equality comparisons -hash for collections?
  2. Are there any peculiarities to plan for in Objective-C and Cocoa-esque collections?
  3. Are there any good approaches for unit testing -hash with a reasonable degree of confidence?
  4. Any suggestions on implementing -hash to agree with -isEqual: for collections containing elements of arbitrary types? What pitfalls should I know about? ( Not as problematic as I first thought — as @kperryua points out, "equal -hash values do not imply -isEqual:".)


我应该澄清一下,我对如何实现 -isEqual: 或 -isEqualTo...: 对于集合并不感到困惑,这很简单.我认为我的困惑主要源于(错误地)认为 -hash 必须返回不同的值,如果 -isEqual: 返回 NO.过去做过密码学,我认为不同值的散列必须不同.然而,下面的答案让我意识到一个好"哈希函数实际上是关于最小化桶冲突和链接使用-hash的集合.虽然最好使用唯一的哈希值,但它们并不是严格的要求.


I should have clarified that I'm not confused about how to implement -isEqual: or -isEqualTo...: for collections, that's straightforward. I think my confusion stemmed mainly from (mistakenly) thinking that -hash MUST return a different value if -isEqual: returns NO. Having done cryptography in the past, I was thinking that hashes for different values MUST be different. However, the answers below made me realize that a "good" hash function is really about minimizing bucket collisions and chaining for collections that use -hash. While unique hashes are preferable, they are not a strict requirement.

推荐答案

我认为尝试提出一些普遍有用的散列函数来为集合生成唯一的散列值是徒劳的.U62 建议将所有内容的散列组合起来,并不能很好地扩展,因为它使散列函数为 O(n).散列函数确实应该是 O(1) 以确保良好的性能,否则散列的目的就落空了.(考虑 plist 的常见 Cocoa 构造,它们是包含数组和其他字典的字典,可能令人恶心.如果集合的散列函数是 O(n).)

I think trying to come up with some generally useful hash function that will generate unique hash values for collections is an exercise in futility. U62's suggestion of combining the hashes of all the contents will not scale well, as it makes the hash function O(n). Hash functions should really be O(1) to ensure good performance, otherwise the purpose of the hash is defeated. (Consider the common Cocoa construct of plists, which are dictionaries containing arrays and other dictionaries, potentially ad nauseum. Attempting to take the hash of the top-level dictionary of a large plist would be excruciatingly slow if the collections' hash functions were O(n).)

我的建议是不要太担心集合的哈希值.正如您所说, -isEqual: 意味着相等的 -hash 值.另一方面,相等的 -hash意味着 -isEqual:.这一事实为您提供了很多创建简单哈希的余地.

My suggestion would be not to worry a great deal about a collection's hash. As you stated, -isEqual: implies equal -hash values. On the other hand, equal -hash values do not imply -isEqual:. That fact gives you a lot of leeway to create a simple hash.

如果你真的很担心碰撞(并且你有现实世界情况的具体测量证据,证实它是需要担心的),你仍然可以按照 U62 的建议来一定程度上.例如,您可以获取集合中第一个和/或最后一个元素的哈希值,并将其与集合的 -count 组合.这足以提供一个不错的散列.

If you're really worried about collisions though (and you have proof in concrete measurements of real-world situations that confirm it is something to be worried about), you could still follow U62's advice to some degree. For example, you could take the hash of, say, the first and/or last element in the collection, and combine that with, say, the -count of the collection. That be enough to provide a decent hash.

我希望至少能回答您的一个问题.

I hope that answers at least one of your questions.

至于第 1 点:实施 -isEqual: 非常简单和枯燥.您枚举内容,并在每个元素上检查 isEqual: .

As for No. 1: Implementing -isEqual: is pretty cut and dry. You enumerate the contents, and check isEqual: on each of the elements.

需要注意的一件事可能会影响您决定为集合的 -hash 函数执行的操作.您集合的客户端还必须了解控制 -isEqual:-hash 的规则.如果您在集合的 -hash 中使用内容的 -hash,如果内容的 isEqual:-hash 不同意.当然,这是客户的错,但这是反对将 -hash 建立在集合内容之外的另一个论据.

There is one thing to be careful of that may affect what you decide to do for your collections' -hash functions. Clients of your collections must also understand the rules governing -isEqual: and -hash. If you use the contents' -hash in your collection's -hash, your collection will break if the contents' isEqual: and -hash don't agree. It's the client's fault, of course, but that's another argument against basing your -hash off of the collection's contents.

没有.2有点含糊.不知道你在想什么.

No. 2 is kind of vague. Not sure what you have in mind there.

这篇关于实现 -hash/-isEqual:/-isEqualTo...: 用于 Objective-C 集合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆