Ruby:为什么在覆盖`#eql?` 时需要覆盖`#hash`? [英] Ruby: Why does `#hash` need to overridden whenever `#eql?` is overridden?

查看:40
本文介绍了Ruby:为什么在覆盖`#eql?` 时需要覆盖`#hash`?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

本演示文稿中,演讲者创建了一个价值类.

In this presentation the speaker has created a value class.

在实现它时,他覆盖了 #eql? 并说在 Java 开发中,惯用语是每当你覆盖 #eql? 你必须覆盖 #哈希.

In implementing it, he overrides #eql? and says that in Java development, the idiom is that whenever you override #eql? you must override #hash.

class Weight
  # ...

  def hash
    pounds.hash
  end

  def eql?(other)
    self.class == other.class &&
      self.pounds == other.pounds
  end
  alias :== eql?
end

首先,什么是#hash 方法?我可以看到它返回一个整数.

Firstly, what is the #hash method? I can see it returns an integer.

> 1.hash
=> -3708808305943022538

> 2.hash
=> 1196896681607723080

> 1.hash
=> -3708808305943022538

使用 pry 我可以看到一个整数响应 #hash 但我看不到它从哪里继承该方法.它不是在 NumericObject 上定义的.如果我知道这个方法做了什么,我可能会理解为什么它需要与 #eql? 同时被覆盖.

Using pry I can see that an integer responds to #hash but I cannot see where it inherits the method from. It's not defined on Numeric or Object. If I knew what this method did, I would probably understand why it needs to be overridden at the same time as #eql?.

那么,为什么每当 eql? 被覆盖时,#hash 都需要被覆盖?

So, why does #hash need to be overridden whenever eql? is overridden?

推荐答案

首先,什么是#hash 方法?我可以看到它返回一个整数.

Firstly, what is the #hash method? I can see it returns an integer.

#hash 方法应该返回接收者的哈希值.(该方法的名称有点泄露).

The #hash method is supposed to return a hash of the receiver. (The name of the method is a bit of a giveaway).

使用 pry 我可以看到一个整数响应 #hash 但我看不到它从哪里继承了该方法.

Using pry I can see that an integer responds to #hash but I cannot see where it inherits the method from.

在 [so] 上有很多关于这个方法从哪里来"类型的问题,答案总是一样的:知道一个方法从哪里来的最好方法是简单地问它:

There are dozens of questions of the type "Where does this method come from" on [so], and the answer is always the same: the best way to know where a method comes from, is to simply ask it:

hash_method = 1.method(:hash)
hash_method.owner #=> Kernel

所以,#hash 继承自 Kernel.但是请注意,ObjectKernel 之间有一些特殊的关系,因为在 Kernel 中实现的一些方法记录在Object 反之亦然.这可能有历史原因,现在是 Ruby 社区的不幸事实.

So, #hash is inherited from Kernel. Note however, that there is a bit of a peculiar relationship between Object and Kernel, in that some methods that are implemented in Kernel are documented in Object or vice versa. This probably has historic reasons, and is now an unfortunate fact of life in the Ruby community.

不幸的是,由于我不明白的原因,Object#hash 的文档已于 2017 年在 提交具有讽刺意味的标题添加文档".但是,在 Ruby 2.4 中仍然可用(粗体强调我的):

Unfortunately, for reasons I don't understand, the documentation for Object#hash was deleted in 2017 in a commit ironically titled "Add documents". It is, however, still available in Ruby 2.4 (bold emphasis mine):

为此对象生成一个整数哈希值.这个函数必须具有a.eql?(b)隐含a.hash == b.hash的性质.

hashinteger

Generates an Integer hash value for this object. This function must have the property that a.eql?(b) implies a.hash == b.hash.

Hash 类将哈希值与 eql 一起使用? 以确定两个对象是否引用相同的哈希键.[…]

The hash value is used along with eql? by the Hash class to determine if two objects reference the same hash key. […]

所以,如您所见,#eql?#hash 之间存在着深刻而重要的关系,事实上,使用 #eql?#hash 取决于这种关系的维护.

So, as you can see, there is a deep and important relationship between #eql? and #hash, and in fact the correct behavior of methods that use #eql? and #hash depends on the fact that this relationship is maintained.

因此,我们知道该方法称为 #hash,因此可能会计算哈希值.我们知道它与 eql? 一起使用,并且我们知道它特别被 Hash 类使用.

So, we know that the method is called #hash and thus likely computes a hash. We know it is used together with eql?, and we know that it is used in particular by the Hash class.

它到底是做什么的?好吧,我们都知道哈希函数是什么:它是一个将更大的、可能无限的输入空间映射到更小的、有限的输出空间的函数.特别是,在这种情况下,输入空间是所有 Ruby 对象的空间,输出空间是快速整数"(即以前称为 Fixnum 的那些).

What does it do, exactly? Well, we all know what a hash function is: it is a function that maps a larger, potentially infinite, input space into a smaller, finite, output space. In particular, in this case, the input space is the space of all Ruby objects, and the output space is the "fast integers" (i.e. the ones that used to be called Fixnum).

而且我们知道哈希表是如何工作的:根据键的哈希值将值放置在桶中,如果我想找到一个值,那么我只需要计算键的哈希值(速度很快)并且知道我在哪个桶中找到了值(在恒定时间内),而不是例如一个键值对数组,我需要将键与数组中的每个键进行比较(线性搜索)以找到值.

And we know how a hash table works: values are placed in buckets based on the hash of their keys, if I want to find a value, then I only need to compute the hash of the key (which is fast) and know which bucket I find the value in (in constant time), as opposed to e.g. an array of key-value-pairs, where I need to compare the key against every key in the array (linear search) to find the value.

但是,有一个问题:由于一个hash的输出空间小于输入空间,所以有不同的对象具有相同的hash值,因此最终在同一个bucket中.因此,当两个对象具有不同的哈希值时,我知道它们是不同的,但如果它们具有相同的哈希值,那么它们仍然可能不同,我需要比较它们是否相等以确保 - 这就是哈希和相等之间的关系从何而来.另请注意,当同一个桶中有多个键时,我将再次需要将搜索键与桶中的每个键进行比较(线性搜索)以找到值.

However, there is a problem: Since the output space of a hash is smaller than the input space, there are different objects which have the same hash value and thus end up in the same bucket. Thus, when two objects have different hash values, I know for a fact that they are different, but if they have the same hash value, then they could still be different, and I need to compare them for equality to be sure – and that's where the relationship between hash and equality comes from. Also note that when many keys and up in the same bucket, I will again have to compare the search key against every key in the bucket (linear search) to find the value.

从所有这些我们可以得出 #hash 方法的以下属性:

From all this we can conclude the following properties of the #hash method:

  • 它必须返回一个Integer.
  • 不仅如此,它还必须返回一个快速整数"(相当于旧的Fixnums).
  • 对于被视为相等的两个对象,它必须返回相同的整数.
  • 可能为两个被认为不相等的对象返回相同的整数.
  • 然而,它只应该以低概率这样做.(否则,Hash 可能退化为性能严重下降的链表.)
  • 故意构造不相等但具有相同哈希值的对象也应该困难.(否则,攻击者可以强制一个Hash退化为一个链表,作为一种服务降级攻击的形式.)
  • It must return an Integer.
  • Not only that, it must return a "fast integer" (equivalent to the old Fixnums).
  • It must return the same integer for two objects that are considered equal.
  • It may return the same integer for two objects that are considered unequal.
  • However, it only should do so with low probability. (Otherwise, a Hash may degenerate into a linked list with highly degraded performance.)
  • It also should be hard to construct objects that are unequal but have the same hash value deliberately. (Otherwise, an attacker can force a Hash to degenerate into a linked list as a form of Degradation-of-Service attack.)

这篇关于Ruby:为什么在覆盖`#eql?` 时需要覆盖`#hash`?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆