Ruby 1.8.6 Array#uniq not removing duplicate hashes


Problem description


I have this array in a Ruby 1.8.6 console:

arr = [{:foo => "bar"}, {:foo => "bar"}]

Both elements are equal to each other:

arr[0] == arr[1]
=> true
#just in case there's some "==" vs "===" oddness...
arr[0] === arr[1]
=> true 

But arr.uniq doesn't remove the duplicates:

arr.uniq
=> [{:foo=>"bar"}, {:foo=>"bar"}]

Can anyone tell me what's going on here?

EDIT: I can write a not very clever uniqifier which uses include? as follows:

uniqed = []
arr.each do |hash|
  unless uniqed.include?(hash)
    uniqed << hash
  end
end;false
uniqed
=> [{:foo=>"bar"}]

This produces the correct result, which makes the failure of uniq even more mysterious.

EDIT 2: Some notes on what's going on, possibly just for my own clarity. As @Ajedi32 points out in the comments, the failure to uniqify comes from the fact that the two elements are different objects. Some classes define eql? and hash methods, used for comparison, to mean "are these effectively the same thing, even if they're not the same object in memory?". String does this, for example, which is why two variables that each hold "foo" are considered equal to one another even though they're not the same object.

In Ruby 1.8.6, the Hash class doesn't do this, so when .eql? and .hash are called on a hash object (the .hash method has nothing to do with the Hash data type; it's the checksum kind of hash), they fall back to the methods defined in the Object base class, which simply ask "is it the same object in memory?".
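The difference between content-based and identity-based comparison can be seen in a short sketch. It uses only String and Object, so it should behave the same on current Ruby versions; the variable names are illustrative.

```ruby
# Two separate String objects holding the same characters.
a = "foo".dup
b = "foo".dup

puts a.equal?(b)        # false: different objects in memory
puts a.eql?(b)          # true: String#eql? compares contents
puts a.hash == b.hash   # true: String#hash is derived from contents

# Plain Object instances inherit the identity-based defaults.
x = Object.new
y = Object.new

puts x.eql?(y)          # false: not the same object
puts x.eql?(x)          # true: same object
```

Any class that leaves eql? and hash at the Object defaults behaves like x and y here, which is the situation described for Hash in 1.8.6.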

The == and === operators, for hash objects, already do what I want, i.e. they say that two hashes are the same if their contents are the same. I've overridden Hash#eql? to use these, like so:

class Hash
  def eql?(other_hash)
    self == other_hash
  end
end

But I'm not sure how to handle Hash#hash: that is, I don't know how to generate a checksum that will be the same for two hashes whose contents are the same and always different for two hashes with different contents.
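As a side note on the "always different" requirement: the hash value only has to be equal for objects that are eql?. Objects with different contents may share a hash value, because eql? is consulted to tell them apart; collisions only cost speed. A minimal sketch, using a hypothetical Tag class that is not part of the question:

```ruby
# Hypothetical class, used only to illustrate the hash/eql? contract.
class Tag
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Content-based equality, used to confirm a match after the hash
  # values have been found to agree.
  def eql?(other)
    other.is_a?(Tag) && name == other.name
  end

  # Deliberately poor: every Tag collides with every other Tag.
  # It is still valid, because equal tags get equal hash values.
  def hash
    0
  end
end

tags = [Tag.new("a"), Tag.new("b"), Tag.new("a")]
puts tags.uniq.map { |t| t.name }.inspect   # ["a", "b"]
```

So a checksum for Hash only needs to agree for hashes with the same contents; differing as often as possible is a performance goal rather than a correctness rule.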

@Ajedi32 suggested I have a look at Rubinius' implementation of the Hash#hash method here https://github.com/rubinius/rubinius/blob/master/core/hash.rb#L589 , and my version of Rubinius' implementation looks like this:

class Hash
  def hash
    result = self.size
    self.each do |key,value|
      result ^= key.hash 
      result ^= value.hash 
    end
    return result
  end
end

This does seem to work, although I don't know what the "^=" operator does, which makes me a bit nervous. Also, it's very slow: about 50x as slow, based on some primitive benchmarking. This might make it too slow to use.
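One property of this XOR mix is worth spelling out: the result does not depend on the order in which pairs are visited, since XOR is commutative and associative. The sketch below copies the same loop into a standalone helper (the name xor_content_hash is made up here) so it can be tried without reopening Hash. It also shows two kinds of collision that this particular mix allows.

```ruby
# Hypothetical helper with the same body as the Hash#hash override,
# written as a plain method so the Hash class is left untouched.
def xor_content_hash(h)
  result = h.size
  h.each do |key, value|
    result ^= key.hash
    result ^= value.hash
  end
  result
end

# Same pairs inserted in a different order give the same result.
first  = { :a => 1, :b => 2 }
second = { :b => 2, :a => 1 }
puts xor_content_hash(first) == xor_content_hash(second)            # true

# Keys and values feed the same XOR, so swapping them collides.
puts xor_content_hash({ 1 => 2 }) == xor_content_hash({ 2 => 1 })   # true

# A pair whose key equals its value cancels itself out.
puts xor_content_hash({ 7 => 7 }) == xor_content_hash({ 9 => 9 })   # true
```

These collisions do not produce wrong results as long as eql? compares contents; they only mean more eql? calls.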

EDIT 3: A bit of research has revealed that "^" is the bitwise exclusive OR (XOR) operator. For each pair of corresponding bits, XOR returns 1 if the two bits are different (i.e. it returns 0 for 0,0 and 1,1, and 1 for 0,1 and 1,0).
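A small worked example of XOR on integers, with values chosen only for illustration:

```ruby
a = 5   # binary 101
b = 3   # binary 011

puts a ^ b               # 6
puts((a ^ b).to_s(2))    # 110: only the bits that differ are set

# XOR-ing with the same value a second time restores the original.
puts((a ^ b) ^ b)        # 5
```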

So, at first I thought that means that

result ^= key.hash 

is shorthand for

result = result ^ key.hash

In other words, do an XOR between the current value of result and the other value, and then save the outcome in result. I still don't quite get the logic of this, though. I thought that perhaps the ^ operator had something to do with pointers, because calling it on a variable works while calling it on the variable's value doesn't, e.g.:

var = 1
=> 1
var ^= :foo
=> 14904
1 ^= :foo
SyntaxError: compile error
(irb):11: syntax error, unexpected tOP_ASGN, expecting $end

So ^= works when the left-hand side is a variable but not when it is the variable's value written as a literal, which made me think it had something to do with referencing/dereferencing.
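For what it's worth, the behaviour above can be explained without pointers: ^= is a compound assignment, so its left-hand side has to be something that can be assigned to. A sketch of the expansion follows. The eval call is used here only so the syntax error can be rescued and shown.

```ruby
result = 5
result ^= 3          # expands to: result = result ^ 3
puts result          # 6

same = 5
same = same ^ 3      # the long form gives the same value
puts same            # 6

# A literal cannot be assigned to, so the expanded form `1 = 1 ^ 3`
# is rejected by the parser before any XOR takes place.
begin
  eval("1 ^= 3")
rescue SyntaxError
  puts "SyntaxError: cannot assign to a literal"
end
```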

Later implementations of Ruby also have C code for the Hash#hash method, and Rubinius' implementation seems too slow. Bit stuck...

Solution

For efficiency reasons, Array#uniq does not compare values using == or even ===. According to the docs:

It compares values using their hash and eql? methods for efficiency.

(Note I linked the docs for 2.4.2 here. While the docs for 1.8.6 do not include this statement, I believe it still holds true for that version of Ruby.)

In Ruby 1.8.6, neither Hash#hash nor Hash#eql? is implemented, so they fall back to using Object#hash and Object#eql?:

Equality—At the Object level, == returns true only if obj and other are the same object. Typically, this method is overridden in descendent classes to provide class-specific meaning.

[...]

The eql? method returns true if obj and anObject have the same value. Used by Hash to test members for equality. For objects of class Object, eql? is synonymous with ==.

So according to Array#uniq, those two hashes are different objects, and are therefore unique.
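The same mismatch can be reproduced on a current Ruby with any class that defines == but leaves eql? and hash at the Object defaults. The Point class below is hypothetical and stands in for Hash as it behaved in 1.8.6:

```ruby
# Hypothetical class that defines == but inherits the identity-based
# eql? and hash from Object.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def ==(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
end

points = [Point.new(1, 2), Point.new(1, 2)]

puts points[0] == points[1]              # true
puts points.include?(Point.new(1, 2))    # true: include? compares with ==
puts points.uniq.size                    # 2: uniq consults hash and eql?
```

This also accounts for why the include?-based loop in the question removes the duplicate while uniq does not.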

To fix this, you can try defining Hash#hash and Hash#eql? yourself. The details of how to do this are left as an exercise for the reader. You may find it helpful, however, to refer to Rubinius's implementation of these methods.
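One possible shape for that exercise is sketched below. It uses a hypothetical ContentHash subclass rather than reopening Hash, so it can be tried on a current Ruby without changing global behaviour. On 1.8.6 the same two methods would presumably be defined on Hash itself, which has not been tested here. The XOR mix follows the code in the question and is not claimed to match Rubinius's or MRI's actual implementation.

```ruby
# Hypothetical subclass, so the sketch does not alter Hash globally.
class ContentHash < Hash
  # Two hashes count as the same key when their contents match.
  def eql?(other)
    self == other
  end

  # Order-independent mix of the size and every key and value hash.
  def hash
    result = size
    each do |key, value|
      result ^= key.hash
      result ^= value.hash
    end
    result
  end
end

a = ContentHash.new
a[:foo] = "bar"
b = ContentHash.new
b[:foo] = "bar"

puts([a, b].uniq.size)   # 1
```

Both methods are needed together: hash narrows the candidates, and eql? confirms that two candidates really have the same contents.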
