Why do these two methods yield different results?
Question
According to all documentation, you can append an element to an array using <<, .push, or +=, and the result ought to be the same. I have found it isn't. Can anybody explain to me what I am getting wrong? (I am using Ruby 2.3.1.)
I have a number of hashes. All of them contain the same keys. I would like to combine them into one hash in which each key maps to an array of all the collected values. This is straightforward: you iterate through all the hashes and build a new one, collecting the values like this:
# arg is array of Hashes - keys must be identical
def merge_hashes(arg)
  return {} unless arg
  keys = (arg[0] ? arg[0].keys : [])
  result = keys.product([[]]).to_h # value for each key is empty array.
  arg.each do |h|
    h.each { |k,v| result[k] += [v] }
  end
  result
end
If instead of using += I use .push or <<, I get completely weird results.
Using the following test array:
a_of_h = [{"1"=>10, "2"=>10, "3"=>10, "4"=>10, "5"=>10, "6"=>10, "7"=>10, "8"=>10, "9"=>10, "10"=>10}, {"1"=>100, "2"=>100, "3"=>100, "4"=>100, "5"=>100, "6"=>100, "7"=>100, "8"=>100, "9"=>100, "10"=>100}, {"1"=>1000, "2"=>1000, "3"=>1000, "4"=>1000, "5"=>1000, "6"=>1000, "7"=>1000, "8"=>1000, "9"=>1000, "10"=>1000}, {"1"=>10000, "2"=>10000, "3"=>10000, "4"=>10000, "5"=>10000, "6"=>10000, "7"=>10000, "8"=>10000, "9"=>10000, "10"=>10000}]
I get
merge_hashes(a_of_h)
=> {"1"=>[10, 100, 1000, 10000], "2"=>[10, 100, 1000, 10000], "3"=>[10, 100, 1000, 10000], "4"=>[10, 100, 1000, 10000], "5"=>[10, 100, 1000, 10000], "6"=>[10, 100, 1000, 10000], "7"=>[10, 100, 1000, 10000], "8"=>[10, 100, 1000, 10000], "9"=>[10, 100, 1000, 10000], "10"=>[10, 100, 1000, 10000]}
as I expect, but if I use h.each { |k,v| result[k] << v } instead, I get
buggy_merge_hashes(a_of_h)
=> {"1"=>[10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000], "2"=>[10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000], "3"=>[10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000], "4"=>[10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000], "5"=>[10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000], ...}
(I cut the rest.)
What is it I don't know here?
Recommended answer
<< and #push are destructive operations (they change the receiver).
+ (and consequently += as well) is a non-destructive operation (it returns a new object, leaving the receiver unchanged).
While they seem to be doing the same thing, this apparently small difference is crucial.
This comes into play due to another error: all of your subarrays in result start off as the same object. If you add to one of them, you add to all of them.
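To make the sharing visible, here is a minimal sketch. The helper names build_shared and build_separate are hypothetical and used only for illustration; the first reproduces the initialisation from the question, the second builds one array per key.

```ruby
# Hypothetical helper names, for illustration only.

# Same initialisation as in the question: product pairs every key with the
# one and only [] object inside [[]], so all values are the same array.
def build_shared(keys)
  keys.product([[]]).to_h
end

# The block runs once per key, so each key gets its own fresh array.
def build_separate(keys)
  keys.map { |k| [k, []] }.to_h
end

shared = build_shared(%w[a b c])
p shared.values.map(&:object_id).uniq.size   # one distinct object behind all keys
shared["a"] << 1
p shared                                     # the 1 shows up under every key

separate = build_separate(%w[a b c])
p separate.values.map(&:object_id).uniq.size # as many distinct objects as keys
separate["a"] << 1
p separate                                   # only "a" has changed
```

This is the same effect as in the buggy output above: every << lands in the single shared array, so each key appears to hold every value from every hash.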
Why is this not an issue if you use +=? Because result[k] += [v] is the same as result[k] = result[k] + [v] (I'm lying here, there's a subtle difference, but it is not relevant here; just accept that they're the same for now so as not to get more confused :D); and as + is non-destructive, result[k] + [v] is a different object than result[k]. When you update the value in the hash with this assignment, you are no longer using the starting [] object, and the reference sharing error can't bite you any more.
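A small sketch of that difference, using two hypothetical helpers (append_with_plus_equals and append_with_shovel are illustrative names, not anything from the question):

```ruby
# Hypothetical helper names, for illustration only.

def append_with_plus_equals(hash, key, value)
  # Builds a NEW array with + and stores it under key;
  # the array that was there before is left untouched.
  hash[key] += [value]
  hash
end

def append_with_shovel(hash, key, value)
  # Mutates whatever array is currently stored under key, in place.
  hash[key] << value
  hash
end

shared = []
h1 = { "a" => shared, "b" => shared }
append_with_plus_equals(h1, "a", 1)
p h1["a"].equal?(shared)   # false: "a" now points at a new array
p shared.empty?            # true: the original array was not modified

h2 = { "a" => shared, "b" => shared }
append_with_shovel(h2, "a", 1)
p h2["a"].equal?(h2["b"])  # true: both keys still share one array
p h2["b"]                  # the value appended via "a" is visible under "b"
```

With +=, each key stops sharing as soon as it is first assigned, which is why the original version happened to work despite the shared starting array.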
A better way to create your result hash would be one of these:
result = keys.map { |k| [k, []] }.to_h
result = keys.each_with_object({}) { |k, h| h[k] = [] }
which will create a new array object for each key.
However, I would write it all quite differently:
a_of_h.each_with_object(Hash.new { |h, k| h[k] = [] }) { |h, r|
h.each { |k, v| r[k] << v }
}
each_with_object will give the passed object to the block as an additional argument (here r, for result), and will return it when the method is done. The argument (the object that will be in r) will be a hash with a default_proc: every time we try to get a key that's not inside yet, it will insert a new array there (i.e. instead of trying to pre-populate our result object, do it on demand). Then we just go through each of the hashes in your array, and insert the value into the result hash without worrying if the key is there or not.
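For completeness, here is that approach wrapped in a method, as a sketch. The name merge_hashes_on_demand is hypothetical, the nil guard mirrors the one in the question, and clearing the default proc at the end is an optional extra that the answer does not call for.

```ruby
# merge_hashes_on_demand is a hypothetical name, for illustration only.
def merge_hashes_on_demand(hashes)
  return {} unless hashes

  # The default proc runs only when a missing key is read. It stores a
  # brand-new array under that key, so no two keys ever share an array.
  merged = hashes.each_with_object(Hash.new { |h, k| h[k] = [] }) do |hash, result|
    hash.each { |k, v| result[k] << v }
  end

  # Optional (an assumption about what callers want): drop the default proc
  # so that later lookups of unknown keys return nil instead of inserting [].
  merged.default_proc = nil
  merged
end

sample = [
  { "1" => 10,  "2" => 20 },
  { "1" => 100, "2" => 200 }
]
p merge_hashes_on_demand(sample)  # each key maps to its values in input order
```

Unlike the version in the question, this does not require every hash to have identical keys: a key that appears in only some of the hashes simply collects fewer values.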