Most efficient way to count duplicated elements between two arrays

Problem description

As part of a very basic program I am writing in Ruby, I am trying to find the total number of shared elements between two arrays of equal length, but I need to include repeats.

My current example code for this situation is as follows:

array_a = ["B","A","A","A","B"]
array_b = ["A","B","A","B","B"]
counter = 0
array_a.each_index do |i|
  # count the positions where the two sorted arrays hold the same value
  if array_a.sort[i] == array_b.sort[i]
    counter += 1
  end
end
puts counter

In this instance I want the comparison to return 4, not 2, because the two arrays share two duplicated characters ("A" twice and "B" twice). This seems to work, but I am wondering whether there are more efficient solutions, and specifically whether there are any methods you would suggest looking into. Someone suggested a different method, inject, but I don't understand how it applies here and would like to. I did quite a bit of reading on its uses, and it still isn't clear to me why it is appropriate. Thank you.

Looking at my code, I have realized that it doesn't seem to work for the situation that I am describing.
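One way to see why: comparing the sorted arrays position by position counts a match only when both arrays hold the same value at the same index. For the example arrays that happens to give 4, but a single value present in one array and absent from the other shifts every later element, so genuine matches are missed. A minimal sketch, wrapping the question's loop in a hypothetical helper count_by_sorted_positions (with the sorts moved out of the loop):

```ruby
# Hypothetical helper wrapping the question's approach: compare the two
# sorted arrays index by index and count positions that hold equal values.
def count_by_sorted_positions(array_a, array_b)
  sorted_a = array_a.sort
  sorted_b = array_b.sort
  counter = 0
  sorted_a.each_index do |i|
    counter += 1 if sorted_a[i] == sorted_b[i]
  end
  counter
end

# The question's arrays: AAABB vs AABBB line up well enough to give 4.
puts count_by_sorted_positions(["B","A","A","A","B"], ["A","B","A","B","B"])  # prints 4
# "B" is shared once, but sorting gives ["A","B"] vs ["B","C"]: no index matches.
puts count_by_sorted_positions(["A","B"], ["B","C"])  # prints 0
```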

Recommended answer

If I understand the question correctly, you could do the following.

Code

def count_shared(arr1, arr2)
  arr1.group_by(&:itself).
       merge(arr2.group_by(&:itself)) { |_,ov,nv| [ov.size, nv.size].min }.
       values.
       reduce(0) { |t,o| (o.is_a? Array) ? t : t + o }
end

Examples

arr1 = ["B","A","A","A","B"]
arr2 = ["A","B","A","B","B"]

count_shared(arr1, arr2)
  #=> 4 (2 A's + 2 B's)

arr1 = ["B", "A", "C", "C", "A", "A", "B", "D", "E", "A"]
arr2 = ["C", "D", "F", "F", "A", "B", "A", "B", "B", "G"]

count_shared(arr1, arr2)
  #=> 6 (2 A's + 2 B's + 1 C + 1 D + 0 E's + 0 F's + 0 G's)
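On Ruby 2.7 or later, Enumerable#tally returns each element's count directly, so the same "take the smaller count for each shared element" idea can be written without the arrays that group_by produces. This is a variant sketch, not the answer's code; the name count_shared_tally is hypothetical.

```ruby
# Variant of count_shared built on Enumerable#tally (requires Ruby 2.7+).
# For each distinct element of arr1, add the smaller of its counts in the
# two arrays; elements missing from arr2 contribute 0.
def count_shared_tally(arr1, arr2)
  counts2 = arr2.tally  # e.g. {"A"=>2, "B"=>3} for the first example's arr2
  arr1.tally.sum { |elem, n| [n, counts2.fetch(elem, 0)].min }
end

puts count_shared_tally(["B","A","A","A","B"], ["A","B","A","B","B"])  # prints 4
puts count_shared_tally(["B", "A", "C", "C", "A", "A", "B", "D", "E", "A"],
                        ["C", "D", "F", "F", "A", "B", "A", "B", "B", "G"])  # prints 6
```

Like count_shared, this makes one pass over each array plus hash lookups, so it avoids sorting altogether.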

Explanation

The steps are as follows for a slightly modified version of the first example.

arr1 = ["B","A","A","A","B","C","C"]
arr2 = ["A","B","A","B","B","D"]

First use Enumerable#group_by on both arr1 and arr2:

h0 = arr1.group_by(&:itself)
  #=> {"B"=>["B", "B"], "A"=>["A", "A", "A"], "C"=>["C", "C"]} 
h1 = arr2.group_by(&:itself)
  #=> {"A"=>["A", "A"], "B"=>["B", "B", "B"], "D"=>["D"]} 

Prior to Ruby v2.2, when Object#itself was introduced, you would write:

arr.group_by { |e| e }

Continuing,

h2 = h0.merge(h1) { |_,ov,nv| [ov.size, nv.size].min }
  #=> {"B"=>2, "A"=>2, "C"=>["C", "C"], "D"=>["D"]} 

I will return shortly to explain the above calculation.

a = h2.values
  #=> [2, 2, ["C", "C"], ["D"]] 
a.reduce(0) { |t,o| (o.is_a? Array) ? t : t + o }
  #=> 4

Here Enumerable#reduce (aka inject) merely sums the values of a that are not arrays. The arrays correspond to elements of arr1 that do not appear in arr2, or vice versa.
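Since the question asked how inject applies: reduce (or its alias inject) walks the array carrying an accumulator, here t, which starts at the argument 0; whatever the block returns becomes the next t. A small sketch tracing the calculation on a, followed by a shorter equivalent using grep for comparison (the grep form is an alternative, not part of the answer):

```ruby
a = [2, 2, ["C", "C"], ["D"]]

# inject is an alias of reduce; the accumulator t starts at 0.
total = a.inject(0) do |t, o|
  # Arrays stand for elements found in only one of the arrays, so skip them.
  o.is_a?(Array) ? t : t + o
end
puts total  # prints 4 (t goes 0 -> 2 -> 4 -> 4 -> 4)

# Equivalent: keep only the Integer values, then add them up.
puts a.grep(Integer).sum  # prints 4
```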

As promised, I will now explain how h2 is computed. I've used the form of Hash#merge that employs a block (here { |k,ov,nv| [ov.size, nv.size].min }) to compute the values of keys that are present in both hashes being merged. For example, when the first key-value pair of h1 ("A"=>["A", "A"]) is being merged into h0, since h0 also has a key "A", the array

["A", ["A", "A", "A"], ["A", "A"]]

is passed to the block and the three block variables are assigned values (using "parallel assignment", which is sometimes called "multiple assignment"):

k, ov, nv = ["A", ["A", "A", "A"], ["A", "A"]]

So we have:

k  #=> "A" 
ov #=> ["A", "A", "A"] 
nv #=> ["A", "A"] 

k is the key, ov ("old value") is the value of "A" in h0 and nv ("new value") is the value of "A" in h1. The block calculation is

[ov.size, nv.size].min
  #=> [3,2].min = 2

So the value of "A" in h2 is 2.

Notice that the key, k, is not used in the block calculation (which is very common when using this form of merge). For that reason I've changed the block variable from k to _ (a legitimate local variable), both to reduce the chance of introducing a bug and to signal to the reader that the key is not used in the block. The other elements of h2 that use this block are computed similarly.
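To check that the merge block runs only for keys present in both hashes, a small sketch can record each key it receives (keys_seen is just an illustrative name). Keys that appear in only one hash ("C" and "D") never reach the block and keep their arrays, which is why reduce later has to skip them.

```ruby
h0 = { "B" => ["B", "B"], "A" => ["A", "A", "A"], "C" => ["C", "C"] }
h1 = { "A" => ["A", "A"], "B" => ["B", "B", "B"], "D" => ["D"] }

keys_seen = []
h2 = h0.merge(h1) do |key, ov, nv|
  keys_seen << key            # record every key the block is called with
  [ov.size, nv.size].min      # smaller of the two group sizes
end

p keys_seen.sort  # prints ["A", "B"]
p h2.values       # prints [2, 2, ["C", "C"], ["D"]]
```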

Another way

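This would be quite simple if we had available an Array method, difference, that I've proposed be added to the Ruby core. As proposed, a.difference(b) removes one element of a for each matching element of b, so the size of the result is the number of elements of a left unmatched. (Note that Ruby 2.6 later added a built-in Array#difference with different semantics: like Array#-, it removes every occurrence of any element that appears in the argument, so with the built-in method both expressions below would return 5.) Using the proposed method: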

array_a = ["B","A","A","A","B"]
array_b = ["A","B","A","B","B"]

array_a.size - (array_a.difference(array_b)).size
  #=> 4

array_a.size - (array_b.difference(array_a)).size
  #=> 4

I've cited other applications in my answer here.
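Until something like the proposed method exists, its effect can be approximated with a plain helper. The sketch below is an assumption about how a "remove one match per occurrence" difference could behave, written as a hypothetical standalone method multiset_difference rather than a patch to Array, so it does not shadow the built-in Array#difference of Ruby 2.6+.

```ruby
# Hypothetical helper: returns the elements of a left over after removing,
# for each element of b, one equal element of a (a multiset difference).
def multiset_difference(a, b)
  remaining = Hash.new(0)
  b.each { |e| remaining[e] += 1 }   # how many copies of each value b can cancel
  a.each_with_object([]) do |e, leftover|
    if remaining[e] > 0
      remaining[e] -= 1              # cancel one copy from b
    else
      leftover << e                  # no unused copy left in b, so e survives
    end
  end
end

array_a = ["B","A","A","A","B"]
array_b = ["A","B","A","B","B"]

p multiset_difference(array_a, array_b)                          # prints ["A"]
puts array_a.size - multiset_difference(array_a, array_b).size  # prints 4
puts array_b.size - multiset_difference(array_b, array_a).size  # prints 4
```

Because each element of a is checked against a hash of counts, this stays roughly linear in the combined size of the two arrays.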
