How to return a Ruby array intersection with duplicate elements? (problem with bigrams in Dice Coefficient)

Question

I'm trying to implement Dice's coefficient in Ruby, but I'm having trouble with the array intersection.

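def bigram(string)
  bgarray = []
  # Pad with "%" and "#" so the first and last characters also form bigrams;
  # downcase (not downcase!) avoids mutating the caller's string
  bgstring = "%" + string.downcase + "#"
  0.upto(bgstring.length - 2) do |i|
    bgarray << bgstring[i, 2]
  end
  bgarray
end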

def approx_string_match(teststring, refstring)
  test_bigram = bigram(teststring) #.uniq
  ref_bigram = bigram(refstring)   #.uniq

  bigram_overlay = test_bigram & ref_bigram

  result = (2*bigram_overlay.length.to_f)/(test_bigram.length.to_f+ref_bigram.length.to_f)*100

  return result
end

The problem is that & removes duplicates, so I get results like this:

string1="Almirante Almeida Almada"
string2="Almirante Almeida Almada"

puts approx_string_match(string1, string2)  # => 76.0

It should return 100.
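
The 76.0 can be reproduced by hand. After lowercasing and padding, each string becomes "%almirante almeida almada#" (26 characters). That gives 25 bigrams, but only 19 of them are distinct, because "al" and "lm" occur three times each and " a" and "da" occur twice each. Since & keeps each shared bigram only once, the overlap has 19 elements. The denominator, however, still counts all 25 + 25 bigrams. In the equation below, T and R stand for test_bigram and ref_bigram:

```latex
\[
\text{result} = \frac{2\,\lvert T \cap R \rvert}{\lvert T \rvert + \lvert R \rvert} \times 100
             = \frac{2 \times 19}{25 + 25} \times 100 = 76.0
\]
```

If duplicates were counted, the overlap would contain all 25 bigrams, and the result would be 2 × 25 / 50 × 100 = 100.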

Calling uniq on both bigram arrays fixes this case, but it discards information (how many times each bigram occurs), which may produce unwanted matches in the particular dataset I'm working with.
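
Here is a small illustration of the kind of information loss meant here: two different strings whose bigram sets coincide once duplicates are dropped. This minimal sketch reuses the question's bigram logic, rewritten so it does not mutate its argument. It also defines a hypothetical helper, dice_uniq, that applies uniq to both sides. With it, "aaaa" and "aa" score 100.0 even though they differ.

```ruby
# Bigram extraction as in the question, without mutating the argument
def bigram(string)
  padded = "%" + string.downcase + "#"
  (0..padded.length - 2).map { |i| padded[i, 2] }
end

# Hypothetical helper: Dice coefficient with uniq applied to both bigram arrays
def dice_uniq(test, ref)
  t = bigram(test).uniq
  r = bigram(ref).uniq
  (2 * (t & r).length.to_f) / (t.length + r.length) * 100
end

p bigram("aaaa")              # => ["%a", "aa", "aa", "aa", "a#"]
p bigram("aa")                # => ["%a", "aa", "a#"]
puts dice_uniq("aaaa", "aa")  # => 100.0
```

After uniq, both strings reduce to the same three bigrams. The fact that "aaaa" contains "aa" three times is lost.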

How can I get an intersection with all duplicates included?

Answer

As Yuval F said, you should use a multiset. However, there is no multiset in Ruby's standard library; take a look here and here.
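
Short of pulling in a gem, a multiset can be approximated with a Hash that maps each element to its count. The sketch below is one way to do that, not necessarily the approach described on the linked pages. It assumes Ruby 2.7+ for Array#tally, and the name multiset_intersect is hypothetical. For every element, it keeps the smaller of its two counts.

```ruby
# Multiset-style intersection using element counts (requires Ruby 2.7+ for Array#tally)
def multiset_intersect(a, b)
  counts_b = b.tally  # e.g. {"al"=>2, "lc"=>1, "ef"=>1}
  a.tally.flat_map do |elem, count_a|
    # Keep each element min(count in a, count in b) times; 0 if absent from b
    [elem] * [count_a, counts_b.fetch(elem, 0)].min
  end
end

a = ["al", "al", "lc", "lc", "ld"]
b = ["al", "al", "lc", "ef"]
p multiset_intersect(a, b)  # => ["al", "al", "lc"]
```

This runs in linear time and leaves both inputs untouched. Equal elements are grouped together in the result, in the order of their first occurrence in a. That does not matter when only the length is used, as in Dice's coefficient.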

If performance is not critical for your application, you can still do it using Array with a little code.

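def intersect(a, b)
  b = b.dup  # work on a copy so the caller's array is not modified
  a.inject([]) do |intersect, s|
    index = b.index(s)
    unless index.nil?
      intersect << s
      b.delete_at(index)  # consume one matching occurrence from b
    end
    intersect
  end
end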

a = ["al", "al", "lc", "lc", "ld"]
b = ["al", "al", "lc", "ef"]
puts intersect(a, b).inspect  # ["al", "al", "lc"]
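
To tie this back to the question, a duplicate-preserving intersection can replace & in approx_string_match. This is a minimal, self-contained sketch. It repeats a non-mutating bigram helper and the index/delete_at intersection, which here works on a copy of its second argument. That copy matters: ref_bigram.length must still be the full count when the denominator is computed. Identical strings now score 100.0, and "aaaa" against "aa" scores 75.0 rather than 100.0.

```ruby
# Bigrams of the lowercased string padded with "%" and "#"
def bigram(string)
  padded = "%" + string.downcase + "#"
  (0..padded.length - 2).map { |i| padded[i, 2] }
end

# Duplicate-preserving intersection (index/delete_at approach, on a copy of b)
def intersect(a, b)
  remaining = b.dup
  a.each_with_object([]) do |s, result|
    index = remaining.index(s)
    next if index.nil?
    result << s
    remaining.delete_at(index)  # consume one occurrence so it cannot match twice
  end
end

def approx_string_match(teststring, refstring)
  test_bigram = bigram(teststring)
  ref_bigram = bigram(refstring)
  overlap = intersect(test_bigram, ref_bigram)
  (2 * overlap.length.to_f) / (test_bigram.length + ref_bigram.length) * 100
end

string1 = "Almirante Almeida Almada"
string2 = "Almirante Almeida Almada"
puts approx_string_match(string1, string2)  # => 100.0
puts approx_string_match("aaaa", "aa")      # => 75.0
```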
