Ruby: compare two strings and get a similarity percentage


Question

I'd like to compare two strings in Ruby and measure how similar they are.

I've had a look at the Levenshtein gem, but it seems it was last updated in 2008 and I can't find documentation on how to use it. Some blogs suggest it's broken.

I tried the text gem with Levenshtein, but it returns an integer (smaller is better).

Obviously, if the two strings differ in length, I run into problems with the Levenshtein algorithm (say, comparing two names where one has a middle name and one doesn't).

What would you suggest I do to get a percentage comparison?

I'm looking for something similar to PHP's similar_text.

Recommended answer

I think your question could do with some clarification, but here's something quick and dirty (calculated relative to the longer string, as per your clarification above):

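def string_difference_percent(a, b)
  longer = [a.size, b.size].max
  return 0.0 if longer.zero? # two empty strings are identical
  # Count positions where both strings hold the same character.
  same = a.each_char.zip(b.each_char).count { |x, y| x == y }
  # Fraction (0.0..1.0) of the longer string that does not match.
  (longer - same) / longer.to_f
end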

I'm still not sure how much sense the percent difference you're looking for makes, but this should at least get you started.

It's a bit like Levenshtein distance in that it compares the strings character by character. But it only compares characters at the same position, so if two names differ only by a middle name, everything after the inserted name is misaligned and they'll come out as very different.
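One way to soften the middle-name problem is to turn an actual edit distance into a percentage by dividing it by the length of the longer string. The following is only a minimal sketch under that assumption: `levenshtein_distance` is a hand-written textbook implementation and `similarity_percent` is a hypothetical helper name; neither comes from the question or from any gem.

```ruby
# Textbook dynamic-programming Levenshtein distance, keeping one row at a time.
def levenshtein_distance(a, b)
  a_chars = a.chars
  b_chars = b.chars
  # previous[j] = distance between the processed prefix of a and the first j chars of b
  previous = (0..b_chars.size).to_a
  a_chars.each_with_index do |ca, i|
    current = [i + 1]
    b_chars.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      current << [
        previous[j + 1] + 1, # deletion
        current[j] + 1,      # insertion
        previous[j] + cost   # substitution (or match when cost is 0)
      ].min
    end
    previous = current
  end
  previous.last
end

# Hypothetical helper: similarity as a percentage of the longer string.
# 100.0 means identical, 0.0 means every character would have to change.
def similarity_percent(a, b)
  longer = [a.size, b.size].max
  return 100.0 if longer.zero? # treat two empty strings as identical
  (1.0 - levenshtein_distance(a, b) / longer.to_f) * 100
end

puts similarity_percent("John Smith", "John Adam Smith").round(1) # prints 66.7
```

Here the inserted "Adam " costs five edits out of fifteen characters, so the names still score about two thirds similar instead of being penalised for everything after the insertion. If the text gem is installed, its `Text::Levenshtein.distance` could presumably replace the hand-written distance function. Note that this is not the same algorithm as PHP's similar_text, so the numbers will not match PHP's output.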
