Ruby compare two strings similarity percentage
Question
I'd like to compare two strings in Ruby and find their similarity.
I've had a look at the Levenshtein gem, but it seems it was last updated in 2008 and I can't find any documentation on how to use it. Some blogs also suggest it's broken.
I tried the text gem with Levenshtein, but it gives an integer distance (smaller is better).
Obviously, if the two strings have different lengths, I run into problems with the Levenshtein algorithm (say, comparing two names where one has a middle name and the other doesn't).
What would you suggest I do to get a percentage comparison?
I'm looking for something similar to PHP's similar_text.
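For reference, PHP's similar_text counts matching characters by finding the longest common substring of the two strings and then recursing on the pieces to its left and to its right; its percentage is 2 × matches / (combined length of both strings) × 100. A rough Ruby port of that idea might look like this. It is a sketch based on my reading of PHP's C source, not an official port, and the method names (longest_common_substring, similar_chars, similar_text_percent) are hypothetical.

```ruby
# Hypothetical helper: first longest common substring of a and b.
# Returns [length, start index in a, start index in b].
def longest_common_substring(a, b)
  best_len = best_i = best_j = 0
  a.size.times do |i|
    b.size.times do |j|
      k = 0
      k += 1 while i + k < a.size && j + k < b.size && a[i + k] == b[j + k]
      # strictly greater: keep the first match found, as PHP does
      best_len, best_i, best_j = k, i, j if k > best_len
    end
  end
  [best_len, best_i, best_j]
end

# Hypothetical port of the matching-character count behind similar_text:
# take the longest common substring, then recurse on the left and right parts.
def similar_chars(a, b)
  len, i, j = longest_common_substring(a, b)
  return 0 if len.zero?

  len + similar_chars(a[0...i], b[0...j]) +
    similar_chars(a[(i + len)..-1], b[(j + len)..-1])
end

# Percentage in the style of similar_text's third argument.
# Two empty strings return 0.0 here to avoid dividing by zero.
def similar_text_percent(a, b)
  total = a.size + b.size
  return 0.0 if total.zero?

  similar_chars(a, b) * 2.0 / total * 100
end
```

For example, "World" vs "Word" yields 4 matching characters (about 88.9%). As with PHP's version, swapping the arguments can change the result ("bafoobar" vs "barfoo" gives 5, the reverse gives 3), and the nested loops make it slow on long strings, so it suits short inputs such as names.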
Recommended Answer
I think your question could do with some clarifications, but here's something quick and dirty (calculating as percentage of the longer string as per your clarification above):
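def string_difference_percent(a, b)
  longer = [a.size, b.size].max
  return 0.0 if longer.zero? # two empty strings: no difference
  # count the positions where both strings have the same character
  same = a.each_char.zip(b.each_char).count { |x, y| x == y }
  # fraction (0.0..1.0) of the longer string's positions that differ
  (longer - same) / longer.to_f
end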
I'm still not sure how much sense this percent difference you are looking for makes, but this should get you started at least.
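It's a bit like Levenshtein distance, in that it compares the strings character by character. Unlike Levenshtein, though, it only compares characters at the same position, so an inserted middle name shifts everything after it. If two names differ only by the middle name, they'll actually come out as very different.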
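By contrast, a real Levenshtein distance treats a missing middle name as a run of insertions instead of shifting every later character. One common way to turn the integer distance into a percentage (an editorial suggestion, not part of the answer above) is to divide it by the length of the longer string and subtract the result from 1. A minimal, gem-free sketch, assuming unit costs for insertion, deletion and substitution; the method names are hypothetical:

```ruby
# Hypothetical helper: classic dynamic-programming Levenshtein distance
# (insert, delete, substitute all cost 1), keeping only the previous row.
def levenshtein(a, b)
  prev = (0..b.size).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,        # deletion
               curr[j - 1] + 1,    # insertion
               prev[j - 1] + cost  # substitution (or match)
              ].min
    end
    prev = curr
  end
  prev.last
end

# Similarity in percent, normalised by the longer string:
# 100.0 means identical; two empty strings count as identical.
def levenshtein_similarity_percent(a, b)
  longer = [a.size, b.size].max
  return 100.0 if longer.zero?

  (1.0 - levenshtein(a, b).to_f / longer) * 100
end
```

With this normalisation, "John Smith" vs "John Paul Smith" has a distance of 5 (the inserted "Paul ") and scores about 66.7%. The table-filling loop takes time proportional to the product of the two lengths, which is fine for names but slow for long texts.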