Perl module for text comparison

Question

Can anyone suggest a Perl module that can compare two strings and return the degree to which they match? I searched CPAN extensively, and although there are related modules like String::Approx and Data::Compare, they are not what I am looking for. Suppose I have two strings: "I love you" and "I boht you". I want functionality that compares these two strings while taking numerous parameters into account: the matching of words in the correct order ("love" as the first word in one string should not "match" "love" as the 4th word in the second string, even though both strings contain that word), words that do not match but are spelled almost the same (say, "love" and "loge"), the number of words, etc. It should return an index, say a number from 0 to 1, representing the degree of similarity between the two strings. Is there any such Perl module?

Answer

There are many such modules. Often, though, you'll have to use them in some particular way to account for your own assumptions. Most string comparison tools like this just implement one algorithm for comparing one string to another, and assume that if you have specific policy decisions to make (such as word order or how much misspelling to tolerate), you'll code them yourself.
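As a rough illustration of layering such policy on top of a basic similarity measure, here is a minimal pure-Perl sketch aimed at the question's requirements; it is not taken from any CPAN module. It assumes words are whitespace-separated tokens, scores spelling closeness with a Dice coefficient over character bigrams, and runs an edit-distance alignment over the word sequences, so words can only match in order and each missing or extra word costs 1. The function names `bigram_dice` and `text_similarity` are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical spelling-closeness policy: Dice coefficient over character
# bigrams (1 for identical words, 0 when no bigram is shared).
sub bigram_dice {
    my ($x, $y) = map { lc } @_;
    return 1 if $x eq $y;
    my %count;
    $count{ substr($x, $_, 2) }++ for 0 .. length($x) - 2;
    my ($shared, $ny) = (0, 0);
    for my $i (0 .. length($y) - 2) {
        my $bg = substr($y, $i, 2);
        $ny++;
        if ($count{$bg}) { $shared++; $count{$bg}--; }   # multiset match
    }
    my $nx = max(length($x) - 1, 0);
    return 0 if $nx + $ny == 0;
    return 2 * $shared / ($nx + $ny);
}

# Hypothetical word-level similarity: edit distance over word sequences,
# where substituting one word for another costs 1 - bigram_dice(...).
sub text_similarity {
    my ($s, $t) = @_;
    my @a = split ' ', $s;
    my @b = split ' ', $t;
    my $longest = max(scalar @a, scalar @b);
    return 1 if $longest == 0;
    my @prev = (0 .. @b);
    for my $i (1 .. @a) {
        my @cur = ($i);
        for my $j (1 .. @b) {
            my $sub = 1 - bigram_dice($a[$i - 1], $b[$j - 1]);
            push @cur, min(
                $prev[$j] + 1,          # word missing from the second string
                $cur[$j - 1] + 1,       # extra word in the second string
                $prev[$j - 1] + $sub,   # aligned words, fuzzy cost
            );
        }
        @prev = @cur;
    }
    return 1 - $prev[-1] / $longest;
}

printf "%.3f\n", text_similarity("I love you", "I loge you");   # 0.778
printf "%.3f\n", text_similarity("I love you", "I boht you");   # 0.667
printf "%.3f\n", text_similarity("you love I", "I love you");   # 0.333
```

Dividing by the longer word count keeps the score between 0 and 1, because the alignment never costs more than one unit per word. The insertion cost and the choice of spelling measure are exactly the kind of policy you would tune yourself.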

Personally, I am not sure I'd recommend Text::Levenshtein because of its bugs and lack of UTF-8 support. I don't have a better recommendation either, though.

However, these searches (based on the names of common algorithms for doing this sort of thing) will reveal lots of potential modules you could look into to determine what works best for your purpose:

  • https://metacpan.org/search?q=levenshtein
  • https://metacpan.org/search?q=wagner+fischer
  • https://metacpan.org/search?q=edit+distance
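To show what these algorithms compute, here is a minimal pure-Perl sketch of the Wagner-Fischer dynamic programme for Levenshtein distance, plus a hypothetical `char_similarity` helper that normalises the distance to the 0 to 1 scale the question asks for. It is an illustration, not the interface of any module found by the searches above.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Wagner-Fischer: fill the edit-distance table row by row,
# keeping only the previous row in memory.
sub levenshtein {
    my ($s, $t) = @_;
    my @a = split //, $s;
    my @b = split //, $t;
    my @prev = (0 .. @b);               # distance from "" to each prefix of $t
    for my $i (1 .. @a) {
        my @cur = ($i);                 # distance from first $i chars of $s to ""
        for my $j (1 .. @b) {
            my $cost = $a[$i - 1] eq $b[$j - 1] ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,          # delete a character of $s
                $cur[$j - 1] + 1,       # insert a character of $t
                $prev[$j - 1] + $cost,  # substitute (or keep) a character
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Hypothetical helper: 1 means identical, 0 means nothing in common.
sub char_similarity {
    my ($s, $t) = @_;
    my $longest = max(length($s), length($t));
    return 1 if $longest == 0;
    return 1 - levenshtein($s, $t) / $longest;
}

print levenshtein("kitten", "sitting"), "\n";                 # 3
printf "%.2f\n", char_similarity("love", "loge");             # 0.75
printf "%.2f\n", char_similarity("I love you", "I boht you"); # 0.70
```

Because `split //` splits on Perl characters, decoded Unicode strings are compared character by character; undecoded UTF-8 input would be compared byte by byte, so decode it first (for example with `Encode::decode`).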

If you're interested in words that sound alike, you can also look into phonetic comparisons:

  • https://metacpan.org/search?q=phonetic
  • https://metacpan.org/search?q=soundex
  • https://metacpan.org/search?q=metaphone
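For a feel of what these phonetic modules compute, here is a minimal pure-Perl sketch of the classic American Soundex code, not the API of any CPAN module: keep the first letter, map the remaining consonants to digits, collapse adjacent repeats, and pad to four characters. It deliberately handles ASCII letters only and ignores refinements such as name prefixes.

```perl
use strict;
use warnings;

# Letter-to-digit table; vowels, Y, H and W get no digit.
my %CODE;
@CODE{qw(B F P V)}         = (1) x 4;
@CODE{qw(C G J K Q S X Z)} = (2) x 8;
@CODE{qw(D T)}             = (3) x 2;
$CODE{L}                   = 4;
@CODE{qw(M N)}             = (5) x 2;
$CODE{R}                   = 6;

sub soundex {
    my ($word) = @_;
    $word = uc $word;
    $word =~ s/[^A-Z]//g;                  # ASCII letters only in this sketch
    return '' if $word eq '';
    my ($first, @rest) = split //, $word;
    my $result = $first;
    my $last   = $CODE{$first} // 0;       # first letter's digit still blocks a repeat
    for my $ch (@rest) {
        next if $ch eq 'H' || $ch eq 'W';  # H and W do not separate equal digits
        my $digit = $CODE{$ch} // 0;       # vowels and Y reset the run
        $result .= $digit if $digit && $digit != $last;
        $last = $digit;
        last if length($result) == 4;
    }
    return substr($result . '000', 0, 4);  # pad short codes with zeros
}

print soundex($_), "\n" for qw(Robert Rupert Ashcraft Tymczak);
# prints R163, R163, A261, T522 (one per line)
```

Since "Robert" and "Rupert" share a code, comparing codes word by word is one way to count sound-alike spellings as matches; Metaphone-style algorithms pursue the same idea with more detailed pronunciation rules.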
