Alternative to Levenshtein and Trigram

Published: 2020/8/6 3:35:23 | Tags: levenshtein-distance, string-metric

Question

Say I have the following two strings in my database:

(1) 'Levi Watkins Learning Center - Alabama State University'
(2) 'ETH Library'

My software receives free text inputs from a data source, and it should match those free texts to the pre-defined strings in the database (the ones above).

For example, if the software gets the string 'Alabama University', it should recognize that this is more similar to (1) than it is to (2).

At first, I thought of using a well-known string metric like Levenshtein-Damerau or Trigrams, but this leads to unwanted results as you can see here:

http://fuzzy-string.com/Compare/Transform.aspx?r=ETH+Library&q=Alabama+University

Difference to (1): 37
Difference to (2): 14

(2) wins because it is much shorter than (1), even though (1) contains both words (Alabama and University) of the search string.
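The length effect can be reproduced with a minimal Python sketch of plain Levenshtein distance (insertions, deletions, substitutions). This is an assumption about what the online tool computes: the page above uses Levenshtein-Damerau, which also counts transpositions, but for candidate (1) both variants give the same value. The function name `levenshtein` is illustrative.

```python
# Plain Levenshtein distance (no transpositions); an assumed stand-in for the
# Levenshtein-Damerau metric used by the online tool.
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with a rolling two-row dynamic-programming table."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string inner
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


query = "Alabama University"
candidates = [
    "Levi Watkins Learning Center - Alabama State University",  # (1)
    "ETH Library",                                              # (2)
]
for c in candidates:
    print(f"{levenshtein(query, c):3d}  {c}")
```

For (1) this prints 37, which is exactly the length difference between the strings (55 − 18 characters): all 18 characters of 'Alabama University' already occur, in order, inside (1), so only insertions are needed. An edit distance can never exceed the length of the longer string, so (2), where both strings are at most 18 characters long, is at most 18 edits away. That bound is why a long candidate containing the query loses to a short, unrelated one.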

I also tried it with Trigrams (using the JavaScript library fuzzySet), but I got similar results there.

Is there a string metric that would recognize the similarity of the search string to (1)?
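One family of alternatives compares words instead of whole strings. The sketch below implements a Monge-Elkan style similarity: each word of the query is matched to its most similar word in the candidate, using normalised Levenshtein similarity so that small typos still score highly, and those per-word scores are averaged over the query's words only. This is one possible technique, not a method taken from the question itself. The tokenisation with `\w+`, the lower-casing, and the helper names `tokens`, `token_similarity`, `monge_elkan` and `best_match` are assumptions made for illustration.

```python
import re

# Hypothetical helper names (tokens, token_similarity, monge_elkan, best_match);
# they do not come from any particular library.


def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance via a rolling two-row DP table."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


# Assumption: words are runs of \w characters compared case-insensitively,
# so punctuation such as the " - " in (1) is ignored.
def tokens(s: str) -> list[str]:
    return re.findall(r"\w+", s.lower())


def token_similarity(a: str, b: str) -> float:
    """Normalised Levenshtein similarity of two words, in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def monge_elkan(query: str, candidate: str) -> float:
    """Average, over the query's words, of each word's best match in the candidate.
    Extra words in the candidate are not penalised."""
    q, c = tokens(query), tokens(candidate)
    if not q or not c:
        return 0.0
    return sum(max(token_similarity(w, v) for v in c) for w in q) / len(q)


def best_match(query: str, candidates: list[str]) -> str:
    """Return the stored string with the highest Monge-Elkan score."""
    return max(candidates, key=lambda c: monge_elkan(query, c))


candidates = [
    "Levi Watkins Learning Center - Alabama State University",  # (1)
    "ETH Library",                                              # (2)
]
for query in ["Alabama University", "Alabma Univrsity"]:  # second query has typos
    print(query, "->", best_match(query, candidates))
```

Both queries select (1). Because only the query's words are averaged, the extra words in a long candidate such as (1) cost nothing, which is the asymmetry the question asks for. The flip side is that a one-word query like 'University' scores every candidate containing that word equally, so a secondary criterion may be needed to break ties.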
