Need a routine to detect strings that are similar but not identical


Problem description

I have a list of strings, some of which have been modified since my previous release. Some of the changes are trivial (spacing, a single word changed, etc.). I would like to detect strings that have only "minor" differences, so that I can reuse the older translations wherever possible.

What do I mean by "minor differences"? I will not know until I start working with the database.

Do you know of any tunable routines that indicate when two strings are similar but not identical? Or any routines that return a number indicating how different two strings are?

Recommended answer

There are many such algorithms. The keyword to search for is "fuzzy string matching".
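
As one illustration of a tunable routine that returns a number, here is a minimal Python sketch using the standard library's `difflib`. The helper names `similarity` and `closest_old_string` and the default `cutoff` of 0.8 are assumptions made for this example, not part of the original question or answer; the cutoff is the knob to adjust once the real data shows what "minor" means.

```python
import difflib
from typing import Optional, Sequence


def similarity(a: str, b: str) -> float:
    """Return a score in [0.0, 1.0]; 1.0 means the strings are identical."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def closest_old_string(new: str, old_strings: Sequence[str],
                       cutoff: float = 0.8) -> Optional[str]:
    """Return the most similar old string scoring at least `cutoff`, else None.

    `cutoff` is a hypothetical default; tune it against real data.
    """
    matches = difflib.get_close_matches(new, old_strings, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Raising `cutoff` toward 1.0 accepts only near-identical strings; lowering it accepts looser matches.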

A well-known one is the Levenshtein distance. It counts the minimum number of single-character "changes" (insertions, deletions, and substitutions) required to transform one string into another, which gives you an estimate of how similar the strings are.
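
A minimal sketch of the Levenshtein distance in Python, using the standard two-row dynamic programming formulation. The function `is_minor_change` and its `max_ratio` threshold of 0.2 are hypothetical additions for illustration: they normalize the distance by the longer string's length so that one threshold works for short and long strings alike.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def is_minor_change(old: str, new: str, max_ratio: float = 0.2) -> bool:
    """True if the edit distance is at most `max_ratio` of the longer length.

    `max_ratio` is an assumed, tunable threshold, not a recommended value.
    Identical strings also return True; filter them out beforehand if needed.
    """
    longest = max(len(old), len(new))
    if longest == 0:
        return True
    return levenshtein(old, new) / longest <= max_ratio
```

For example, `levenshtein("kitten", "sitting")` is 3 (two substitutions and one insertion), and a string that differs only by one extra space falls well under the 0.2 ratio.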

See also this question for solutions in Delphi: How to search for similar words.
