How to calculate the distance similarity measure of two given strings?

Question

I need to calculate the distance similarity measure of two given strings. What exactly do I mean? Let me explain with an example:

  • Real word: hospital
  • Mistaken word: haspita

My aim is to find how many characters I need to modify in the mistaken word to obtain the real word. In this example I need to modify 2 letters. So what would the percentage be? I always divide by the length of the real word, so it becomes 2 / 8 = 25%, which makes the distance similarity measure (DSM) of these two strings 75%.
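Worked through step by step, assuming the measure is one minus the number of edits divided by the length of the real word (as described above), the example gives:

$$
\begin{aligned}
\text{haspita} &\xrightarrow{\text{insert } l} \text{haspital} \xrightarrow{\text{substitute } a \to o} \text{hospital}, \qquad d = 2 \\
\mathrm{DSM} &= 1 - \frac{d}{|\text{hospital}|} = 1 - \frac{2}{8} = 0.75 = 75\%
\end{aligned}
$$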

What is the fastest way to do this in C# 4.0?

Answer

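What you are looking for is called edit distance, or Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other (2 for haspita and hospital). The Wikipedia article explains how it is calculated and has a nice piece of pseudocode at the bottom that makes it easy to implement the algorithm in C#.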
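In the usual dynamic-programming formulation, which the C# implementation in this answer follows, let D_{i,j} be the number of edits needed to turn the first i characters of a into the first j characters of b, and let [a_i ≠ b_j] be 1 when the two characters differ and 0 otherwise. The distance is then d(a, b) = D_{|a|,|b|}, where:

$$
D_{i,0} = i, \qquad D_{0,j} = j, \qquad
D_{i,j} = \min\bigl(D_{i-1,j} + 1,\; D_{i,j-1} + 1,\; D_{i-1,j-1} + [a_i \ne b_j]\bigr)
$$

The three terms correspond to a deletion, an insertion, and a substitution (or a free match when the characters are equal).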

Here's an implementation from the first site linked below:

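// Requires: using System;  (for String and Math)
private static int  CalcLevenshteinDistance(string a, string b)
    {
    // An empty (or null) string is as many edits away as the other string is long
    if (String.IsNullOrEmpty(a))  return String.IsNullOrEmpty(b) ? 0 : b.Length;
    if (String.IsNullOrEmpty(b))  return a.Length;

    int  lengthA   = a.Length;
    int  lengthB   = b.Length;
    // distances[i, j] = edits needed to turn the first i chars of a into the first j chars of b
    var  distances = new int[lengthA + 1, lengthB + 1];
    for (int i = 0;  i <= lengthA;  i++)  distances[i, 0] = i;   // i deletions
    for (int j = 0;  j <= lengthB;  j++)  distances[0, j] = j;   // j insertions

    for (int i = 1;  i <= lengthA;  i++)
        for (int j = 1;  j <= lengthB;  j++)
            {
            // Substitution is free when the characters already match
            int  cost = b[j - 1] == a[i - 1] ? 0 : 1;
            distances[i, j] = Math.Min
                (
                Math.Min(distances[i - 1, j] + 1,     // deletion
                         distances[i, j - 1] + 1),    // insertion
                distances[i - 1, j - 1] + cost        // substitution or match
                );
            }
    return distances[lengthA, lengthB];
    }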
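To turn the distance into the percentage the question asks for, a minimal sketch could look like the following. It assumes the question's definition (divide by the length of the real word) and clamps the result at 0%, since the distance can exceed the real word's length. The class and method names (`StringSimilarity`, `Levenshtein`, `Similarity`) are hypothetical. As a possible speed-up, it keeps only two rows of the table instead of the full matrix.

```csharp
using System;

// Hypothetical helper class; names are illustrative, not from the original answer.
public static class StringSimilarity
{
    // Levenshtein distance keeping only two rows of the DP table,
    // so memory is O(|b|) instead of O(|a| * |b|).
    public static int Levenshtein(string a, string b)
    {
        a = a ?? String.Empty;
        b = b ?? String.Empty;
        if (a.Length == 0) return b.Length;
        if (b.Length == 0) return a.Length;

        var previous = new int[b.Length + 1]; // row i - 1
        var current = new int[b.Length + 1];  // row i
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1, current[j - 1] + 1), // deletion, insertion
                    previous[j - 1] + cost);                       // substitution or match
            }
            // Swap rows: the row just filled becomes "previous" for the next i.
            var tmp = previous;
            previous = current;
            current = tmp;
        }
        return previous[b.Length];
    }

    // Similarity in percent relative to the real word's length:
    // 100 * (1 - distance / real.Length), clamped at 0.
    public static double Similarity(string real, string mistaken)
    {
        if (String.IsNullOrEmpty(real))
            return String.IsNullOrEmpty(mistaken) ? 100.0 : 0.0;
        int distance = Levenshtein(real, mistaken);
        double ratio = 1.0 - (double)distance / real.Length;
        return Math.Max(0.0, ratio) * 100.0;
    }
}
```

For example, `StringSimilarity.Similarity("hospital", "haspita")` returns 75.0. The two-row variant gives the same distance as the full-matrix version. For single words either is fast, so the memory saving mainly pays off on long strings.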
