Damerau-Levenshtein 距离(使用换位编辑距离)c 实现 [英] Damerau–Levenshtein distance (Edit Distance with Transposition) c implementation

查看:60
本文介绍了Damerau-Levenshtein 距离(使用换位编辑距离)c 实现的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在 C++ 中实现了 Damerau-Levenshtein 距离,但它没有为输入(pantera,aorta)提供正确的 o/p 正确的 o/p 是 4,但我的代码给出了 5.....

I implemented the Damerau–Levenshtein distance in c++ but it does not give correct o/p for the input (pantera,aorta) the correct o/p is 4 but my code gives 5.....

int  editdist(string s,string t,int n,int m) 
{
    int d1,d2,d3,cost;
    int i,j;
    for(i=0;i<=n;i++) 
    {
        for(j=0;j<=m;j++)
        {
          if(s[i+1]==t[j+1]) 
              cost=0;
          else
              cost=1;
          d1=d[i][j+1]+1;
          d2=d[i+1][j]+1;
          d3=d[i][j]+cost;
          d[i+1][j+1]=minimum(d1,d2,d3);
          if(i>0 && j>0 && s[i+1]==t[j] && s[i]==t[j+1] )   //transposition
          {
              d[i+1][j+1]=min(d[i+1][j+1],d[i-1][j-1]+cost);
          }
        }
    }
    return d[n+1][m+1]; 
}

我没有看到任何错误.有人能找到代码的问题吗?

I don't see any errors. Can someone find a problem with the code?

推荐答案

帖子中的算法不计算 Damerau-Levenshtein 距离.在维基百科文章中,此算法被定义为最佳字符串对齐距离.

The algorithm in the post does not compute Damerau-Levenshtein distance. In a wikipedia article this algorithm is defined as the Optimal String Alignment Distance.

DL 距离算法的 Java 实现可以在另一个 SO 帖子中找到.

A java implementation of DL distance algorithm can be found in another SO post.

要获得正确的OSA距离值,请将下面标有-的行改为标有+

To get the correct values of OSA distance please change the lines marked with - below with the lines marked with +

 int  editdist(string s,string t,int n,int m) 
 {
     int d1,d2,d3,cost;
     int i,j;
     for(i=0;i<=n;i++) 
     {
         for(j=0;j<=m;j++)
         {
-          if(s[i+1]==t[j+1]) 
+          if(s[i+1]==t[j+1]) 
              cost=0;
           else
              cost=1;
           d1=d[i][j+1]+1;
           d2=d[i+1][j]+1;
           d3=d[i][j]+cost;
           d[i+1][j+1]=minimum(d1,d2,d3);
-          if(i>0 && j>0 && s[i+1]==t[j] && s[i]==t[j+1] )   //transposition
+          if(i>0 && j>0 && s[i]==t[j-1] && s[i-1]==t[j] )   //transposition
           {
               d[i+1][j+1]=min(d[i+1][j+1],d[i-1][j-1]+cost);
           }
         }
     }
     return d[n+1][m+1]; 
 }

看起来好像代码是从用编程语言编写的程序复制的,默认情况下,数组索引从 1 开始.因此,所有对距离数组 d 元素的引用都增加了.但是,对字符串中字符的引用是对从 0 开始的数组的引用,因此不应更新它们.

It looks as if the code was copied from a program written in a programming language where array indices start at 1 by default. Therefore all references to the elements of the distance array d were incremented. However the references to the characters within the strings are references to 0-based arrays, therefore they should not be updated.

要计算距离,必须正确初始化距离数组:

To compute the distance the distance array has to be properly initialized:

for( i = 0; i < n + 1; i++)
      d[i][0] = i;
for( j = 1; j < m + 1; j++)
      d[0][j] = j;

既然你已经得到了答案 5,那么你的距离数组可能已经正确初始化了.

Since you have got the answer 5, you probably have your distance array already initialized correctly.

由于上述算法不计算 DL 距离,这里是 DL 算法的 C 实现的草图(源自 SO 帖子,java impl. 源自维基百科文章中的 ActionScript impl.).

Since the above algorithm does not compute the DL distance, here is a sketch of a C implementation of the DL algorithm (derived from the SO post with a java impl. derived from an ActionScript impl. in the Wikipedia article).

#define d(i,j) dd[(i) * (m+2) + (j) ]
#define min(x,y) ((x) < (y) ? (x) : (y))
#define min3(a,b,c) ((a)< (b) ? min((a),(c)) : min((b),(c)))
#define min4(a,b,c,d) ((a)< (b) ? min3((a),(c),(d)) : min3((b),(c),(d)))

int dprint(int* dd, int n,int m){
 int i,j;
 for (i=0; i < n+2;i++){
    for (j=0;j < m+2; j++){
        printf("%02d ",d(i,j));
    }
    printf("\n");
 }
 printf("\n");
 return 0;
}

int dldist2(char *s, char* t, int n, int m) {
    int *dd;
    int i, j, cost, i1,j1,DB;
    int INFINITY = n + m;
    int DA[256 * sizeof(int)];

    memset(DA, 0, sizeof(DA));

    if (!(dd = (int*) malloc((n+2)*(m+2)*sizeof(int)))) {
      return -1;
    }

    d(0,0) = INFINITY;
    for(i = 0; i < n+1; i++) {
      d(i+1,1) = i ;
      d(i+1,0) = INFINITY;
    }
    for(j = 0; j<m+1; j++) {
      d(1,j+1) = j ;
      d(0,j+1) = INFINITY;
    }      
    dprint(dd,n,m);

    for(i = 1; i< n+1; i++) {
      DB = 0;
      for(j = 1; j< m+1; j++) {
        i1 = DA[t[j-1]];
        j1 = DB;
        cost = ((s[i-1]==t[j-1])?0:1);
        if(cost==0) DB = j;
        d(i+1,j+1) =
          min4(d(i,j)+cost,
              d(i+1,j) + 1,
              d(i,j+1)+1, 
              d(i1,j1) + (i-i1-1) + 1 + (j-j1-1));
      }
      DA[s[i-1]] = i;
      dprint(dd,n,m);
    }
    cost = d(n+1,m+1);
    free(dd);
    return cost;
}

这篇关于Damerau-Levenshtein 距离(使用换位编辑距离)c 实现的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆