莱文斯坦到Damerau,莱文斯坦 [英] Levenshtein to Damerau-Levenshtein

查看:200
本文介绍了莱文斯坦到Damerau,莱文斯坦的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我坐在这里和programmnging一些算法我在Java中主程序(以及第一个到目前为止)。我编程Levenshtein算法就好了感谢维基这么漂亮的假code到newbeginners加上一个不错的教程:D

I'm sitting here and programmnging some algorithms for my main program in Java (well the first one so far). I programmed the levenshtein algorithm just fine thanks to wiki being so nice with the pseudocode to newbeginners plus a nice tutorial :D

然后我决定升级到Damerau并添加多余的线条,但后来我读到它不是DL算法中,但OptimalStringAlignmentDistance代替。我试着读动作code,以了解更多我需要添加,使其对DL但糊涂了吧。 我一直到不同的地方用code看都到Java,但它们都使用了错误的伪code了。

I then decided to upgrade to Damerau and added the extra lines but then I read that it's not DL algo but OptimalStringAlignmentDistance instead. I tried reading the actionscript code to understand what more I needed to add to make it to DL but got confused instead. I've been to different places with code looking alike to Java but they all use the wrong pseudocode too.

开支的一半​​,我放弃了,决定在这里问了一天之后。是否有任何人谁可以帮助我在Java升级此code到Damerau - 莱文斯坦?

After spending half the day I gave up and decided to ask here. Is there anyone who can help me with upgrading this code to Damerau-Levenshtein in Java?

    public class LevensteinDistance {
        private static int Minimum(int a, int b, int c) {
            return Math.min(Math.min(a, b), c);
        }

        private static int Minimum (int a, int b) {
            return Math.min(a, b);
        }

        public static int computeLevensteinDistance(String s, String t){
            int d[][];
            int n; // length of s
            int m; // length of t
            int i; // iterates through s
            int j; // iterates through t
            char s_i; // ith character of s
            char t_j; // jth character of t
            int cost; // cost

            n = s.length ();
            m = t.length ();
            if (n == 0) {
                return m;
            }
            if (m == 0) {
                return n;
            }
            d = new int[n+1][m+1];

            for (i = 0; i <= n; i++) {
                d[i][0] = i;
            }

            for (j = 0; j <= m; j++) {
                d[0][j] = j;
            }

            for(i = 1; i <= n; i++) {
                s_i = s.charAt (i - 1);
                for(j = 1; j <= m; j++) {
                    t_j = t.charAt (j - 1);

                    if(s_i == t_j){
                        cost = 0;
                    }else{
                        cost = 1;
                    }
                    d[i][j] = Minimum(d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1] + cost);

                    if(i > 1 && j > 1 && s_i == t_j-1 && s_i-1 == t_j){ 
                        d[i][j] = Minimum(d[i][j], d[i-2][j-2] + cost);
                    }
                }
            }
        return d[n][m];
    }

    //    public static void main(String[] args0){
    //      String a = "I decided it was best to ask the forum if I was doing it right";
    //      String b = "I thought I should ask the forum if I was doing it right";
    //      System.out.println(computeLevensteinDistance(a, b));
    //    }
}

下面是维基百科页面 Damerau - 莱文斯坦距离算法

推荐答案

您的问题是从字符串中的条件引用previous字符。在原来的code有:

Your problem is in referencing the previous characters from the string in your conditional. In your original code you have:

if(i > 1 && j > 1 && s_i == t_j-1 && s_i-1 == t_j){ 
  d[i][j] = Minimum(d[i][j], d[i-2][j-2] + cost);
}

现在的问题是价值观的 t_j-1 S_I-1 。这些说的S和T减1的第i个字符,其中算法说,你想要的(第i个减1)的字符。例如,如果字符串s是AFW我是1,那么

The problem is the values t_j-1 and s_i-1. These say the ith character of s and t minus 1, where the algorithm says you want the (ith minus 1) characters. For example if string s is "AFW" and i is 1 then

s_i - 1 = E; //the character value (s[1]='F') minus 1 = 'E'
s.charAt(i-1) = A; //i-1 = 0, s[0] = 'A'

所以你的条件应该是:

so your conditional should read:

if(i > 1 && j > 1 && s_i == t.charAt(j-1) && s.charAt(i-1) == t_j) { 
  d[i][j] = Minimum(d[i][j], d[i-2][j-2] + cost);
}

编辑: Unforutnately我不从阅读code理解算法,但这里是从维基百科的页面在Java中,这给出了一个符合您的例子输出的动作例子的翻译:

Unforutnately I do not understand the algorithm from reading the code, however here is a translation of the ActionScript example from the wikipedia page in Java, which gives an output that matches your example:

public static int damerauLevenshteinDistance(
      String a, String b, int alphabetLength) {
    final int INFINITY = a.length() + b.length();
    int[][] H = new int[a.length()+2][b.length()+2];  
    H[0][0] = INFINITY;
    for(int i = 0; i<=a.length(); i++) {
      H[i+1][1] = i;
      H[i+1][0] = INFINITY;
    }
    for(int j = 0; j<=b.length(); j++) {
      H[1][j+1] = j;
      H[0][j+1] = INFINITY;
    }      
    int[] DA = new int[alphabetLength];
    Arrays.fill(DA, 0);
    for(int i = 1; i<=a.length(); i++) {
      int DB = 0;
      for(int j = 1; j<=b.length(); j++) {
        int i1 = DA[b.charAt(j-1)];
        int j1 = DB;
        int d = ((a.charAt(i-1)==b.charAt(j-1))?0:1);
        if(d==0) DB = j;
        H[i+1][j+1] =
          min(H[i][j]+d,
              H[i+1][j] + 1,
              H[i][j+1]+1, 
              H[i1][j1] + (i-i1-1) + 1 + (j-j1-1));
      }
      DA[a.charAt(i-1)] = i;
    }
    return H[a.length()+1][b.length()+1];
  }

  private static int min(int ... nums) {
    int min = Integer.MAX_VALUE;
    for (int num : nums) {
      min = Math.min(min, num);
    }
    return min;
  }

这篇关于莱文斯坦到Damerau,莱文斯坦的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆