2个字符串之间的删除距离 [英] Deletion distance between 2 strings

查看:113
本文介绍了2个字符串之间的删除距离的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在弄清楚该问题的算法时遇到了麻烦.尽管这不是正确的解决方案,但我将粘贴问题描述以及如何解决该问题. 它类似于编辑距离算法,并且我使用了相同的方法,但是出现了一些问题,我无法确切地知道是什么

I'm having trouble figuring out the algorithm for this problem. I'll paste the problem description and how I kind of solved it, although it's not the correct solution. It is similar to the edit distance algorithm and I used the same approach, but something is off and i cannot figure out exactly what

两个字符串之间的删除距离是ASCII的最小和 您需要在以下两个字符串中删除的字符的值 为了具有相同的字符串.猫与猫之间的删除距离 at是99,因为您可以删除cat的第一个字符, "c"的ASCII值为99.cat和之间的删除距离 蝙蝠是98 + 99,因为您需要删除两个字符的第一个字符 字.当然,两个字符串之间的删除距离不能是 大于它们的总ASCII值之和,因为您可以 始终只是完全删除两个字符串.实施 查找两个之间的删除距离的有效函数 您可以参考Wikipedia上有关该算法的文章 如果需要,请编辑距离.那里的算法还不完全 与此处所需的算法相同,但相似.

The deletion distance between two strings is the minimum sum of ASCII values of characters that you need to delete in the two strings in order to have the same string. The deletion distance between cat and at is 99, because you can just delete the first character of cat and the ASCII value of 'c' is 99. The deletion distance between cat and bat is 98 + 99, because you need to delete the first character of both words. Of course, the deletion distance between two strings can't be greater than the sum of their total ASCII values, because you can always just delete both of the strings entirely. Implement an efficient function to find the deletion distance between two strings.You can refer to the Wikipedia article on the algorithm for edit distance if you want to. The algorithm there is not quite the same as the algorithm required here, but it's similar.

这是我的代码.我使用了动态编程方法. 我想说的是,最后一个"else"之后的行需要更改,但是可以随时纠正任何错误

This is my code. I used a dynamic programming approach. I would say the line after the last "else" needs to be changed, but feel free to correct any mistake

def delete_distance(s1, s2):
    m = [[0 for j in range(len(s2)+1)] for i in range(len(s1)+1)]
    for i in range(len(s1)+1):
        for j in range(len(s2)+1):
            if i == 0:
                m[i][j] = sum(bytearray(s2[:j]))
            elif j == 0:
                m[i][j] = sum(bytearray(s1[:i]))
            elif s1[i-1] == s2[j-1]:
                m[i][j] = m[i-1][j-1]
            else:
                m[i][j] = ord(s1[i-1]) + ord(s2[j-1]) + min(m[i-1][j-1], m[i-1][j], m[i][j-1])
    return m[len(s1)][len(s2)]

我知道这是错误的,因为delete_distance('cat','cbat')的输出是197,正确的结果应该是98,因为我们只需要删除ASCII值为98的b.

I know it's wrong because the output of delete_distance('cat', 'cbat') is 197, and the correct result should be 98, because we only need to delete b which has an ASCII value of 98.

推荐答案

正如Ken Y-N在先前的回答中所提到的,else部分应至少为3项运营成本. 此答案的唯一变化是-改写以适合您的问题.

As mentioned in the previous answer by Ken Y-N, the else part should be minimum of 3 operations cost. The only change in this answer is - it is rephrased to suite your problem.

这3个操作是:

  • S1删除
  • S2删除
  • S1和amp; S2删除

以下内容应该起作用-我猜:

The following should work - I guess:

def delete_distance(s1, s2):
    m = [[0 for j in range(len(s2)+1)] for i in range(len(s1)+1)]
    for i in range(len(s1)+1):
        for j in range(len(s2)+1):
            if i == 0:
                m[i][j] = sum(bytearray(s2[:j]))
            elif j == 0:
                m[i][j] = sum(bytearray(s1[:i]))
            elif s1[i-1] == s2[j-1]:
                m[i][j] = m[i-1][j-1]
            else:
                s1del = ord(s1[i-1])
                s2del = ord(s2[j-1])
                s1s2del = s1del + s2del
                m[i][j] = min(m[i-1][j-1] + s1s2del, m[i-1][j] + s1del, m[i][j-1] + s2del)
    return m[len(s1)][len(s2)]

希望有帮助!

这篇关于2个字符串之间的删除距离的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆