更快的编辑距离算法 [英] Faster edit distance algorithm

查看:237
本文介绍了更快的编辑距离算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

问题:我知道对于2个大小为n和m的字符串,O(mn)中的平凡编辑距离DP公式和计算。但是我最近知道如果我们只需要计算编辑距离f的最小值并且它是有界的| f |

如果这是基于DP或解释算法,请解释dp公式。



查看
链接的改进算法部分: a href =http://en.wikipedia.org/wiki/Edit_distance =nofollow> http://en.wikipedia.org/wiki/Edit_distance 。



一个关于改进的UKKONEN'S算法的链接 http://www.berghel .net / publications / asm / asm.php



提前感谢。

解决方案

您可以使用下一个简单的想法计算O(min(n,m)* s)时间内的编辑距离:



th字符串在DP表中。



因此,如果我们知道那个答案< = s,那么我们被插入坐标为(i,i-s) + 1),...,(i,i + s)。因为在其他单元格的答案严格大于s。



例如,假设我们知道abacaba和baadba之间的编辑距离小于3。





,我们可以跳过红细胞,因为它们的值大于s。



渐近的算法O(min(n,m)* s)到主对角线的左边和右边。


Problem: I know the trivial edit distance DP formulation and computation in O(mn) for 2 strings of size n and m respectively. But I recently came to know that if we only need to calculate the minimum value of edit distance f and it is bounded |f|<=s, then we can calculate it in O(min(m,n) + s^2) or O(s*min(m,n)) [wikipedia] time.

Please explain the dp formulation behind it if this is DP based or explain the algorithm .

Look at the improved algorithm section of the link: http://en.wikipedia.org/wiki/Edit_distance .

one more link about improved UKKONEN'S algorithm http://www.berghel.net/publications/asm/asm.php

Thanks in advance.

解决方案

You can calculate edit distance in O(min(n, m) * s) time use next simple idea:

Consider the i-th string in DP-table.

So, if we know, that answer <= s, then we are intersted in cells with coordinates (i, i - s), (i, i - s + 1), ... ,(i, i + s). Because in other cells answer strictly greater than s.

For example, suppose we know, that edit distance between "abacaba" and "baadba" less than 3.

So, we can skip red cells, because they have value more than s.

Asymptotic of the algorithm O(min(n, m) * s) because we calculate s cells to the left and right of the main diagonal.

这篇关于更快的编辑距离算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆