半本地Levenshtein距离 [英] Semi-local Levenshtein distance

查看:92
本文介绍了半本地Levenshtein距离的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

[从 https://cs.stackexchange.com/questions/12986/sliding-窗口编辑距离]

如果长度为n的长字符串和长度为m的短字符串,什么是合适的递归来让您计算所有n-m + 1

If you have a long string of length n and a shorter string of length m, what is a suitable recurrence to let you compute all n-m+1 Levenshtein distances between the shorter string and all substrings of the longer string of length m?

实际上可以在O(nm)时间内完成吗?

Can it in fact be done in O(nm) time?

推荐答案

计算滑动窗口的Levenshtein距离归结为计算非循环有向 planar 图中几对顶点之间的距离,看起来像这样(大写字母表示成对).

Computing the Levenshtein distances for a sliding window boils down to computing the distances between several pairs of vertices in an acyclic directed planar graph that looks like this one (capital letters denote the pairs).

   h a y s t a c k

n  A-B-C-D-E-F-*-*
   |\|\|\|\|\|\|\|
e  *-*-*-*-*-*-*-*
   |\|\|\|\|\|\|\|
e  *-*-A-B-C-D-E-F

水平和垂直弧的成本为1;如果对应的字母匹配,则对角弧的成本为0,否则为1.

The horizontal and vertical arcs have cost 1; the diagonal arcs have cost 0 if the corresponding letters match or 1 otherwise.

由于所有成对的顶点都位于无穷大的面上,因此Klein或 Cabello-Chambers的多源最短路径算法可用于计算所需的时间O(mn log(mn)).

Since all of the paired vertices lie on the infinite face, Klein's or Cabello-Chambers's multiple-source shortest paths algorithm can be used to compute the needed distances in time O(m n log (m n)).

要刮除最终的记录(实际上,比起Dijkstra的算法要差得多),您可以查看Alexander Tiskin的手稿半本地字符串比较:算法技术和应用程序,它可以解决与此问题相似的问题,即使不是这个问题本身. (也许这应该是我的主要答案,但我还没有读过,并且对多源最短路径技术的了解要好得多.)

To shave the final log (and practically speaking, it's much worse than for, e.g., Dijkstra's algorithm), you might look in Alexander Tiskin's manuscript Semi-local string comparison: Algorithmic techniques and applications, which treats problems similar to this one if not this one itself. (Probably that should be my primary answer, but I haven't read it and know the multiple-source shortest path techniques a lot better.)

也有可能,通过一些其他逻辑来处理单向边缘,我的 查看全文

登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆