Calculating the complexity of Levenshtein Edit Distance

Question

I have been looking at this simple Python implementation of Levenshtein edit distance all day now.

def lev(a, b):
    """Recursively calculate the Levenshtein edit distance between two strings, a and b.
    Returns the edit distance.
    """
    if a == "":
        return len(b)   # a is empty: insert every character of b
    if b == "":
        return len(a)   # b is empty: delete every character of a
    return min(lev(a[:-1], b[:-1]) + (a[-1] != b[-1]),  # substitute, or match at no cost
               lev(a[:-1], b) + 1,                       # delete the last character of a
               lev(a, b[:-1]) + 1)                       # insert the last character of b

From: http://www.clear.rice.edu/comp130/12spring/editdist/

I know it has an exponential complexity, but how would I proceed to calculate that complexity from scratch?

I have been searching all over the internet but haven't found any explanations, only statements that it is exponential.

Thanks.

Answer

  1. Draw the call tree (which you apparently have already done).

  2. Abstract from the call tree. For arbitrary n, determine the depth d of the tree as a function of n.

    Also, determine how many branches/children there are per node, on average, as n approaches infinity; that's called the average branching factor b.

  3. Realize that visiting every node in a tree of depth d with average branching factor b takes at least on the order of b ^ d operations. Write that figure in terms of n and you have a lower bound on complexity in terms of the input size.
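As a rough illustration of step 3, the sketch below counts the nodes of an idealized tree in which every internal node has exactly b children and every leaf sits at depth d. The helper name full_tree_nodes is made up for this example, and real call trees are rarely this regular, so treat it as a sanity check on the b ^ d figure rather than as part of the analysis itself.

```python
def full_tree_nodes(b, d):
    """Count the nodes of a tree in which every internal node has b children
    and every leaf sits at depth d (the root is at depth 0)."""
    return sum(b ** level for level in range(d + 1))


# The deepest level alone holds b ** d nodes, so the total is at least b ** d.
for d in range(1, 6):
    print(d, 3 ** d, full_tree_nodes(3, d))
```

The total never drops below b ^ d, which is why b ^ d serves as a lower bound on the number of nodes visited.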

More specifically: you keep recursing until you hit an empty string, and each call takes one character off at least one of the strings. If we call the lengths of the strings m and n, then no leaf is shallower than min(m, n): the shortest path is the one that trims both strings on every call, while paths that trim only one string at a time run deeper, up to depth m + n - 1. At every node in the call tree except the leaves, you recurse exactly three times, so the branching factor is 3. That gives us a call tree of at least 3^min(m, n) nodes, i.e. Ω(3^min(m, n)). For two strings of equal length n, that is Ω(3^n).
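The node count can also be checked empirically. The sketch below wraps the recursion from the question in a hypothetical helper, count_lev_calls, that counts how many times the function is entered. The helper name and the test strings are assumptions made for illustration.

```python
def count_lev_calls(a, b):
    """Return how many times the naive recursion is entered for inputs a and b."""
    calls = 0

    def lev(a, b):
        nonlocal calls
        calls += 1                      # one node of the call tree
        if a == "":
            return len(b)
        if b == "":
            return len(a)
        return min(lev(a[:-1], b[:-1]) + (a[-1] != b[-1]),
                   lev(a[:-1], b) + 1,
                   lev(a, b[:-1]) + 1)

    lev(a, b)
    return calls


# Compare the measured number of calls with 3 ** n for equal-length inputs.
for n in range(1, 7):
    print(n, 3 ** n, count_lev_calls("a" * n, "b" * n))
```

For n = 1, 2, 3 this gives 4, 19 and 94 calls, against 3, 9 and 27 for 3^n. The gap comes from the branches that shorten only one string and therefore continue past depth n, so 3^n is best read as a lower bound on the number of calls.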

This is still only a lower bound on the complexity. For the full picture, you should also take into account the amount of work done between recursive calls. In this naive code, that work is not constant: a slice such as a[:-1] has to copy almost all of the string it is applied to, so each call costs O(m + n) rather than O(1), and the total running time can exceed the number of calls by up to a linear factor.*
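One way to take the copying out of the picture, sketched here under the assumption that only the slicing is to be avoided, is to pass prefix lengths instead of sliced strings. The name lev_idx and its i and j parameters are hypothetical. The call tree has the same shape as before, so the running time is still exponential, but the work done between recursive calls drops to a constant.

```python
def lev_idx(a, b, i=None, j=None):
    """Edit distance between the prefixes a[:i] and b[:j].

    Same recursion as the slicing version, but it recurses on indices,
    so no string is copied between calls.
    """
    if i is None:
        i = len(a)
    if j is None:
        j = len(b)
    if i == 0:
        return j                        # a's prefix is empty
    if j == 0:
        return i                        # b's prefix is empty
    return min(lev_idx(a, b, i - 1, j - 1) + (a[i - 1] != b[j - 1]),
               lev_idx(a, b, i - 1, j) + 1,
               lev_idx(a, b, i, j - 1) + 1)


print(lev_idx("kitten", "sitting"))  # 3
```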

[* I once caught a CS professor using Python's slicing in a binary search, which as a result ran in time Θ(n lg n).]
