Levenshtein距离与界限/界限 [英] Levenshtein distance with bound/limit

查看:79
本文介绍了Levenshtein距离与界限/界限的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我发现Python 实现= nofollow noreferrer> Levenshtein距离

I have found some Python implementations of the Levenshtein distance.

我想知道如何有效地修改这些算法,以便在Levenshtein距离大于 n (例如3)而不是一直运行到最后?

I am wondering though how these algorithms can be efficiently modified so that they break if the Levenshtein distance is greater than n (e.g. 3) instead of running until the end?

所以本质上我不想让算法如果我只是想知道距离是否大于阈值,则运行太长的时间来计算最终距离。

So essentially I do not want to let the algorithm run for too long to calculate the final distance if I simply want to know if the distance is greater than a threshold or not.

我在这里找到了一些相关的文章:

I have found some relevant posts here:



  1. Levenstein距离限制

  2. 计算Levenshtein距离的最有效方法


  1. Modifying Levenshtein Distance algorithm to not calculate all distances
  2. Levenstein distance limit
  3. Most efficient way to calculate Levenshtein distance
  4. Levenshtein Distance Algorithm better than O(n*m)?

但仍然,我看不到任何Python代码能够完成我上面所描述的(或多或少这些文章也描述了)。

but still, I do not see any Python code which does what I describe above (which is more or less what these posts describe too).

PS:解决方案下面的@amirouche提供的内容基于我已经通过一些基准测试测试过的最快实现(来自此处: https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python h ttps://stackoverflow.com/a/32558749/9024698 )及其有界版本是我测试中同类产品中最快的一种(不排除可能有更快的测试)。

P.S.: The solution provided by @amirouche below is based on the fastest implementation that I have tested with some benchmarking (from here: https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python, https://stackoverflow.com/a/32558749/9024698) and its bounded version is the fastest one of its kind from my tests (without excluding that there may be even faster ones).

推荐答案

Levenstein距离限制,您可以在计算得出要早返回的行上添加测试:

As described in Levenstein distance limit, you can add a test over the row that is computed to return early:

def levenshtein(s1, s2, maximum):
    if len(s1) > len(s2):
        s1, s2 = s2, s1

    distances = range(len(s1) + 1)
    for i2, c2 in enumerate(s2):
        distances_ = [i2+1]
        for i1, c1 in enumerate(s1):
            if c1 == c2:
                distances_.append(distances[i1])
            else:
                distances_.append(1 + min((distances[i1], distances[i1 + 1], distances_[-1])))
        if all((x >= maximum for x in distances_)):
            return False
        distances = distances_
    return distances[-1]

这篇关于Levenshtein距离与界限/界限的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆