Levenshtein距离与界限/界限 [英] Levenshtein distance with bound/limit

查看：79 发布时间：2020/9/23 19:50:45 python break levenshtein-distance

本文介绍了Levenshtein距离与界限/界限的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我发现Python 实现= nofollow noreferrer> Levenshtein距离。

I have found some Python implementations of the Levenshtein distance.

我想知道如何有效地修改这些算法，以便在Levenshtein距离大于 n （例如3）而不是一直运行到最后？

I am wondering though how these algorithms can be efficiently modified so that they break if the Levenshtein distance is greater than n (e.g. 3) instead of running until the end?

所以本质上我不想让算法如果我只是想知道距离是否大于阈值，则运行太长的时间来计算最终距离。

So essentially I do not want to let the algorithm run for too long to calculate the final distance if I simply want to know if the distance is greater than a threshold or not.

我在这里找到了一些相关的文章：

I have found some relevant posts here:

Levenstein距离限制

计算Levenshtein距离的最有效方法

Modifying Levenshtein Distance algorithm to not calculate all distances
Levenstein distance limit
Most efficient way to calculate Levenshtein distance
Levenshtein Distance Algorithm better than O(n*m)?

但仍然，我看不到任何Python代码能够完成我上面所描述的（或多或少这些文章也描述了）。

but still, I do not see any Python code which does what I describe above (which is more or less what these posts describe too).

PS：解决方案下面的@amirouche提供的内容基于我已经通过一些基准测试测试过的最快实现（来自此处： https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python ， h ttps：//stackoverflow.com/a/32558749/9024698 ）及其有界版本是我测试中同类产品中最快的一种（不排除可能有更快的测试）。

P.S.: The solution provided by @amirouche below is based on the fastest implementation that I have tested with some benchmarking (from here: https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python, https://stackoverflow.com/a/32558749/9024698) and its bounded version is the fastest one of its kind from my tests (without excluding that there may be even faster ones).

推荐答案

如 Levenstein距离限制，您可以在计算得出要早返回的行上添加测试：

As described in Levenstein distance limit, you can add a test over the row that is computed to return early:

def levenshtein(s1, s2, maximum):
    if len(s1) > len(s2):
        s1, s2 = s2, s1

    distances = range(len(s1) + 1)
    for i2, c2 in enumerate(s2):
        distances_ = [i2+1]
        for i1, c1 in enumerate(s1):
            if c1 == c2:
                distances_.append(distances[i1])
            else:
                distances_.append(1 + min((distances[i1], distances[i1 + 1], distances_[-1])))
        if all((x >= maximum for x in distances_)):
            return False
        distances = distances_
    return distances[-1]

这篇关于Levenshtein距离与界限/界限的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

Levenshtein距离与界限/界限 [英] Levenshtein distance with bound/limit

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

Levenshtein距离与界限/界限 [英] Levenshtein distance with bound/limit

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭