Edit Distance in Python

Question

I'm programming a spellcheck program in Python. I have a list of valid words (the dictionary) and I need to output a list of words from this dictionary that have an edit distance of 2 from a given invalid word.

I know I need to start by generating a list of words with an edit distance of 1 from the invalid word (and then run that again on all the generated words). I have three methods, inserts(...), deletions(...) and changes(...), that should each output a list of words with an edit distance of 1: inserts outputs all valid words with one more letter than the given word, deletions outputs all valid words with one fewer letter, and changes outputs all valid words with one different letter.

I've checked a bunch of places but I can't seem to find an algorithm that describes this process. All the ideas I've come up with involve looping through the dictionary list multiple times, which would be extremely time-consuming. If anyone could offer some insight, I'd be extremely grateful.

Recommended answer

What you are describing is called edit distance, and Wikipedia has a nice explanation of it. There are many ways to define the distance between two words; the one you want is called the Levenshtein distance. Here is a dynamic programming (DP) implementation in Python.

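def levenshteinDistance(s1, s2):
    # Keep s1 as the shorter string so each DP row is as short as possible.
    if len(s1) > len(s2):
        s1, s2 = s2, s1

    # distances[j] is the distance between s1[:j] and the part of s2 seen so far.
    distances = list(range(len(s1) + 1))
    for i2, c2 in enumerate(s2):
        # Matching the first i2 + 1 characters of s2 against an empty prefix of s1
        # takes i2 + 1 edits.
        distances_ = [i2 + 1]
        for i1, c1 in enumerate(s1):
            if c1 == c2:
                # Equal characters add no cost.
                distances_.append(distances[i1])
            else:
                # One edit plus the cheapest neighbouring cell
                # (substitution, or an insertion/deletion on either side).
                distances_.append(1 + min(distances[i1], distances[i1 + 1], distances_[-1]))
        distances = distances_
    return distances[-1]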
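
To connect this back to the question, here is a minimal sketch of how such a function might be used to pull close matches out of a word list in a single pass. The helper name `words_within`, the `max_distance` parameter, and the sample word list are assumptions made for illustration; the question asks for a distance of 2, which is read here as "at most 2". The distance function is repeated (under a snake_case name) only so that the sketch runs on its own.

```python
def levenshtein_distance(s1, s2):
    """Row-by-row DP Levenshtein distance, repeated so this sketch is self-contained."""
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    distances = list(range(len(s1) + 1))
    for i2, c2 in enumerate(s2):
        new_distances = [i2 + 1]
        for i1, c1 in enumerate(s1):
            if c1 == c2:
                new_distances.append(distances[i1])
            else:
                new_distances.append(
                    1 + min(distances[i1], distances[i1 + 1], new_distances[-1])
                )
        distances = new_distances
    return distances[-1]


def words_within(word, dictionary, max_distance=2):
    """Hypothetical helper: dictionary words at most max_distance edits from word."""
    matches = []
    for candidate in dictionary:
        # A length difference larger than max_distance already rules the word out,
        # so the more expensive DP computation can be skipped.
        if abs(len(candidate) - len(word)) > max_distance:
            continue
        if levenshtein_distance(word, candidate) <= max_distance:
            matches.append(candidate)
    return matches


# Made-up word list, for illustration only.
sample_dictionary = ["spell", "spelling", "shell", "smell", "help", "python"]
print(words_within("spel", sample_dictionary))  # ['spell', 'shell', 'smell']
```

This walks the dictionary once per misspelled word; the length check is a cheap filter that avoids running the DP on words that cannot possibly be within range.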

More implementations are available here.
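
The approach outlined in the question (generate every string one edit away, repeat once, then keep the valid words) can also be sketched directly. This is not part of the answer above; it is an illustrative alternative that assumes lowercase ASCII words and stores the dictionary in a `set`, so each lookup is a hash check rather than a scan of the list. The names `edits1` and `known_within_two` are hypothetical.

```python
import string


def edits1(word, alphabet=string.ascii_lowercase):
    """All strings exactly one insertion, deletion, or change away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletions = {left + right[1:] for left, right in splits if right}
    changes = {left + c + right[1:] for left, right in splits if right for c in alphabet}
    inserts = {left + c + right for left, right in splits for c in alphabet}
    # A change that puts the same letter back reproduces the word itself; drop it.
    return (deletions | changes | inserts) - {word}


def known_within_two(word, dictionary):
    """Hypothetical helper: valid words reachable from word in one or two edits."""
    valid = set(dictionary)  # set membership avoids looping over the list repeatedly
    one_away = edits1(word)
    reachable = set(one_away)
    for candidate in one_away:
        reachable |= edits1(candidate)
    reachable.discard(word)  # two edits can lead back to the original word
    return sorted(reachable & valid)


# Made-up word list, for illustration only.
sample_dictionary = ["spell", "spelling", "shell", "smell", "help", "python"]
print(known_within_two("spel", sample_dictionary))  # ['shell', 'smell', 'spell']
```

The trade-off is that the number of generated candidates grows quickly with word length and alphabet size, so for a small dictionary a single pass with a distance function may well be cheaper.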
