Most Likely Word Based on Max Levenshtein Distance

Published: 2020/5/24 3:58:32 Tags: python python-3.x pandas levenshtein-distance fuzzywuzzy

Problem description

I have a list of words:

lst = ['dog', 'cat', 'mate', 'mouse', 'zebra', 'lion']

I also have a pandas DataFrame:

import pandas as pd

df = pd.DataFrame({'input': ['dog', 'kat', 'leon', 'moues'], 'suggested_class': ['a', 'a', 'a', 'a']})

input   suggested_class
dog          a
kat          a
leon         a
moues        a

I would like to populate the suggested_class column with the value from lst that is the closest match to the word in the input column, i.e. the one with the highest Levenshtein similarity score. I am using the fuzzywuzzy package to calculate that.
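A note on terminology: a Levenshtein distance counts edits, so the best candidate is the one with the smallest distance, whereas fuzz.ratio returns a 0-100 similarity where larger means closer. The sketch below is only an illustration of the distance view; the helper name levenshtein is hypothetical and is not part of fuzzywuzzy.

```python
def levenshtein(a, b):
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


lst = ['dog', 'cat', 'mate', 'mouse', 'zebra', 'lion']

# Distance from 'kat' to every candidate; the smallest distance wins.
distances = {w: levenshtein('kat', w) for w in lst}
closest = min(distances, key=distances.get)
print(distances['cat'], closest)  # 1 cat
```

So "most likely word" means minimum distance, or equivalently maximum similarity score.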

The expected output is:

input   suggested_class
dog          dog
kat          cat
leon         lion
moues        mouse

I'm aware that one could implement something with the autocorrect package, like df.suggested_class = [autocorrect.spell(w) for w in df.input], but this would not work for my situation.

I've tried something like this (using from fuzzywuzzy import fuzz):

for word in lst:
    for n in range(0, len(df.input)):
        if fuzz.ratio(df.input.iloc[n], word) >= 70:
            df.suggested_class.iloc[n] = word
        else:
            df.suggested_class.iloc[n] = "unknown"

which only works for a fixed threshold. I've been able to capture the maximum score with:

max([fuzz.ratio(df.input.iloc[0], word) for word in lst])

but am having trouble relating that to a word from lst, and subsequently populating suggested_class with that word.
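One possible way to tie the maximum score back to its word is to call max() over the candidates with a key function, so the word itself is returned instead of the score. The following is a minimal sketch, not a definitive answer. The names similarity and best_match are hypothetical, and similarity is a standard-library stand-in (difflib) so the example runs without fuzzywuzzy installed; fuzz.ratio can be passed as the scorer argument instead.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score (assumed stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_match(word, choices, scorer=similarity, min_score=0, default="unknown"):
    """Return the highest-scoring choice, or `default` if it scores below `min_score`."""
    # key= makes max() compare by score but return the candidate word itself.
    best = max(choices, key=lambda choice: scorer(word, choice))
    return best if scorer(word, best) >= min_score else default


lst = ['dog', 'cat', 'mate', 'mouse', 'zebra', 'lion']
inputs = ['dog', 'kat', 'leon', 'moues']

suggested = [best_match(w, lst) for w in inputs]
print(suggested)  # ['dog', 'cat', 'lion', 'mouse']
```

With the DataFrame from the question, the same function could be applied per row, for example df['suggested_class'] = df['input'].apply(lambda w: best_match(w, lst, scorer=fuzz.ratio)). Passing min_score=70 keeps the "unknown" fallback from the original attempt. fuzzywuzzy also provides process.extractOne(word, lst), which returns a (match, score) tuple and may be a simpler option, though its default scorer differs from fuzz.ratio.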
