What is an efficient way to check if the current word is close to a word in a string?


Problem description

Consider the following examples:

Example 1:

str1 = "wow...it  looks amazing"
str2 = "looks amazi"

Here amazi is close to amazing; str2 is mistyped. I want to write a program that tells me amazi is close to amazing, so that I can replace amazi with amazing in str2.

Example 2:

str1 = "is looking good"
str2 = "looks goo"

In this case the updated str2 will be "looking good".

Example 3:

str1 = "you are really looking good"
str2 = "lok goo"

In this case str2 will be "good", because lok is not close to looking. (If the program does convert lok to looking in this case, that is also fine for my purposes.)

Example 4:

str1 = "Stu is actually SEVERLY sunburnt....it hurts!!!"
str2 = "hurts!!"

The updated str2 will be "hurts!!!".

Example 5:

str1 = "you guys were absolutely amazing tonight, a..."
str2 = "ly amazin"

The updated str2 will be "amazing"; "ly" should be removed or replaced by absolutely.

What would the algorithm and code for this be?

Maybe we can do it by comparing the words character by character and setting a threshold such as 0.8 (80%): if word2 from str2 shares 80% of its characters, in sequence, with word1 from str1, then we replace word2 in str2 with word1. Is there any other efficient solution, with Python code please?
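One possible reading of the threshold idea above can be sketched with the standard library's difflib.SequenceMatcher, whose ratio() returns a similarity between 0 and 1. This is only an illustration: the helper name closest_word and the choice of SequenceMatcher as the similarity measure are assumptions, not something stated in the question.

```python
from difflib import SequenceMatcher


def closest_word(word, candidates, threshold=0.8):
    """Return the candidate most similar to `word`, or None if no
    candidate reaches `threshold` (hypothetical helper for illustration)."""
    best, best_score = None, 0.0
    for cand in candidates:
        # ratio() is 2 * matched_chars / total_chars of both strings
        score = SequenceMatcher(None, word, cand).ratio()
        if score > best_score:
            best, best_score = cand, score
    return best if best_score >= threshold else None


words = "wow...it  looks amazing".split()
print(closest_word("amazi", words))  # amazing
```

With a strict 0.8 cutoff this measure accepts amazi for amazing and goo for good, but it rejects looks for looking (ratio about 0.67), so the threshold would need tuning for Example 2.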

Recommended answer

There are a lot of ways to approach this. This one solves all of your examples. I added a minimum similarity filter so that only the higher-quality matches are returned. This is what allows the 'ly' to be dropped in the last sample, as it is not all that close to any of the words.

This uses the Levenshtein package (see its documentation). You can install it with pip install python-Levenshtein

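import Levenshtein

def find_match(str1, str2):
    min_similarity = .75  # matches scoring below this are dropped
    words = str1.split()
    output = []
    for y in str2.split():
        # Jaro-Winkler similarity of this str2 word against every str1 word
        scores = [Levenshtein.jaro_winkler(x, y) for x in words]
        # keep only the best-scoring str1 word, and only if it is close enough
        if scores and max(scores) >= min_similarity:
            output.append(words[scores.index(max(scores))])
    return output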

Results for each of the samples you proposed:

find_match("is looking good", "looks goo")

['looking','good']

find_match("you are really looking good", "lok goo")

['looking','good']

find_match("Stu is actually SEVERLY sunburnt....it hurts!!!", "hurts!!")

['hurts!!!']

find_match("you guys were absolutely amazing tonight, a...", "ly amazin")

['amazing']
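If installing python-Levenshtein is not an option, the metric is small enough to sketch in pure Python. The code below is an illustrative reimplementation of the textbook Jaro-Winkler definition (prefix weight 0.1, common prefix capped at 4 characters), not the package's own code. The function names jaro and jaro_winkler are chosen here for illustration, and scores may differ from the library's in edge cases.

```python
def jaro(s1, s2):
    """Textbook Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # characters count as matching only within this distance of each other
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # transpositions: matched characters that appear in a different order
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_weight=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_weight * (1 - score)


print(round(jaro_winkler("lok", "looking"), 3))  # 0.848
```

The prefix boost is why lok clears the 0.75 filter against looking, while ly, which shares no prefix with any word in the last sample, stays below it.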
