更好的模糊匹配性能? [英] Better fuzzy matching performance?

查看：133 发布时间：2020/7/7 20:32:56 python performance levenshtein-distance fuzzy-comparison difflib

本文介绍了更好的模糊匹配性能?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我目前正在使用 difflib 中的方法get_close_matches方法进行迭代15,000个字符串的列表，以与大约15,000个字符串的另一个列表最接近:

I'm currently using method get_close_matches method from difflib to iterate through a list of 15,000 strings to get the closest match against another list of approx 15,000 strings:

a=['blah','pie','apple'...]
b=['jimbo','zomg','pie'...]

for value in a:
    difflib.get_close_matches(value,b,n=1,cutoff=.85)

每个值花费.58秒，这意味着完成循环将花费8,714秒或145分钟.是否有另一种库/方法可能更快或更有效地提高了此方法的速度?我已经尝试过将两个数组都转换为小写，但这只会导致速度略有提高.

It takes .58 seconds per value which means it will take 8,714 seconds or 145 minutes to finish the loop. Is there another library/method that might be faster or a way to improve the speed for this method? I've already tried converting both arrays to lower case, but it only resulted in a slight speed increase.

更好的模糊匹配性能? [英] Better fuzzy matching performance?

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

更好的模糊匹配性能? [英] Better fuzzy matching performance?

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭