Levenshtein distance with items in a list in Python


Question

I have the two lists below, and I want to find the words that are similar, that is, whose Levenshtein distance is less than 2. I have a function that computes the Levenshtein distance, but it takes the two words as parameters. I can find which words are not in the other list, but that does not help. I could also go index by index, but in the case below everything is thrown off once I reach index 7 (but and except): infidelity is at index 9 in the first list and infedelity at index 8 in the second, and wocp88 and wcop88 are at 10 and 9, so those pairs would never be compared. Is there some way to say "if part of infidelity appears in some word in the other list, then check those two"? Note that this won't always work: with infidelity and infedellty, say, only the "in" and the "ty" match, and many other words could match those too.

[u'rt', u'cuaimatizada', u's', u'cuaimaqueserespeta', u'forgives', u'any', u'mistake', u'but', u'the', u'infidelity', u'wocp88']
[u'rt', u'cuiamatizada', u's', u'cuimaqueserespeta', u'forgive', u'any', u'mistake', u'except', u'infedelity', u'wcop88']

So my goal is to be able to feed my Levenshtein function the two words that need to be checked. In this case, that means the following pairs:

u'cuaimatizada', u'cuiamatizada'
u'cuaimaqueserespeta', u'cuimaqueserespeta'
u'forgives', u'forgive'
u'infedelity', u'infidelity'
u'wocp88', u'wcop88'

I do not know beforehand which words will need to be compared.

Recommended answer

I think this is what you want, but note that it compares every word in the first list against every word in the second, not just the words at matching indexes:

wordpairs = [(w1, w2) for w1 in list1 for w2 in list2 if levenshtein(w1, w2) < 2]
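The comprehension above relies on a two-argument `levenshtein` function, which the question mentions but does not show. As a rough sketch only (assuming Python 3, and not necessarily how the asker's function is written), a standard dynamic-programming version could look like the following; the shortened `list1` and `list2` are sample data taken from the question.

```python
def levenshtein(a, b):
    """Return the number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


# Shortened sample data from the question.
list1 = ["forgives", "infidelity", "wocp88"]
list2 = ["forgive", "infedelity", "wcop88"]

wordpairs = [(w1, w2) for w1 in list1 for w2 in list2 if levenshtein(w1, w2) < 2]
print(wordpairs)
# [('forgives', 'forgive'), ('infidelity', 'infedelity')]
```

The all-pairs comparison calls the function once per pair, so its cost grows with the product of the two list lengths. That is fine for short word lists like these.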

>>> matches = [(w1, w2) for w1 in l12 for w2 in l22 if levenshtein(w1, w2) < 2]
>>> matches
[(u'rt', u'rt'), (u's', u's'), (u'cuaimaqueserespeta', u'cuimaqueserespeta'), (u'forgives', u'forgive'), (u'any', u'any'), (u'mistake', u'mistake'), (u'infidelity', u'infedelity')]
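The result above includes identical words and leaves out two of the pairs the question asks for, cuaimatizada / cuiamatizada and wocp88 / wcop88. Each of those differs by a swap of two adjacent letters, which plain Levenshtein distance counts as two edits, so the `< 2` test rejects them. One option is simply to loosen the threshold to `<= 2`. Another, sketched below under the assumption of Python 3, is to use the optimal string alignment variant of Damerau-Levenshtein distance, which counts an adjacent swap as a single edit. The name `osa_distance` is made up for this illustration, and the `w1 != w2` filter is an added assumption that exact matches are not of interest.

```python
def osa_distance(a, b):
    """Edit distance in which swapping two adjacent characters costs 1
    (optimal string alignment variant of Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition: "ab" versus "ba".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


list1 = ["rt", "cuaimatizada", "s", "cuaimaqueserespeta", "forgives",
         "any", "mistake", "but", "the", "infidelity", "wocp88"]
list2 = ["rt", "cuiamatizada", "s", "cuimaqueserespeta", "forgive",
         "any", "mistake", "except", "infedelity", "wcop88"]

# Keep near matches only: skip identical words, allow one edit or one swap.
matches = [(w1, w2) for w1 in list1 for w2 in list2
           if w1 != w2 and osa_distance(w1, w2) < 2]
for pair in matches:
    print(pair)
# ('cuaimatizada', 'cuiamatizada')
# ('cuaimaqueserespeta', 'cuimaqueserespeta')
# ('forgives', 'forgive')
# ('infidelity', 'infedelity')
# ('wocp88', 'wcop88')
```

With the question's data this yields exactly the five pairs listed there. On other word lists a threshold this tight can still miss or over-match pairs, so it is worth checking against real input.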
