Given two Python lists of the same length, how to return the best matches of similar values?

Problem description

Given are two Python lists containing strings (names of persons):

list_1 = ['J. Payne', 'George Bush', 'Billy Idol', 'M Stuart', 'Luc van den Bergen']
list_2 = ['John Payne', 'George W. Bush', 'Billy Idol', 'M. Stuart', 'Luc Bergen']

I want a mapping from each name in the first list to the most similar name in the second list:

'J. Payne'           -> 'John Payne'
'George Bush'        -> 'George W. Bush'
'Billy Idol'         -> 'Billy Idol'
'M Stuart'           -> 'M. Stuart'
'Luc van den Bergen' -> 'Luc Bergen'

Is there a neat way to do this in Python? The lists contain 5 or 6 names on average. Sometimes there are more, but that is rare. Sometimes there is just one name in each list, possibly spelled slightly differently.

Recommended answer

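You can try difflib from the standard library. By default, difflib.get_close_matches returns up to three candidates whose similarity ratio is at least 0.6, ordered best first, so the first element is the closest match: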

import difflib

list_1 = ['J. Payne', 'George Bush', 'Billy Idol', 'M Stuart', 'Luc van den Bergen']
list_2 = ['John Payne', 'George W. Bush', 'Billy Idol', 'M. Stuart', 'Luc Bergen']

mymap = {}
for elem in list_1:
    # Candidates from list_2, best match first; empty if nothing is close enough
    closest = difflib.get_close_matches(elem, list_2)
    if closest:
        mymap[elem] = closest[0]

print(mymap)

Output (line breaks added for readability):

{'J. Payne': 'John Payne',
 'George Bush': 'George W. Bush',
 'Billy Idol': 'Billy Idol',
 'M Stuart': 'M. Stuart',
 'Luc van den Bergen': 'Luc Bergen'}
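
Note that get_close_matches picks a match for each name independently, so two names from list_1 could in principle end up mapped to the same entry of list_2. Since the question says the lists have the same length and are short, one possible variation is to force a one-to-one pairing by trying every ordering of list_2 and keeping the one with the highest total similarity. The following is only a sketch under those assumptions: the helper names similarity and best_one_to_one are made up for illustration and are not part of difflib, and the brute-force search is practical only for a handful of names (6 names means 720 orderings).

```python
import difflib
from itertools import permutations


def similarity(a, b):
    """Return difflib's similarity ratio for two strings (0.0 to 1.0)."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def best_one_to_one(names_1, names_2):
    """Pair every name in names_1 with a distinct name in names_2.

    Hypothetical helper: brute-forces all orderings of names_2 and keeps
    the one whose pairwise similarities sum to the largest total.
    """
    if len(names_1) != len(names_2):
        raise ValueError("both lists must have the same length")
    best = max(
        permutations(names_2),
        key=lambda perm: sum(similarity(a, b) for a, b in zip(names_1, perm)),
    )
    return dict(zip(names_1, best))


if __name__ == "__main__":
    list_1 = ['J. Payne', 'George Bush', 'Billy Idol', 'M Stuart', 'Luc van den Bergen']
    list_2 = ['John Payne', 'George W. Bush', 'Billy Idol', 'M. Stuart', 'Luc Bergen']
    print(best_one_to_one(list_1, list_2))
```

For the two example lists this should give the same pairs as the get_close_matches loop. Unlike that loop, it always assigns every name, even when the best available partner is a poor match, so a minimum-score check on similarity may be worth adding if unmatched names are possible.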
