Searching a list based on values in another list


Question

I have a list of names which I'm trying to pull out of a list of strings. I keep getting false positives such as partial matches. The other caveat is that I'd like it to also grab a last name where applicable.

names = ['Chris', 'Jack', 'Kim']
target = ['Chris Smith', 'I hijacked this thread', 'Kimberly','Christmas is here', 'CHRIS']

desired_output = ['Chris Smith', 'Kimberly', 'CHRIS']

I've tried this code:

[i for e in names for i in target if i.startswith(e)]

This predictably returns Chris Smith, Christmas is here, and Kimberly.

How would I best approach this? Using regex or can it be done with list comprehensions? Performance may be an issue as the real names list is ~880,000 names long.

(python 2.7)

EDIT: I've realized that my criteria in this example are unrealistic, given the impossible request to include Kimberly while excluding Christmas is here. To mitigate this, I've found a more complete names list that includes variations (both Kim and Kimberly are in it).

Solution

Complete guess (again), since I can't see how any reasonable criteria would exclude Christmas is here:

This'll match any targets that have any word that starts with a word from names...

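import re

names = ['Chris', 'Jack', 'Kim']
target = ['Chris Smith', 'I hijacked this thread', 'Kimberly', 'Christmas is here', 'CHRIS']

# Keep any target in which some word starts with one of the names (case-insensitive).
# re.escape guards against names that contain regex metacharacters.
matches = [targ for targ in target
           if any(re.search(r'\b{}'.format(re.escape(name)), targ, re.I) for name in names)]
print(matches)
# ['Chris Smith', 'Kimberly', 'Christmas is here', 'CHRIS']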

If you change the pattern to r'\b{}\b', you'll get ['Chris Smith', 'CHRIS'], so you lose Kimberly...
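
With roughly 880,000 names, running one regex search per name per target is likely to be slow. Below is a minimal sketch of the same whole-word, case-insensitive test done with a set lookup instead. This is a swapped-in technique, not the code above. The function name find_whole_word_matches and the expanded names list are hypothetical, and the sketch assumes every name is a single word made of letters, digits, or underscores.

```python
import re

def find_whole_word_matches(names, targets):
    """Return targets that contain at least one name as a whole word (case-insensitive)."""
    # Lowercase the names once so each word lookup is a set membership test.
    name_set = set(name.lower() for name in names)
    word_re = re.compile(r'\w+')
    return [targ for targ in targets
            if any(word in name_set for word in word_re.findall(targ.lower()))]

target = ['Chris Smith', 'I hijacked this thread', 'Kimberly', 'Christmas is here', 'CHRIS']

# Original names: same result as the whole-word \b{}\b pattern.
print(find_whole_word_matches(['Chris', 'Jack', 'Kim'], target))
# ['Chris Smith', 'CHRIS']

# Hypothetical expanded list that also contains the variant 'Kimberly'.
print(find_whole_word_matches(['Chris', 'Jack', 'Kim', 'Kimberly'], target))
# ['Chris Smith', 'Kimberly', 'CHRIS']
```

Each target is split into words once, so the work grows with the number of words in the targets rather than with the number of names. Multi-word names would need a different approach.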
