Efficiently identifying whether part of a string is in a list/dict's keys?

Question

I have a lot (>100,000) of lowercase strings in a list; a subset might look like this:

str_list = ["hello i am from denmark", "that was in the united states", "nothing here"]

I also have a dict like this (in reality it will have around 1,000 entries):

dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

For every string in the list that contains any of the dict's keys, I want to replace the entire string with the corresponding dict value. The expected result is therefore:

str_list = ["dk", "us", "nothing here"]

What is the most efficient way to do this, given the number of strings I have and the size of the dict?

Extra info: a string never contains more than one dict key.

Recommended answer

Given:

lst = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark" : "dk", "germany" : "ger", "norway" : "no", "united states" : "us"}

you can do this:

res = [dict_x.get(next((k for k in dict_x if k in my_str), None), my_str) for my_str in lst]

which gives:

print(res)  # -> ['dk', 'us', 'nothing here']
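
For readability, the one-liner can be unrolled into an explicit loop. The following is only a sketch of the same logic; the function name replace_with_codes and its parameter names are illustrative and not part of the original answer.

```python
def replace_with_codes(strings, mapping):
    """Loop form of the list comprehension (illustrative helper, not from the answer)."""
    result = []
    for my_str in strings:
        # First key that occurs as a substring of my_str, or None if no key matches.
        matched_key = next((k for k in mapping if k in my_str), None)
        # None is not a key in mapping, so .get() falls back to the original string.
        result.append(mapping.get(matched_key, my_str))
    return result


lst = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}

res = replace_with_codes(lst, dict_x)
print(res)
```

Like the comprehension, this stops scanning the keys for a given string as soon as the first match is found.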

The neat part of this one-liner (apart from it being a list comprehension, a Python ninja's favorite weapon) is how two defaults work together. next is given a default of None, so it returns None instead of raising StopIteration when no key matches. dict_x.get is given a default of my_str, so looking up None (which is not a key) returns the original string unchanged.
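
The approach above tests every key against every string until one matches. With about 100,000 strings and about 1,000 keys, one possible alternative (a different technique from the answer's, shown only as a sketch) is to combine all keys into a single compiled regular expression and search each string once. The helper name replace_with_codes_re is hypothetical, and whether this is actually faster depends on the data, so it is worth measuring with timeit on real input.

```python
import re

lst = ["hello i am from denmark", "that was in the united states", "nothing here"]
dict_x = {"denmark": "dk", "germany": "ger", "norway": "no", "united states": "us"}

# Escape each key so it is matched literally, and try longer keys first so a
# longer key is preferred over a shorter key that is a prefix of it.
keys_by_length = sorted(dict_x, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(k) for k in keys_by_length))


def replace_with_codes_re(strings, mapping, compiled_pattern):
    """Hypothetical helper: replace a string by the value of the key found in it."""
    result = []
    for s in strings:
        match = compiled_pattern.search(s)
        # The matched text is one of the keys, so it can be looked up directly.
        result.append(mapping[match.group(0)] if match else s)
    return result


res_re = replace_with_codes_re(lst, dict_x, pattern)
print(res_re)
```

As with the `in` test in the one-liner, this matches keys as plain substrings, so a key embedded inside a longer word still counts as a match.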
