List intersection and partial string matching in Python


Question

I have two lists. The first comes from my dataset and contains date-times in the format 'yyyy-mm-dd hh:mm'; it is named times. Example:

'2010-01-01 00:00', '2010-01-01 00:15', '2010-01-01 00:30', ...,

The other is a list of all the unique year-month combinations, named year_and_month. Example:

'2010-01', '2010-02', '2010-03', '2010-04',

I am trying to extract, for each year-month combination, all the indices of the matching entries in the original dataset. I do that in the worst possible way (I am new to Python), namely

each_member_indices = []
for i in range(len(year_and_month)):
    item_ind = []
    for j in range(times.shape[0]):
        if year_and_month[i] in times[j]:
            item_ind.append(j)
    each_member_indices.append(item_ind)

This takes far too long to run, so I wanted to optimise it a bit. I looked at some implementations, such as "Find intersection of two lists?" and "Python: Intersection of full string from list with partial string". The problem is that

res_1 = [val for val in year_and_month if val in times]

produces an empty list, while

res_1 = [val for val in year_and_month if val in times[0]]

at least yields the first member.

Any ideas?

I only need the indices of the elements of the original dataset, times, that correspond to each unique year-month pair in the year_and_month list. As requested, a sample output would be

[[0, 1, 2, 3,...],[925, 926, ...],...]

The first sublist contains the indices for January 2010, the second those for February 2010, and so on.

Recommended answer

To do that in linear time, you could build a lookup dictionary that maps each year-month combination to its indices. You can also use collections.defaultdict to make it a bit easier:

from collections import defaultdict

d = defaultdict(list)
for i, v in enumerate(times):
    d[v[:7]].append(i)

Then you can create the result list with a list comprehension:

result = [d[x] for x in year_and_month]
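One detail worth knowing: indexing a defaultdict with a key it does not contain inserts that key with an empty list, so the comprehension above adds an entry to d for every month that has no data. If that matters, a plain dict with dict.get avoids it. The following is only a sketch: the function name indices_by_month is hypothetical, and it assumes every entry of times is a string starting with 'yyyy-mm'.

```python
def indices_by_month(times, year_and_month):
    """For each 'yyyy-mm' key, list the positions in times that start with it."""
    lookup = {}
    for i, stamp in enumerate(times):
        # The first 7 characters of 'yyyy-mm-dd hh:mm' are 'yyyy-mm'.
        lookup.setdefault(stamp[:7], []).append(i)
    # dict.get returns an empty list for months with no data
    # and leaves lookup itself unchanged.
    return [lookup.get(key, []) for key in year_and_month]


times = ['2010-01-01 00:00', '2010-01-01 00:15', '2010-02-01 00:30', '2010-03-01 00:00']
year_and_month = ['2010-01', '2010-02', '2010-03', '2010-04']
print(indices_by_month(times, year_and_month))  # [[0, 1], [2], [3], []]
```

Like the defaultdict version, this makes a single pass over times followed by one dictionary lookup per month.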

Demo:

>>> from collections import defaultdict
>>> times = ['2010-01-01 00:00', '2010-01-01 00:15', '2010-02-01 00:30', '2010-03-01 00:00']
>>> year_and_month = ['2010-01', '2010-02', '2010-03', '2010-04']
>>> d = defaultdict(list)
>>> for i, v in enumerate(times):
...     d[v[:7]].append(i)
...     
>>> dict(d)
{'2010-01': [0, 1], '2010-02': [2], '2010-03': [3]}
>>> [d[x] for x in year_and_month]
[[0, 1], [2], [3], []]
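As for the attempts in the question: when times behaves like a list of strings, val in times checks whether some element equals val, so '2010-01' never matches '2010-01-01 00:00'. val in times[0] works because in applied to a string is a substring test. A small illustration, using made-up sample data:

```python
times = ['2010-01-01 00:00', '2010-01-01 00:15', '2010-02-01 00:30']

# Membership in a list compares whole elements for equality.
print('2010-01' in times)       # False
# Membership in a string checks for a substring.
print('2010-01' in times[0])    # True

# A prefix test per element gives the indices for one month,
# but repeating it for every month is the slow nested loop again.
january = [i for i, t in enumerate(times) if t.startswith('2010-01')]
print(january)                  # [0, 1]
```

This is why the dictionary lookup is preferable: it groups every index by its 'yyyy-mm' prefix in one pass instead of rescanning times once per month.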
