Python: subset elements in one list based on substring in another list, retain only one element per substring

Problem description

I have two lists:

list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']

list2 = ['abc', 'hij']

I would like to subset list1 such that: 1) only elements containing a substring that matches an element of list2 are retained, and 2) when several retained elements match the same substring, only one of them, chosen at random, is kept. For this specific example, I would like to produce a result such as:

['abc-21-6/7', 'hij-75-1/7']

I have worked out code to meet my first requirement:

[ele for ele in list1 for x in list2 if x in ele]

Based on my specific example, this returns the following:

['abc-21-6/7', 'abc-56-9/10', 'hij-2-4/9', 'hij-75-1/7']
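One caveat with this nested comprehension: it emits an element once per matching entry of list2, so an element that contains two of the substrings appears twice. A minimal sketch comparing it with an any()-based filter, using a hypothetical list2 in which both entries match 'abc-21-6/7' purely for illustration:

```python
list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', '21']  # hypothetical: both entries occur in 'abc-21-6/7'

# Nested comprehension: one output per (element, matching substring) pair
nested = [ele for ele in list1 for x in list2 if x in ele]

# any(): each element is kept at most once, in its original order
filtered = [ele for ele in list1 if any(x in ele for x in list2)]

print(nested)    # ['abc-21-6/7', 'abc-21-6/7', 'abc-56-9/10']
print(filtered)  # ['abc-21-6/7', 'abc-56-9/10']
```

With the original list2 = ['abc', 'hij'] no element contains both substrings, so the two forms give the same result there.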

But I am stuck on the second step: how to randomly keep only one element per matching substring. I'm wondering whether the random.choice function can somehow be incorporated into this. Any advice will be greatly appreciated!

Recommended answer

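You can use itertools.groupby, keyed on the part of each string before the first "-". Note that groupby only merges consecutive elements with equal keys, so the input must be sorted (or already grouped) by that key: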

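import itertools
import random

list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', 'hij']

# Keep only elements containing at least one substring from list2
new_list1 = [i for i in list1 if any(b in i for b in list2)]
# groupby only merges consecutive items, so sort by the prefix first
new_list1.sort(key=lambda x: x.split("-")[0])
new_data = [list(b) for a, b in itertools.groupby(new_list1, key=lambda x: x.split("-")[0])]
# Pick one element at random from each group
final_data = [random.choice(i) for i in new_data]
print(final_data)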

Output (one possible result; the choice is random, so it varies between runs):

['abc-56-9/10', 'hij-75-1/7']
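If list1 is not guaranteed to be ordered by prefix, or the list2 substrings do not always sit before the first "-", grouping by the matched list2 entry avoids both assumptions. A minimal sketch, assuming each element should be assigned to the first list2 entry it contains; the helper name pick_one_per_substring and its optional rng parameter are hypothetical, not part of the original answer:

```python
import random
from typing import Optional


def pick_one_per_substring(items, substrings, rng: Optional[random.Random] = None):
    """Hypothetical helper: return one randomly chosen element of `items`
    for each entry of `substrings` that occurs in at least one element."""
    if rng is None:
        rng = random.Random()
    groups = {}  # substring -> list of elements assigned to it
    for item in items:
        # Assign each element to the first substring it contains, if any
        match = next((s for s in substrings if s in item), None)
        if match is not None:
            groups.setdefault(match, []).append(item)
    # dicts keep insertion order, so groups follow the first match in items
    return [rng.choice(group) for group in groups.values()]


list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', 'hij']
print(pick_one_per_substring(list1, list2))
# Passing a seeded random.Random makes the selection reproducible
print(pick_one_per_substring(list1, list2, random.Random(0)))
```

Unlike the groupby version, this needs no sorting and returns at most one element per entry of list2, regardless of where the substring appears in each string.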
