How to find elements that are common to all lists in a nested list?
Problem description
I have a large nested list, and each list within it contains numbers formatted as floats. Apart from a few exceptions, every list in the nested list is identical. I want to extract the numbers that are common to all of the lists. A simple example of my problem is shown below:
nested_list = [[1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0],
[2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0],
[1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0],
[2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0]]
In this case, I would want to extract the following:
common_vals = [2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0]
I tried to use set intersections to solve this, but I wasn't able to get it to work across all of the lists in the nested list.
Recommended answer
You can use reduce and set.intersection (in Python 3, reduce must be imported from functools):
>>> from functools import reduce
>>> reduce(set.intersection, map(set, nested_list))
set([2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0])
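The output above is Python 2's set repr. A minimal Python 3 sketch of the same fold, assuming you want a sorted list like common_vals in the question rather than a set (sets are unordered):

```python
from functools import reduce

nested_list = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0],
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0],
    [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
]

# Turn each inner list into a set, then fold set.intersection over them:
# ((s0 & s1) & s2) & s3
common = reduce(set.intersection, map(set, nested_list))

# Sets are unordered, so sort to get a list like common_vals.
common_vals = sorted(common)
print(common_vals)
# [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0]
```

Note that reduce raises TypeError if nested_list is empty, because there is no initial value to start the fold from.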
Use itertools.imap (Python 2) for a memory-efficient solution: unlike map, which builds the full list of sets up front, imap creates each set only when reduce asks for it.
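In Python 3 there is no imap, because the built-in map already returns a lazy iterator. A sketch of the same memory-friendly pattern, assuming the inner lists arrive from a generator; rows() is a hypothetical stand-in for something like reading a large file line by line, so the whole nested list never has to be held in memory:

```python
from functools import reduce

def rows():
    # Hypothetical data source: yields one list of floats at a time.
    yield [1.0, 2.0, 3.0, 4.0]
    yield [2.0, 3.0, 4.0]
    yield [2.0, 3.0, 4.0, 5.0]

# map() is lazy in Python 3: each set is built only when reduce requests
# the next item, and it is no longer referenced once it has been intersected.
common = reduce(set.intersection, map(set, rows()))
print(sorted(common))  # [2.0, 3.0, 4.0]
```

By contrast, set.intersection(*map(set, ...)) has to unpack every set into the argument tuple before the call, so laziness does not save memory there.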
>>> from itertools import imap, islice
>>> lis = [[1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0],
[2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0],
[1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0],
[2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0]]
>>> %timeit set.intersection(*map(set, lis))
100000 loops, best of 3: 12.5 us per loop
>>> %timeit set.intersection(*(set(e) for e in lis))
10000 loops, best of 3: 14.4 us per loop
>>> %timeit reduce(set.intersection, map(set, lis))
10000 loops, best of 3: 12.8 us per loop
>>> %timeit reduce(set.intersection, imap(set, lis))
100000 loops, best of 3: 13.1 us per loop
>>> %timeit set.intersection(set(lis[0]), *islice(lis, 1, None))
100000 loops, best of 3: 10.6 us per loop
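The last variant, set.intersection(set(lis[0]), *islice(lis, 1, None)), likely wins because set.intersection accepts arbitrary iterables as its extra arguments, not only sets, so only the first list has to be converted. A small check of that behaviour (the values are illustrative only):

```python
from itertools import islice

lists = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], (3.0, 2.0, 9.0)]

# Only lists[0] is turned into a set; the remaining list and tuple are
# passed unchanged, since set.intersection accepts any iterable arguments.
common = set.intersection(set(lists[0]), *islice(lists, 1, None))
print(sorted(common))  # [2.0, 3.0]
```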
>>> lis = [[1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0],
[2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0],
[1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0],
[2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0]]*1000
>>> %timeit set.intersection(*map(set, lis))
10 loops, best of 3: 16.4 ms per loop
>>> %timeit set.intersection(*(set(e) for e in lis))
10 loops, best of 3: 15.8 ms per loop
>>> %timeit reduce(set.intersection, map(set, lis))
100 loops, best of 3: 16.3 ms per loop
>>> %timeit reduce(set.intersection, imap(set, lis))
10 loops, best of 3: 13.8 ms per loop
>>> %timeit set.intersection(set(lis[0]), *islice(lis, 1, None))
100 loops, best of 3: 8.4 ms per loop
>>> lis = [[1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0],
[2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0],
[1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0],
[2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0]]*10**5
>>> %timeit set.intersection(*map(set, lis))
1 loops, best of 3: 1.92 s per loop
>>> %timeit set.intersection(*(set(e) for e in lis))
1 loops, best of 3: 2.17 s per loop
>>> %timeit reduce(set.intersection, map(set, lis))
1 loops, best of 3: 2.14 s per loop
>>> %timeit reduce(set.intersection, imap(set, lis))
1 loops, best of 3: 1.52 s per loop
>>> %timeit set.intersection(set(lis[0]), *islice(lis, 1, None))
1 loops, best of 3: 913 ms per loop
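The timings above use IPython's %timeit magic and Python 2's imap. To rerun a similar comparison in plain Python 3, a sketch using the standard timeit module; the input size and repeat counts are arbitrary choices, and no particular numbers are implied since results depend on machine and interpreter:

```python
import timeit
from functools import reduce
from itertools import islice

base = [
    [float(x) for x in range(1, 16)],  # 1.0 .. 15.0
    [float(x) for x in range(2, 15)],  # 2.0 .. 14.0
    [float(x) for x in range(1, 15)],  # 1.0 .. 14.0
    [float(x) for x in range(2, 16)],  # 2.0 .. 15.0
]
lis = base * 1000

candidates = {
    "set.intersection(*map(set, lis))":
        lambda: set.intersection(*map(set, lis)),
    "reduce(set.intersection, map(set, lis))":
        lambda: reduce(set.intersection, map(set, lis)),
    "set.intersection(set(lis[0]), *islice(lis, 1, None))":
        lambda: set.intersection(set(lis[0]), *islice(lis, 1, None)),
}

for label, fn in candidates.items():
    # Best of 3 repeats of 10 calls each, reported per call.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{label}: {best * 1e3:.2f} ms per call")
```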
Conclusion:
Steven Rumbalski's solution, set.intersection(set(lis[0]), *islice(lis, 1, None)), is the fastest in all three benchmarks, taking 913 ms on the largest input versus 1.52 s for the next-fastest variant.
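If you need the result as a list in the original order (like common_vals in the question) rather than as a set, one option is to wrap the fastest variant in a small helper. common_elements is a hypothetical name, and returning [] for empty input is an assumption:

```python
from itertools import islice

def common_elements(lists):
    """Return values present in every list, in first-list order, without duplicates."""
    if not lists:
        # Assumption: no lists means no common values.
        return []
    # Fastest variant from the benchmarks: one set, other lists passed as-is.
    common = set.intersection(set(lists[0]), *islice(lists, 1, None))
    # Walk the first list to preserve its order.
    result = []
    for value in lists[0]:
        if value in common:
            result.append(value)
            common.discard(value)  # emit repeated values only once
    return result

nested_list = [[1.0, 2.0, 3.0], [3.0, 2.0], [2.0, 3.0, 4.0]]
print(common_elements(nested_list))  # [2.0, 3.0]
```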