找到子列表之间的交集 [英] Find the intersection between sublists

查看:205
本文介绍了找到子列表之间的交集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

最近,我遇到了一个问题 查找子列表之间的交集。告诉具有任何(1个或更多个)交集在一起的子列表成为一个。例如以下列表:

Recently i encounter with a question about Find the intersection between sublists . that tells the sublist which have any (1 or more ) intersection together become one. for example the following list :

l=[[1,2,3],[0,13,6],[9,10],[3,4,5],[10,11],[6,7,50]]

必须转换为:

[[1, 2, 3, 4, 5],[0, 50, 6, 7, 13],[9, 10, 11]] 

所以我写以下功能做得很好,性能良好,我使用设置为了检查成员资格的快速复杂性,并且在内循环中,我使用比较第一个索引的切片的主列表和每个循环中的其他元素,并且还注意到,每个循环之后,列表将在循环内递归。 :

So i wrote the following function to do it that works well with a good performance, i use set for its fast complexity for checking membership and also in inner loop i use slicing that compare the first index of main list with other elements in every loop and also note that the list will been decrease after each loop ,as its a recursion inside the loop . :

s=[set(i) for i in g if i]

def find_intersection(m_list):
    for i,v in enumerate(m_list) : 
        for j,k in enumerate(m_list[i+1:],i+1):
           if v & k:
              s[i]=v.union(m_list.pop(j))
              return find_intersection(m_list)
    return m_list

s=[set(i) for i in l if i]
print find_intersection(s)
[set([1, 2, 3, 4, 5]), set([0, 50, 6, 7, 13]), set([9, 10, 11])]

但我认为可以用另一个解决方案可能有更好的性能,我想到了 collections.deque 或者可能与 numpy 或只是修改我的功能,使它更好吗。如果您有任何建议,我将不胜感激!

But i think it could be done with another solution maybe with better performance , i thought about collections.deque or maybe with numpy or just modifying my function and make it better ? . if you have any suggestion i would be grateful to hear about !

推荐答案

这是一种更有效的算法:

Here is a more efficient algorithm:


  1. 对于至少一个子列表中存在的每个唯一编号,让我们维护包含此数字的所有子列表的索引列表。如果我们使用排序找到唯一的数字,或者 O(n),则这部分是 O(n * log n)如果我们使用哈希表,其中 n 是所有子列表中元素的总数。

  1. For each unique number that is present in at least one of the sublists, let's maintain a list of indices of all sublists that contain this number. This part is O(n * log n) time if we use sorting to find unique numbers or O(n) if we use a hash table, where n is the total number of elements in all sublists.

我们创建一个图,其中顶点是子列表索引,如果两个索引在所有数字中的至少一个索引列表中一起出现,则存在边。我们最多需要创建 O(n)边缘(这部分是微不足道的:没有必要显式创建所有边,我们可以从一个元素到每个子列表中的下一个元素,由于传递性而导致所有唯一元素)。以下是一些伪代码:

Let's create a graph where vertices are sublist indices and an edge is present if two indices appear together in at least one list of indices among all numbers. We need create at most O(n) edges(this part is slightly non-trivial: there is no need to create all edges explicitly, we can just add an edge from an element to the next one in each sublist for all unique elements due to transitivity). Here is some pseudo code:

g = empty graph
for elem in unique_elements:
    sublist_indices = list of indices of all sublists that contain this element
    for i = 1 ... size(sublist_indices - 1):
        g.add_edge(sublist_indices[i], sublist_indices[i + 1])


  • 现在我们可以使用线性时间的深度优先搜索在此图中找到连接的组件(此图是

  • Now we can find connected components in this graph using depth-first search in linear time(this graph is undirected).

    我们知道哪些子列表应该被合并(当且仅当它们在相同的组件中时才应该被合并),所以我们可以很容易地构建答案。

    We know which sublists should be merged(they should be merged if and only if they are in the same connected component), so we can easily construct the answer.

    总时间复杂度为 O(n)。这是最理想的,因为阅读输入已经需要 O(n)操作。

    The total time complexity is O(n). It is optimal because reading the input already requires O(n) operations.

    这篇关于找到子列表之间的交集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

  • 查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆