Pythonic高效的方法来查找同一集合的两个分区之间的所有不同交集 [英] Pythonic and efficient way to find all the different intersections between two partitions of the same set

查看:67
本文介绍了Pythonic高效的方法来查找同一集合的两个分区之间的所有不同交集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要找到同一集合的两个分区之间的所有不同交集.例如,如果我们有以下两个相同的分区

I need to find all the different intersections between two partitions of the same set. For example, if we have the following two partitions of the same set

x = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
y = [[1, 3, 6, 7], [2, 4, 5, 8, 9, 10]]

所需结果是

[[1],[2],[3],[4、5],[6、7],[8、9、10]].

[[1], [2], [3], [4, 5], [6, 7], [8, 9, 10]].

详细地,我们计算x和y的每个子集之间的笛卡尔乘积,对于这些乘积中的每个,我们将新子集中的元素分类为相应的子集,无论它们是否属于其关联子集的交集.

In detail, we calculate the cartesian product between every subset of x and y, and for each of these products, we classify the elements in new subsets accordingly if they belong to the intersection of their associated subsets or not.

什么是最佳/更pythonic的方式?预先感谢!

What is the optimal / more pythonic way to do it? Thanks in advance!

当前答案的性能比较:

import numpy as np

def partitioning(alist, indices):
    return [alist[i:j] for i, j in zip([0]+indices, indices+[None])]

total = 1000
sample1 = np.sort(np.random.choice(total, int(total/10), replace=False))
sample2 = np.sort(np.random.choice(total, int(total/2), replace=False))

a = partitioning(np.arange(total), list(sample1))
b = partitioning(np.arange(total), list(sample2))

def partition_decomposition_product_1(x, y):
    out = []
    for sublist1 in x:
        d = {}
        for val in sublist1:
            for i, sublist2 in enumerate(y):
                if val in sublist2:
                    d.setdefault(i, []).append(val)
        out.extend(d.values())
    return out

def partition_decomposition_product_2(x, y):
    all_s = []
    for sx in x:
        for sy in y:
            ss = list(filter(lambda x:x in sx, sy))
            if ss:
                all_s.append(ss)
    return all_s

def partition_decomposition_product_3(x, y):
    return [np.intersect1d(i,j) for i in x for j in y]

并使用%timeit测量执行时间

And measuring execution time with %timeit

%timeit partition_decomposition_product_1(a, b)
%timeit partition_decomposition_product_2(a, b)
%timeit partition_decomposition_product_3(a, b)

我们找到

2.16 s ± 246 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
620 ms ± 84.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.03 s ± 111 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

因此第二种解决方案是最快的解决方案.

thus the second solution is the fastest one.

推荐答案

两个列表是同一集合的分区这一事实与算法选择无关.这可以归结为遍历两个列表列表并获得每个组合之间的交集(您可以在函数的开头添加该断言,以确保它们是同一集合的分区,请使用此答案来计算列表交集:

The fact that the two lists are partitions of the same set is not relevant to the algorithm choice. This boils down to iterating through two lists of lists and getting the intersection between each combination (you can add that assertion at the beginning of the function to ensure they are partitions of the same set, using this answer to flatten the lists efficiently). With this in mind, this function accomplishes the task, using this answer to calculate list intersection:

def func2(x, y):
    # check that they partition the same set 
    checkx = sorted([item for sublist in x for item in sublist])
    checky = sorted([item for sublist in y for item in sublist])
    assert checkx == checky

    # get all intersections
    all_s = []
    for sx in x:
        for sy in y:
            ss = list(filter(lambda x:x in sx, sy))
            if ss:
                all_s.append(ss)
    return all_s

然后使用这种时间比较方法,我们可以看到此新功能比原始实现快100倍.

Then using this time comparison method, we can see that this new function is ~100x faster than your original implementation.

这篇关于Pythonic高效的方法来查找同一集合的两个分区之间的所有不同交集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆