Pythonic高效的方法来查找同一集合的两个分区之间的所有不同交集 [英] Pythonic and efficient way to find all the different intersections between two partitions of the same set
问题描述
我需要找到同一集合的两个分区之间的所有不同交集.例如,如果我们有以下两个相同的分区
I need to find all the different intersections between two partitions of the same set. For example, if we have the following two partitions of the same set
x = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
y = [[1, 3, 6, 7], [2, 4, 5, 8, 9, 10]]
所需结果是
[[1],[2],[3],[4、5],[6、7],[8、9、10]].
[[1], [2], [3], [4, 5], [6, 7], [8, 9, 10]].
详细地,我们计算x和y的每个子集之间的笛卡尔乘积,对于这些乘积中的每个,我们将新子集中的元素分类为相应的子集,无论它们是否属于其关联子集的交集.
In detail, we calculate the cartesian product between every subset of x and y, and for each of these products, we classify the elements in new subsets accordingly if they belong to the intersection of their associated subsets or not.
什么是最佳/更pythonic的方式?预先感谢!
What is the optimal / more pythonic way to do it? Thanks in advance!
当前答案的性能比较:
import numpy as np
def partitioning(alist, indices):
return [alist[i:j] for i, j in zip([0]+indices, indices+[None])]
total = 1000
sample1 = np.sort(np.random.choice(total, int(total/10), replace=False))
sample2 = np.sort(np.random.choice(total, int(total/2), replace=False))
a = partitioning(np.arange(total), list(sample1))
b = partitioning(np.arange(total), list(sample2))
def partition_decomposition_product_1(x, y):
out = []
for sublist1 in x:
d = {}
for val in sublist1:
for i, sublist2 in enumerate(y):
if val in sublist2:
d.setdefault(i, []).append(val)
out.extend(d.values())
return out
def partition_decomposition_product_2(x, y):
all_s = []
for sx in x:
for sy in y:
ss = list(filter(lambda x:x in sx, sy))
if ss:
all_s.append(ss)
return all_s
def partition_decomposition_product_3(x, y):
return [np.intersect1d(i,j) for i in x for j in y]
并使用%timeit测量执行时间
And measuring execution time with %timeit
%timeit partition_decomposition_product_1(a, b)
%timeit partition_decomposition_product_2(a, b)
%timeit partition_decomposition_product_3(a, b)
我们找到
2.16 s ± 246 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
620 ms ± 84.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.03 s ± 111 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
因此第二种解决方案是最快的解决方案.
thus the second solution is the fastest one.
推荐答案
两个列表是同一集合的分区这一事实与算法选择无关.这可以归结为遍历两个列表列表并获得每个组合之间的交集(您可以在函数的开头添加该断言,以确保它们是同一集合的分区,请使用此答案来计算列表交集:
The fact that the two lists are partitions of the same set is not relevant to the algorithm choice. This boils down to iterating through two lists of lists and getting the intersection between each combination (you can add that assertion at the beginning of the function to ensure they are partitions of the same set, using this answer to flatten the lists efficiently). With this in mind, this function accomplishes the task, using this answer to calculate list intersection:
def func2(x, y):
# check that they partition the same set
checkx = sorted([item for sublist in x for item in sublist])
checky = sorted([item for sublist in y for item in sublist])
assert checkx == checky
# get all intersections
all_s = []
for sx in x:
for sy in y:
ss = list(filter(lambda x:x in sx, sy))
if ss:
all_s.append(ss)
return all_s
然后使用这种时间比较方法,我们可以看到此新功能比原始实现快100倍.
Then using this time comparison method, we can see that this new function is ~100x faster than your original implementation.
这篇关于Pythonic高效的方法来查找同一集合的两个分区之间的所有不同交集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!