从列表的元素中在Python中找到互斥集的组合 [英] Find in python combinations of mutually exclusive sets from a list's elements

查看:119
本文介绍了从列表的元素中在Python中找到互斥集的组合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在我目前正在从事的一个项目中,我已经实现了我希望程序执行的工作的80%,并且对结果感到非常满意。

In a project I am currently working on I have implemented about 80% of what I want my program to do and I am very happy with the results.

在剩下的20%的人面临着使我困惑的问题。
这里是:

In the remaining 20% I am faced with a problem which puzzles me a bit on how to solve. Here it is:

我想出了一个包含几个数字(任意长度)的列表清单
例如:

I have come up with a list of lists which contain several numbers (arbitrary length) For example:

listElement[0] = [1, 2, 3]
listElement[1] = [3, 6, 8]
listElement[2] = [4, 9]
listElement[4] = [6, 11]
listElement[n] = [x, y, z...]

其中n可以达到40,000左右。

where n could reach up to 40,000 or so.

假设每个列表元素是一组数字(从数学意义上来说),我想做的就是得出互斥集的所有组合。也就是说,就像上面列表元素的幂集一样,但是排除了所有不相交集的元素。

Assuming each list element is a set of numbers (in the mathematical sense), what I would like to do is to derive all the combinations of mutually exclusive sets; that is, like the powerset of the above list elements, but with all non-disjoint-set elements excluded.

因此,继续使用n = 4的示例,我想要列出一个具有以下组合的列表:

So, to continue the example with n=4, I would like to come up with a list that has the following combinations:

newlistElement[0] = [1, 2, 3]
newlistElement[1] = [3, 6, 8]
newlistElement[2] = [4, 9]
newlistElement[4] = [6, 11] 
newlistElement[5] = [[1, 2, 3], [4, 9]]
newlistElement[6] = [[1, 2, 3], [6, 11]]
newlistElement[7] = [[1, 2, 3], [4, 9], [6, 11]]
newlistElement[8] = [[3, 6, 8], [4, 9]]
newlistElement[9] = [[4, 9], [6, 11]

无效的情况,例如,将是组合[[1、2、3],[3、6、8]],因为3在两个元素中是相同的。
有什么优雅的方法吗?我将非常感谢您提供任何反馈意见。

An invalid case, for example would be combination [[1, 2, 3], [3, 6, 8]] because 3 is common in two elements. Is there any elegant way to do this? I would be extremely grateful for any feedback.

我还必须指定我不希望执行powerset函数,因为初始列表可能有很多元素(如我所说,n可以达到40000),并且拥有如此多元素的幂集将永远无法完成。

I must also specify that I would not like to do the powerset function, because the initial list could have quite a large number of elements (as I said n could go up to 40000), and taking the powerset with so many elements would never finish.

推荐答案

下面的程序中使用的方法在排除不相交的集合方面类似于之前的几个答案,因此通常不会测试所有组合。它与先前的答案不同,它尽早贪婪地排除了所有可能的集合。这使其运行速度比NPE解决方案快几倍。这是两种方法的时间比较,使用具有200、400,...,1000个size-6集的输入数据,其元素范围在0到20之间:

The method used in the program below is similar to a couple of previous answers in excluding not-disjoint sets and therefore usually not testing all combinations. It differs from previous answers by greedily excluding all the sets it can, as early as it can. This allows it to run several times faster than NPE's solution. Here is a time comparison of the two methods, using input data with 200, 400, ... 1000 size-6 sets having elements in the range 0 to 20:

Set size =   6,  Number max =  20   NPE method
  0.042s  Sizes: [200, 1534, 67]
  0.281s  Sizes: [400, 6257, 618]
  0.890s  Sizes: [600, 13908, 2043]
  2.097s  Sizes: [800, 24589, 4620]
  4.387s  Sizes: [1000, 39035, 9689]

Set size =   6,  Number max =  20   jwpat7 method
  0.041s  Sizes: [200, 1534, 67]
  0.077s  Sizes: [400, 6257, 618]
  0.167s  Sizes: [600, 13908, 2043]
  0.330s  Sizes: [800, 24589, 4620]
  0.590s  Sizes: [1000, 39035, 9689]

在以上数据中,左列以秒为单位显示执行时间。数字列表显示发生了多少个单,双或三联。程序中的常量指定数据集的大小和特征。

In the above data, the left column shows execution time in seconds. The lists of numbers show how many single, double, or triple unions occurred. Constants in the program specify data set sizes and characteristics.

#!/usr/bin/python
from random import sample, seed
import time
nsets,   ndelta,  ncount, setsize  = 200, 200, 5, 6
topnum, ranSeed, shoSets, shoUnion = 20, 1234, 0, 0
seed(ranSeed)
print 'Set size = {:3d},  Number max = {:3d}'.format(setsize, topnum)

for casenumber in range(ncount):
    t0 = time.time()
    sets, sizes, ssum = [], [0]*nsets, [0]*(nsets+1);
    for i in range(nsets):
        sets.append(set(sample(xrange(topnum), setsize)))

    if shoSets:
        print 'sets = {},  setSize = {},  top# = {},  seed = {}'.format(
            nsets, setsize, topnum, ranSeed)
        print 'Sets:'
        for s in sets: print s

    # Method by jwpat7
    def accrue(u, bset, csets):
        for i, c in enumerate(csets):
            y = u + [c]
            yield y
            boc = bset|c
            ts = [s for s in csets[i+1:] if boc.isdisjoint(s)]
            for v in accrue (y, boc, ts):
                yield v

    # Method by NPE
    def comb(input, lst = [], lset = set()):
        if lst:
            yield lst
        for i, el in enumerate(input):
            if lset.isdisjoint(el):
                for out in comb(input[i+1:], lst + [el], lset | set(el)):
                    yield out

    # Uncomment one of the following 2 lines to select method
    #for u in comb (sets):
    for u in accrue ([], set(), sets):
        sizes[len(u)-1] += 1
        if shoUnion: print u
    t1 = time.time()
    for t in range(nsets-1, -1, -1):
        ssum[t] = sizes[t] + ssum[t+1]
    print '{:7.3f}s  Sizes:'.format(t1-t0), [s for (s,t) in zip(sizes, ssum) if t>0]
    nsets += ndelta

编辑:在函数累积中,参数(u,bset,csets)的用法如下:

•u =当前集合并集中的集合列表

•bset =大集合 = u的固定值=已使用的元素

•csets =候选集=符合条件的集合列表

请注意,如果应计金额的第一行被

def accrue(csets,u = [],bset = set()):

,第七行由

的v累计(ts,y,boc):

(即,如果参数已重新排序并为u和bset指定默认值,则可以通过 [accrue(listofsets]] accrue $ c>生成其兼容联合的列表。

In function accrue, arguments (u, bset, csets) are used as follows:
• u = list of sets in current union of sets
• bset = "big set" = flat value of u = elements already used
• csets = candidate sets = list of sets eligible to be included
Note that if the first line of accrue is replaced by
def accrue(csets, u=[], bset=set()):
and the seventh line by
for v in accrue (ts, y, boc):
(ie, if parameters are re-ordered and defaults given for u and bset) then accrue can be invoked via [accrue(listofsets)] to produce its list of compatible unions.

关于 ValueError:格式为零长度的字段名称错误在注释中提到使用Python 2.6时,请尝试以下操作。

Regarding the ValueError: zero length field name in format error mentioned in a comment as occurring when using Python 2.6, try the following.

# change:
    print "Set size = {:3d}, Number max = {:3d}".format(setsize, topnum)
# to:
    print "Set size = {0:3d}, Number max = {1:3d}".format(setsize, topnum)

可能需要在其中进行类似的更改(添加适当的字段编号)程序中的其他格式。请注意, 2.6中的新功能页面上说:对str.format()方法的支持已反向移植到Python 2.6。虽然没有说明是否需要字段名称或数字,但没有显示没有它们的示例。相比之下,这两种方法在2.7.3中都有效。

Similar changes (adding appropriate field numbers) may be needed in other formats in the program. Note, the what's new in 2.6 page says "Support for the str.format() method has been backported to Python 2.6". While it does not say whether field names or numbers are required, it does not show examples without them. By contrast, either way works in 2.7.3.

这篇关于从列表的元素中在Python中找到互斥集的组合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆