Index into size ordered power set

Question

I would like to be able to index elements of a power set without expanding the full set into memory (a la itertools)

Furthermore, I want the index to be cardinality ordered: index 0 should be the empty set, and index 2**n - 1 should be the full set.
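To pin down the target ordering, here is the brute-force itertools expansion the question wants to avoid (it materializes every subset, but it defines exactly what "cardinality ordered" means here; the function name is mine):

```python
from itertools import chain, combinations

def brute_force_ordering(elements):
    """Enumerate the power set smallest subsets first -- the target order.
    This is the full expansion the question wants to avoid; it is shown
    only to make the desired index-to-subset mapping concrete."""
    return list(chain.from_iterable(
        combinations(elements, size) for size in range(len(elements) + 1)))
```

For example, `brute_force_ordering("ab")` yields the empty tuple, then the singletons, then the pair; a direct indexing scheme should agree with this list without building it.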

Most literature I have found so far involves generating a power set inductively; it doesn't let you just dive in at an arbitrary index. My motivation for this indexing is to slice a problem for distributed execution, and it would be helpful if a remote machine could dive in anywhere without sharing an iterator reference across a cluster.

Blckknght suggested the solution I pursued, which is shown below:

from scipy.special import comb  # scipy.misc.comb was removed in SciPy 1.0

def kcombination_to_index(combination):
    """Map a k-combination to its index in the combinatorial number system."""
    index = 0
    combination = sorted(combination)
    for k, ck in enumerate(combination):
        index += comb(ck, k + 1, exact=True)
    return index

def index_to_kcombination(index, k):
    """Invert kcombination_to_index: return the k-combination (in
    decreasing order) at the given index."""
    result = []
    for size in reversed(range(1, k + 1)):  # avoid shadowing the k parameter
        n = 0
        while comb(n, size, exact=True) <= index:
            n += 1
        result.append(n - 1)
        index -= comb(n - 1, size, exact=True)

    return result
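As a sanity check, the two helpers round-trip. This compact restatement uses math.comb (Python 3.8+) in place of SciPy's comb, which is my substitution; the logic is otherwise the combinatorial number system described above:

```python
from math import comb  # stands in for scipy's comb; always exact integers

def kcombination_to_index(combination):
    # combinatorial number system: sum comb(c_k, k+1) over sorted entries
    return sum(comb(ck, k + 1) for k, ck in enumerate(sorted(combination)))

def index_to_kcombination(index, k):
    # greedy inverse: at each size, find the largest binomial not exceeding index
    result = []
    for size in range(k, 0, -1):
        n = 0
        while comb(n, size) <= index:
            n += 1
        result.append(n - 1)
        index -= comb(n - 1, size)
    return result
```

For example, [8, 6, 3, 1, 0] maps to index 72 and back, matching the demo in the `__main__` block above.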

class PowerSet:
    def __init__(self, elements):
        self.elements = elements

    def __len__(self):
        return 2 ** len(self.elements)

    def __iter__(self):
        for i in range(len(self)):
            yield self[i]

    def __getitem__(self, k):
        if not isinstance(k, int):
            raise TypeError
        # k = 0 is the empty set,
        # k = 1 .. n are the subsets of size 1, and so on
        for subset_size in range(len(self.elements) + 1):
            number_subsets = comb(len(self.elements), subset_size, exact=True)

            if k >= number_subsets:
                k -= number_subsets
            else:
                break

        # we now want the kth combination of subset_size elements
        indices = index_to_kcombination(k, subset_size)

        return [self.elements[i] for i in indices]

if __name__ == "__main__":
    print("index of combination [8, 6, 3, 1, 0] is", kcombination_to_index([8, 6, 3, 1, 0]))
    print("5-combination at position 72 is", index_to_kcombination(72, 5))

    ps = PowerSet(["a", "b", "c", "d"])

    for subset_idx in range(len(ps)):
        print(ps[subset_idx])

Answer

I think you can do this with a two-step process. The first step, as Mihai Maruseac described in his (now deleted) answer, is to find the size of the set by iterating over the possible sizes until you find the appropriate one. Here's code for that:

def find_size(n, i):
    """Return a tuple, (k, i), where k is the size of the i'th set
       (0-indexed) in the cardinality-ordered powerset of {0..n-1}, and i
       is the remaining index within the combinations of that size."""
    if not 0 <= i < 2**n:
        raise ValueError('index is too large or small')
    for k in range(n+1):
        c = comb(n, k)
        if c > i:
            return k, i
        else:
            i -= c
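A quick self-contained check of find_size (restated here with math.comb standing in for comb, which is my assumption): for n = 4 the index space splits into binomial blocks of sizes 1 + 4 + 6 + 4 + 1 = 16, so index 5 lands at the start of the size-2 block.

```python
from math import comb

def find_size(n, i):
    # walk the binomial blocks 1, n, C(n,2), ... until index i falls inside one
    if not 0 <= i < 2 ** n:
        raise ValueError('index is too large or small')
    for k in range(n + 1):
        c = comb(n, k)
        if c > i:
            return k, i
        i -= c
```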

Once you have determined the size, you can use the combinatorial number system to find the right k-combination from the lexicographical ordering:

def pick_set(n, i):
    """Return the i'th set (0-indexed) in the cardinality-ordered powerset of {0..n-1}"""
    s, i = find_size(n, i)
    result = []
    for k in range(s, 0, -1):
        prev_c = 0
        for v in range(k, n+1):
            c = comb(v, k)
            if i < c:
                result.append(v-1)
                i -= prev_c
                break
            prev_c = c
    return tuple(result)

Both of these functions require a function to calculate the number of k-combinations of a set of size n, nCk (which I've called comb). Another question has several suggested solutions for finding that value, including scipy.special.comb, gmpy.comb and a few pure-Python implementations. Or, since comb is called repeatedly with sequentially increasing values (e.g. comb(n, 0), comb(n, 1), etc., or comb(k, k), comb(k+1, k), etc.), you could instead use an inline calculation that takes advantage of the previously calculated value to give better performance.

Example usage (using a comb function minimally adapted from J.F. Sebastian's answer in the question linked above):

>>> for i in range(2**4):
...     print(i, pick_set(4, i))

0 ()
1 (0,)
2 (1,)
3 (2,)
4 (3,)
5 (1, 0)
6 (2, 0)
7 (2, 1)
8 (3, 0)
9 (3, 1)
10 (3, 2)
11 (2, 1, 0)
12 (3, 1, 0)
13 (3, 2, 0)
14 (3, 2, 1)
15 (3, 2, 1, 0)

Note that if you plan on iterating over combinations (as I did in the example), you can probably do so more efficiently than by running the full algorithm, as there are more efficient algorithms for finding the next combination of a given size (though you'll need a bit of extra logic to bump up to the next larger size of combinations when you've exhausted the initial size).

Here are implementations of some of the optimizations I mentioned briefly above:

First off, generators that efficiently calculate combination values for ranges of n or k values:

def comb_n_range(start_n, stop_n, k):
    # yields (n, comb(n, k)) for n in range(start_n, stop_n)
    c = comb(start_n, k)
    yield start_n, c
    for n in range(start_n+1, stop_n):
        c = c * n // (n - k)
        yield n, c

def comb_k_range(n, start_k, end_k):
    # yields (k, comb(n, k)) for k in range(start_k, end_k)
    c = comb(n, start_k)
    yield start_k, c
    for k in range(start_k+1, end_k):
        c = c * (n - k + 1) // k
        yield k, c
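The multiplicative recurrences these generators rely on can be checked directly against math.comb (used here purely for verification):

```python
from math import comb

# increasing n at fixed k: comb(n, k) = comb(n-1, k) * n // (n - k)
row = [comb(5, 3)]
c = row[0]
for n in range(6, 12):
    c = c * n // (n - 3)
    row.append(c)

# increasing k at fixed n: comb(n, k) = comb(n, k-1) * (n - k + 1) // k
col = [comb(7, 0)]
c = col[0]
for k in range(1, 8):
    c = c * (7 - k + 1) // k
    col.append(c)
```

The multiply-before-divide order keeps every intermediate value an exact integer, since each prefix product is itself a binomial coefficient times an integer.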

The for ... in range(...): c = comb(...); ... bits in the code above can be adjusted to use these, which should be a bit faster.

Next, a function that returns the next combination in lexicographical order:

def next_combination(n, c):
    # c is a combination of {0..n-1} as a decreasing tuple; the last
    # combination of size k is (n-1, ..., n-k), whose smallest entry is
    # n - len(c).  Also raise for the empty tuple so powerset_slice can
    # bump up to size 1.
    if not c or c[-1] == n - len(c):
        raise ValueError("no more combinations")
    for i in range(len(c)-1, -1, -1):
        if i == 0 or c[i] < c[i-1] - 1:
            return c[:i] + (c[i] + 1,) + tuple(range(len(c)-2-i,-1,-1))

And a generator that uses next_combination to yield a range of values from the powerset, defined by a slice object:

def powerset_slice(n, s):
    start, stop, step = s.indices(2**n)
    if step < 1:
        raise ValueError("invalid step size (must be positive)")

    if start == 0:
        c = ()
    else:
        c = pick_set(n, start)

    for _ in range(start, stop, step):
        yield c
        for _ in range(step):
            try:
                c = next_combination(n, c)
            except ValueError:
                if len(c) == n:
                    return
                c = tuple(range(len(c), -1, -1))
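Putting the pieces together, here is a self-contained check of the slicing generator. The helpers are restated with math.comb standing in for comb (my assumption); note that next_combination signals the end of a size block when the smallest entry reaches n - len(c), since the last size-k combination is (n-1, ..., n-k):

```python
from math import comb

def find_size(n, i):
    for k in range(n + 1):
        c = comb(n, k)
        if c > i:
            return k, i
        i -= c
    raise ValueError('index is too large or small')

def pick_set(n, i):
    s, i = find_size(n, i)
    result = []
    for k in range(s, 0, -1):
        prev_c = 0
        for v in range(k, n + 1):
            c = comb(v, k)
            if i < c:
                result.append(v - 1)
                i -= prev_c
                break
            prev_c = c
    return tuple(result)

def next_combination(n, c):
    # the last size-k combination is (n-1, ..., n-k): smallest entry n - k
    if not c or c[-1] == n - len(c):
        raise ValueError("no more combinations")
    for i in range(len(c) - 1, -1, -1):
        if i == 0 or c[i] < c[i - 1] - 1:
            return c[:i] + (c[i] + 1,) + tuple(range(len(c) - 2 - i, -1, -1))

def powerset_slice(n, s):
    start, stop, step = s.indices(2 ** n)
    if step < 1:
        raise ValueError("invalid step size (must be positive)")
    c = () if start == 0 else pick_set(n, start)
    for _ in range(start, stop, step):
        yield c
        for _ in range(step):
            try:
                c = next_combination(n, c)
            except ValueError:
                if len(c) == n:
                    return
                c = tuple(range(len(c), -1, -1))

full = list(powerset_slice(4, slice(None)))
evens = list(powerset_slice(4, slice(0, 16, 2)))
```

For n = 4, `full` reproduces the 16-row listing shown earlier, and a step-2 slice agrees with taking every other element of it.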

You could integrate this into the class you are using by making __getitem__ return the generator if it is passed a slice object, rather than an int. This would let you make __iter__ faster by simply turning its body into: return self[:].
