Python:查找给定列表的随机k子集分区 [英] Python: Finding random k-subset partition for a given list

查看:71
本文介绍了Python:查找给定列表的随机k子集分区的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

以下代码为给定列表生成长度为 k 的所有分区(k个子分区)。可以在主题中找到
算法。

  def algorithm_u(ns,m):
def visit(n,a):
ps = [[]对于xrange(n)中的i,
xrange(n)中的j:
ps [a [j + 1]]。append(ns [j])
return ps

def f(mu,nu,sigma,n,a):
如果mu == 2:
收益访问(n,a)
其他:
对于v in f(mu-1,nu-1,(mu + sigma)%2,n,a):
收益v
如果nu == mu + 1:
a [mu ] = mu-1
收益访问(n,a)
而a [n]> 0:
a [nu] = a [nu]-1
收益访问(n,a)
elif nu> mu + 1:
,如果(mu + sigma)%2 == 1:
a [nu-1] = mu-1
否则:
a [mu] = mu-1
if((a [nu] + sigma)%2 == 1:
for v in b(mu,nu-1,0,n,a):
收益v
其他:对于f in中的v,
(mu,nu-1,0,n,a):
收益v
,而a [nu] 0:
a [nu] = a [nu]-1
如果(a [nu] + sigma)%2 == 1:
for v in b(mu,nu-1 0,n,a):
收益v
其他:
for v in f(mu,nu-1,0,n,a):
收益v

def b(mu,nu,sigma,n,a):如果nu == mu + 1:
,而a [n] mu-1:
收益访问(n,a)
a [nu] = anu + 1
收益访问(n,a)$ b $baμ= 0
elif nu> mu + 1:
if(a [nu] + sigma)%2 == 1:
for v in f(mu,nu-1,0,n,a):
收益v v
else:b中的v的
(mu,nu-1,0,n,a):
收益v
而a [nu] < mu-1:
a [nu] = a [nu] + 1
如果(a [nu] + sigma)%2 == 1:
for v in f(mu,nu- 1,0,n,a):
收益v
其他:b中v的
(mu,nu-1,0,n,a):
收益v
if(mu + sigma)%2 == 1:
a [nu-1] = 0
else:$ b $baμ= 0
如果mu == 2 :
收益访问(n,a)
其他:
for v in b(mu-1,nu-1,(mu + sigma)%2,n,a):
收益v

n = len(ns)
a = [0] *(n + 1)xrange(1,m + 1)中j的

a [n-m + j] = j-1
返回f(m,n,0,n,a)

我们知道给定列表的k个子集的数量等于斯特林数,对于一些大列表而言,它可能很大。 / p>

上面的代码返回一个Python生成器,可以通过调用其next方法生成给定列表的所有可能的k子集分区。因此,如果我只想随机获得这些分区中的一个,则必须在某些随机时间调用next方法(如果Stirling数很大,这会很慢),或者使用 itertools.islice 方法来获取大小为1的切片,该切片的速度确实比以前慢。



我试图避免列出所有分区,因为这会浪费时间和速度,甚至会浪费内存(因为计算量很大,内存对于我来说很重要) 。



问题是如何仅生成k个子分区中的一个而不生成其余部分?或者至少使该过程比上面解释的要快得多。我需要性能,因为每次只需要获得其中之一,并且我运行该应用程序可能超过一千万次。





编辑:示例



列表: {1,2,3}



for k = 3:

  {{1},{2},{3}} 

对于k = 2:

  {{1,2},{3}} 
{{1,3} ,{2}}
{{1,{2,3}}

对于k = 1:

  {{1,2,3}} 

考虑k = 2,有什么方法可以随机地仅生成这3个分区之一,而不生成其他2个吗?请注意,我想为任何给定的k生成随机分区,而不仅是任何k的随机分区,这意味着如果我将k设置为2,我只想生成这3个中的一个而不是全部5个中的一个。 b
$ b

致谢



穆罕默德

解决方案

您可以通过递归算法通过存储先前计算的值来有效地计算斯特林数:

  fact = [1] 

def nCr(n,k):
返回从n中选择k个元素的方式数
而len(fact)<= n:
fact.append(事实[-1] * len(事实))
返回事实[n] /(事实[k] *事实[nk])

缓存= {}
def count_part(n,k):
返回将n个项目划分为k个非空子集的方式数
如果k == 1:
如果缓存中有键,则返回1
键= n,k

return cache [key]
#第一个元素进入下一个分区
#最多y个n-1个剩余元素中的元素
#将剩余n-1-y划分为k-1个非空子集
#所以n-1-y> = k-1
#y< = nk
t = 0
对于y在范围(0,n-k + 1)中:
t + = count_part(n-1-y,k-1)* nCr(n-1,y)
cache [key] = t
return t

一旦知道有多少选择,就可以修改此递归代码以生成特定的分区:

  def ith_subset (A,k,i):
返回A的第k个子集
#选择第一个元素x
n = len(A)
如果n = = k:
返回A,如果k == 0:
return []
对于x在range(n)中:
#第一个元素是x
#剩下nx-1个,如果i extra = nCr(nx-1,k-1)

休息
i-=额外
返回[A [x]] + ith_su bset(A [x + 1:],k-1,i)

def gen_part(A,k,i):
返回元素的第i个第k个分区在A(零索引)中作为列表列表
如果k == 1:
返回[A]
n = len(A)
#首先找到合适的值对于y-此子集中
中y在范围(0,n-k + 1)中的额外金额:
extra = count_part(n-1-y,k-1)* nCr(n- 1,y)如果i<
打破
i-=额外

#我们对子集进行计数,对于每个子集,我们对分区进行计数
#将i分为子集计数和其余分区的计数
count_partition,count_subset = divmod(i,nCr(n-1,y))
#现在找到第i ^个合适的子集
子集= [A [0]] + ith_subset(A [1:],y,count_subset)
S = set(子集)
return [子集] + gen_part([a in a如果不在S],k-1,count_partition)

作为示例,我写了产生差异的测试程序4个数字的nt分区:

  def test(A):
n = len(A)
for在[1,2,3,4]中的k:
t = count_part(n,k)
打印k,t
对于范围(t)中的i:
打印 ,i,gen_part(A,k,i)

test([1,2,3,4])

此代码显示:

  1 1 
0 [[1、2 ,3,4]]
2 7
0 [[1],[2,3,4]]
1 [[1,2],[3,4]]
2 [[1,3],[2,4]]
3 [[1,4],[2,3]]
4 [[1,2,3],[4 ]]
5 [[1、2、4],[3]]
6 [[1、3、4],[2]]
3 6
0 [ [1],[2],[3、4]]
1 [[1],[2、3],[4]]
2 [[1],[2、4], [3]]
3 [[1,2],[3],[4]]
4 [[1,3],[2],[4]]
5 [ [1,4],[2],[3]]
4 1
0 [[1],[2],[3],[4]]

另一个例子是,将1,2,3,.. 14的1000万个分区分为4部分。
此代码可以用pypy在44秒内生成所有分区。



有1,2,3,...,40个分区的50,369,882,873,307,917,364,901分为4部分。通过在单个处理器上运行pypy,此代码可在120秒内生成其中的一千万个。



要将这些东西捆绑在一起,可以使用此代码生成一个随机分区。将列表A划分为k个非空子集:

  import random 
def random_ksubset(A,k):
i = random.randrange(0,count_part(len(A),k))
返回gen_part(A,k,i)


The following code generates all partitions of length k (k-subset partitions) for a given list. the algorithm could be found in this topic.

def algorithm_u(ns, m):
    def visit(n, a):
        ps = [[] for i in xrange(m)]
        for j in xrange(n):
            ps[a[j + 1]].append(ns[j])
        return ps

    def f(mu, nu, sigma, n, a):
        if mu == 2:
            yield visit(n, a)
        else:
            for v in f(mu - 1, nu - 1, (mu + sigma) % 2, n, a):
                yield v
        if nu == mu + 1:
            a[mu] = mu - 1
            yield visit(n, a)
            while a[nu] > 0:
                a[nu] = a[nu] - 1
                yield visit(n, a)
        elif nu > mu + 1:
            if (mu + sigma) % 2 == 1:
                a[nu - 1] = mu - 1
            else:
                a[mu] = mu - 1
            if (a[nu] + sigma) % 2 == 1:
                for v in b(mu, nu - 1, 0, n, a):
                    yield v
            else:
                for v in f(mu, nu - 1, 0, n, a):
                    yield v
            while a[nu] > 0:
                a[nu] = a[nu] - 1
                if (a[nu] + sigma) % 2 == 1:
                    for v in b(mu, nu - 1, 0, n, a):
                        yield v
                else:
                    for v in f(mu, nu - 1, 0, n, a):
                        yield v

    def b(mu, nu, sigma, n, a):
        if nu == mu + 1:
            while a[nu] < mu - 1:
                yield visit(n, a)
                a[nu] = a[nu] + 1
            yield visit(n, a)
            a[mu] = 0
        elif nu > mu + 1:
            if (a[nu] + sigma) % 2 == 1:
                for v in f(mu, nu - 1, 0, n, a):
                    yield v
            else:
                for v in b(mu, nu - 1, 0, n, a):
                    yield v
            while a[nu] < mu - 1:
                a[nu] = a[nu] + 1
                if (a[nu] + sigma) % 2 == 1:
                    for v in f(mu, nu - 1, 0, n, a):
                        yield v
                else:
                    for v in b(mu, nu - 1, 0, n, a):
                        yield v
            if (mu + sigma) % 2 == 1:
                a[nu - 1] = 0
            else:
                a[mu] = 0
        if mu == 2:
            yield visit(n, a)
        else:
            for v in b(mu - 1, nu - 1, (mu + sigma) % 2, n, a):
                yield v

    n = len(ns)
    a = [0] * (n + 1)
    for j in xrange(1, m + 1):
        a[n - m + j] = j - 1
    return f(m, n, 0, n, a)

we know that number of k-subsets of a given list is equal to Stirling number and it could be very big for some large lists.

the code above returns a Python generator that could generate all possible k-subset partitions for the given list with calling its next method. accordingly, if I want to get only one of these partitions randomly, I have to either call next method for some random times (which makes it really slow if the Stirling number is big) or use the itertools.islice method to get a slice of size one which is really slow as before.

I'm trying to avoid listing all partitions because it would be waste of time and speed and even memory (because calculations are a lot and memory is important in my case).

the question is how can I generate only one of k-subset partitions without generating the rest ? or at least make the procedure very faster than what explained above. I need the performance because I need to get only one of them each time and I'm running the application for maybe more than ten million times.

I'd appreciate any help.

EDIT: EXAMPLE

list : { 1, 2, 3 }

for k = 3:

{ {1}, {2}, {3} }

for k = 2:

{ {1, 2}, {3} }
{ {1, 3}, {2} }
{ {1}, {2, 3} }

and for k = 1:

{ {1, 2, 3} }

consider k = 2, is there any way I can generate only one of these 3 partitions randomly, without generating the other 2? note that I want to generate random partition for any given k not only a random partition of any k which means if I set the k to 2 I would like to generate only one of these 3 not one of all 5.

Regards,

Mohammad

解决方案

You can count Stirling numbers efficiently with a recursive algorithm by storing previously computed values:

fact=[1]

def nCr(n,k):
    """Return number of ways of choosing k elements from n"""
    while len(fact)<=n:
        fact.append(fact[-1]*len(fact))
    return fact[n]/(fact[k]*fact[n-k])

cache = {}
def count_part(n,k):
    """Return number of ways of partitioning n items into k non-empty subsets"""
    if k==1:
        return 1
    key = n,k
    if key in cache:
        return cache[key]
    # The first element goes into the next partition
    # We can have up to y additional elements from the n-1 remaining
    # There will be n-1-y left over to partition into k-1 non-empty subsets
    # so n-1-y>=k-1
    # y<=n-k
    t = 0
    for y in range(0,n-k+1):
        t += count_part(n-1-y,k-1) * nCr(n-1,y)
    cache[key] = t
    return t   

Once you know how many choices there are, you can adapt this recursive code to generate a particular partition:

def ith_subset(A,k,i):
    """Return ith k-subset of A"""
    # Choose first element x
    n = len(A)
    if n==k:
        return A
    if k==0:
        return []
    for x in range(n):
        # Find how many cases are possible with the first element being x
        # There will be n-x-1 left over, from which we choose k-1
        extra = nCr(n-x-1,k-1)
        if i<extra:
            break
        i -= extra
    return [A[x]] + ith_subset(A[x+1:],k-1,i)

def gen_part(A,k,i):
    """Return i^th k-partition of elements in A (zero-indexed) as list of lists"""
    if k==1:
        return [A]
    n=len(A)
    # First find appropriate value for y - the extra amount in this subset
    for y in range(0,n-k+1):
        extra = count_part(n-1-y,k-1) * nCr(n-1,y)
        if i<extra:
            break
        i -= extra
    # We count through the subsets, and for each subset we count through the partitions
    # Split i into a count for subsets and a count for the remaining partitions
    count_partition,count_subset = divmod(i,nCr(n-1,y))
    # Now find the i^th appropriate subset
    subset = [A[0]] + ith_subset(A[1:],y,count_subset)
    S=set(subset)
    return  [subset] + gen_part([a for a in A if a not in S],k-1,count_partition)

As an example, I've written a test program that produces different partitions of 4 numbers:

def test(A):
    n=len(A)
    for k in [1,2,3,4]:
        t = count_part(n,k)
        print k,t
        for i in range(t):
            print " ",i,gen_part(A,k,i)

test([1,2,3,4])

This code prints:

1 1
  0 [[1, 2, 3, 4]]
2 7
  0 [[1], [2, 3, 4]]
  1 [[1, 2], [3, 4]]
  2 [[1, 3], [2, 4]]
  3 [[1, 4], [2, 3]]
  4 [[1, 2, 3], [4]]
  5 [[1, 2, 4], [3]]
  6 [[1, 3, 4], [2]]
3 6
  0 [[1], [2], [3, 4]]
  1 [[1], [2, 3], [4]]
  2 [[1], [2, 4], [3]]
  3 [[1, 2], [3], [4]]
  4 [[1, 3], [2], [4]]
  5 [[1, 4], [2], [3]]
4 1
  0 [[1], [2], [3], [4]]

As another example, there are 10 million partitions of 1,2,3,..14 into 4 parts. This code can generate all partitions in 44 seconds with pypy.

There are 50,369,882,873,307,917,364,901 partitions of 1,2,3,...,40 into 4 parts. This code can generate 10 million of these in 120 seconds with pypy running on a single processor.

To tie things together, you can use this code to generate a single random partition of a list A into k non-empty subsets:

import random
def random_ksubset(A,k):
    i = random.randrange(0,count_part(len(A),k))
    return gen_part(A,k,i)

这篇关于Python:查找给定列表的随机k子集分区的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆