大n的所有可能排列 [英] All possible permutations for large n

查看:92
本文介绍了大n的所有可能排列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想使用R查找大n的所有可能排列.目前,我正在使用gtools包中的permutations(n,n),但是对于n>10,这几乎是不可能的.由于大量排列(n!),导致内存崩溃.我不想抽样,因为我需要找到特定统计信息的确切分布.有什么办法可以使我更快地执行此操作,或者可以将其分解成小块?

I want to find all possible permutations for large n using R. At the moment I am using permutations(n,n) from gtools package, but for n>10 it is almost impossible; I get memory crashes due to the large number of permutations (n!). I do not want to sample as I need to find the exact distribution for a particular statistic. Is there any way I can do this faster or that I can break it down into small blocks?

推荐答案

您的目标很可能是不切实际的(大n"有多大????即使您可以生成大量的排列,多长时间也是?您需要对它们进行总结吗?对十亿个元素进行详尽的计算与一千万个元素的随机样本之间在准确性上会有多少差异?).但是:

Your goal is very likely to be impractical (how large is "large n"??? even if you can generate an enormous number of permutations, how long is it going to take you to summarize over them? How much difference in accuracy is there going to be between an exhaustive computation on a billion elements and a random sample of ten million of them?). However:

iterpc包可以枚举块中的排列.例如:

The iterpc package can enumerate permutations in blocks. For example:

library("iterpc")

设置一个对象(迭代器")以生成10个对象的排列:

Set up an object ("iterator") to generate permutations of 10 objects:

I <- iterpc(10,labels=1:10,ordered=TRUE)

返回前5个排列:

getnext(I,5)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    2    3    4    5    6    7    8    9    10
## [2,]    1    2    3    4    5    6    7    8   10     9
## [3,]    1    2    3    4    5    6    7    9    8    10
## [4,]    1    2    3    4    5    6    7    9   10     8
## [5,]    1    2    3    4    5    6    7   10    8     9

返回下5个排列:

getnext(I,5)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    2    3    4    5    6    7   10    9     8
## [2,]    1    2    3    4    5    6    8    7    9    10
## [3,]    1    2    3    4    5    6    8    7   10     9
## [4,]    1    2    3    4    5    6    8    9    7    10
## [5,]    1    2    3    4    5    6    8    9   10     7

假设您一次可以计算一个统计块,然后组合结果,那么这应该是可行的.但是,它看起来似乎不容易并行化:无法跳转到迭代器的特定元素... sna包中的numperm函数提供了随机"(即非顺序)尽管与iterpc给出的排列顺序不同,但可以访问置换-但我猜想iterpc效率更高,因此最好在单个节点/核心/机器上依次处理块而不是分发过程.

Assuming you can compute your statistic one block at a time and then combine the results, this should be feasible. It doesn't look like you can parallelize very easily, though: there's no way to jump to a particular element of an iterator ... The numperm function from the sna package provides "random" (i.e. non-sequential) access to permutations, although in a different ordering from those given by iterpc - but I'm guessing that iterpc is much more efficient, so you may be better off crunching through blocks sequentially on a single node/core/machine rather than distributing the process.

这是sna::numperm给出的前5个排列:

Here are the first 5 permutations as given by sna::numperm:

library("sna")
t(sapply(1:5,numperm,olength=10))
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    2    1    3    4    5    6    7    8    9    10
## [2,]    2    3    1    4    5    6    7    8    9    10
## [3,]    2    3    4    1    5    6    7    8    9    10
## [4,]    2    3    4    5    1    6    7    8    9    10
## [5,]    2    3    4    5    6    1    7    8    9    10

iterpc的内脏是用C ++编写的,因此它应该非常高效,但是无论什么情况,对于n的较大值都会变得困难.令我惊讶的是,iterpc可以处理全部10个!= 3628800排列,没有太多麻烦:

The guts of iterpc are written in C++, so it should be very efficient, but no matter what things are going to get hard for larger values of n. To my surprise, iterpc can handle the full set of 10!=3628800 permutations without much trouble:

system.time(g <- getall(I))
##    user  system elapsed 
##   0.416   0.304   0.719 
dim(g)
## [1] 3628800      10

但是,我无法在计算机的单个块中使用n>10进行任何计算(n = 11:"cannot allocate vector of size 1.6 Gb" ... n> 11 "The length of the iterator is too large, try using getnext(I,d)")

However, I can't do any computations with n>10 in a single block on my machine (n=11: "cannot allocate vector of size 1.6 Gb" ... n>11 "The length of the iterator is too large, try using getnext(I,d)")

这篇关于大n的所有可能排列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆