Fixing the seed for parallel simulation runs with different numbers of cores
Problem description
I'd like to parallelize a simulation study to speed it up, and I'd also like it to be reproducible. In particular, I'd like to obtain the same result as if I had used set.seed at the beginning of a sequential simulation run.
Here is an example of how I try to set it up (I purposefully use .inorder=T here):
library(doSNOW)
library(rlecuyer)
nr.cores = 4
nr.simulations = 10
sample.size = 100000
seed = 12345
cl = makeCluster(nr.cores)
registerDoSNOW(cl)
clusterExport(cl=cl, list=c('sample.size'), envir=environment())
clusterSetupRNGstream(cl,rep(seed,6))
result = foreach(i=1:nr.simulations, .combine = 'c', .inorder=T)%dopar%{
tmp = rnorm(sample.size)
tmp[sample.size]
}
stopCluster(cl)
print(paste0('nr.cores = ',nr.cores,'; seed = ',seed,'; time =',Sys.time()))
print(result)
There are two questions that I have after running this example several times:
- The number of cores affects the resulting sequence: for nr.cores=1 and 4 only the first values coincide, and for nr.cores=4 and 8 the first four values coincide. Is there a way to make it independent of nr.cores? Conceptually, I'd imagine I could create an RNG stream of size nr.simulations * sample.size, split it into nr.simulations pieces and always distribute them to the nodes in the same order. Even simpler, I could fix nr.simulations different seed values and again pass them to the nodes in a fixed order. This could be done with some kind of mapping that each node uses to read its appropriate seed value from a table. Is there a way to do it?
- When I run the script several times, the resulting sequence is sometimes (not always, but from time to time) reordered even though I do not change any of the parameters (I just source the file again and again). It looks like a bug to me, as either .inorder or clusterSetupRNGstream seems to fail. Or am I missing something?
[1] "nr.cores = 4; seed = 12345; time =2017-09-08 19:00:24"
[1] 1.327091137 -1.800244293 -1.163391460 0.005980001 0.957521136 1.641354433 -1.219033091
[8] -0.238129356 -0.225193384 1.457018576
[1] "nr.cores = 4; seed = 12345; time =2017-09-08 19:00:28"
[1] 1.327091137 -1.800244293 -1.163391460 0.005980001 -0.238129356 0.957521136 1.641354433
[8] -1.219033091 0.870269174 -0.225193384
Recommended answer
1st Q: The following seemed to work for me:
library(parallel)
library(doParallel)
cl <- makeCluster(5)
registerDoParallel(cl)
seedlist <- c(100, 200, 300, 400, 500)
clusterExport(cl, 'seedlist')
foreach(I=1:5) %dopar% {set.seed(seedlist[I]); runif(1)}
[[1]]
[1] 0.3077661
[[2]]
[1] 0.5337724
[[3]]
[1] 0.9152467
[[4]]
[1] 0.1499731
[[5]]
[1] 0.8336
set.seed(100)
runif(1)
[1] 0.3077661
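The same seed-table idea can be applied to the simulation from the question. The following is only a minimal sketch under a few assumptions: it uses parLapply from the base parallel package instead of foreach (to avoid extra dependencies), the seed table seed + 1, ..., seed + nr.simulations is an arbitrary illustrative choice, and the names run.simulation and run.parallel are hypothetical helpers, not part of any package. Because each task seeds itself from its own index, the result should not depend on how many workers there are.

```r
library(parallel)

nr.simulations <- 10
sample.size <- 1000   # smaller than in the question, to keep the sketch fast
seed <- 12345

# Hypothetical seed table: one fixed seed per simulation, not per worker.
seedlist <- seed + seq_len(nr.simulations)

# One simulation run: seed first, then draw, as in the answer above.
run.simulation <- function(i, seedlist, sample.size) {
  set.seed(seedlist[i])
  tmp <- rnorm(sample.size)
  tmp[sample.size]
}

# Run all simulations on a cluster with the given number of workers.
run.parallel <- function(nr.cores) {
  cl <- makeCluster(nr.cores)
  on.exit(stopCluster(cl))
  unlist(parLapply(cl, seq_len(nr.simulations), run.simulation,
                   seedlist = seedlist, sample.size = sample.size))
}

# Sequential reference run, then two cluster sizes for comparison.
result.seq <- sapply(seq_len(nr.simulations), run.simulation,
                     seedlist = seedlist, sample.size = sample.size)
result.2 <- run.parallel(2)
result.3 <- run.parallel(3)

print(identical(result.seq, result.2))
print(identical(result.2, result.3))
```

In a fresh R session with the default generator on both the master and the workers, I would expect both comparisons to come out identical. Note that seeding the default generator with nearby integers carries no formal guarantee that the resulting streams are independent.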
2nd Q: This seems like a bug, but maybe someone else has a better clue.
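If the reordering comes from tasks being handed to whichever worker happens to be free (an assumption on my part; nothing above confirms it), then tying the random stream to the task rather than to the worker would sidestep the problem. Below is a minimal sketch of that idea using only the base parallel package: nextRNGStream derives one L'Ecuyer-CMRG stream per simulation, and each task installs its own stream before drawing. The helper names run.simulation and run.balanced are hypothetical, and clusterApplyLB is used here in place of foreach because it dispatches tasks with load balancing.

```r
library(parallel)

nr.simulations <- 10
sample.size <- 1000   # smaller than in the question, to keep the sketch fast
seed <- 12345

# Derive one L'Ecuyer-CMRG stream per simulation from a single seed.
RNGkind("L'Ecuyer-CMRG")
set.seed(seed)
streams <- vector("list", nr.simulations)
streams[[1]] <- .Random.seed
for (i in seq_len(nr.simulations - 1)) {
  streams[[i + 1]] <- nextRNGStream(streams[[i]])
}

# One simulation run: install the task's own stream, then draw.
run.simulation <- function(stream, sample.size) {
  assign(".Random.seed", stream, envir = .GlobalEnv)
  tmp <- rnorm(sample.size)
  tmp[sample.size]
}

# clusterApplyLB gives the next task to whichever worker is free,
# so the task-to-worker mapping may differ from run to run.
run.balanced <- function(nr.cores) {
  cl <- makeCluster(nr.cores)
  on.exit(stopCluster(cl))
  unlist(clusterApplyLB(cl, streams, run.simulation,
                        sample.size = sample.size))
}

result.2 <- run.balanced(2)
result.4 <- run.balanced(4)
print(identical(result.2, result.4))
```

Since each value depends only on the stream of its own task, it should not matter which worker ran it or how many tasks that worker had already processed.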