Fixing the seed for parallel simulation runs with different numbers of cores

Question

I'd like to parallelize a simulation study to speed it up, and I'd also like it to be reproducible. In particular, I'd like to obtain the same result as if I had called set.seed at the beginning of a sequential simulation run. Here is an example of how I tried to set it up (I purposely use .inorder=T here):

library(doSNOW)
library(rlecuyer)

nr.cores = 4
nr.simulations = 10 
sample.size = 100000

seed = 12345

cl = makeCluster(nr.cores)
registerDoSNOW(cl)
clusterExport(cl=cl, list=c('sample.size'), envir=environment())
clusterSetupRNGstream(cl,rep(seed,6))

result = foreach(i=1:nr.simulations, .combine = 'c', .inorder=T)%dopar%{
  tmp = rnorm(sample.size)
  tmp[sample.size]
}

stopCluster(cl)

print(paste0('nr.cores = ',nr.cores,'; seed = ',seed,'; time =',Sys.time()))
print(result)

After running this example several times, I have two questions:

  1. The number of cores affects the resulting sequence: e.g., for nr.cores=1 and nr.cores=4 only the first values coincide, and for nr.cores=4 and nr.cores=8 the first four values coincide. Is there a way to make the result independent of nr.cores? Conceptually, I imagine I could create an RNG stream of size nr.simulations * sample.size, split it into nr.simulations pieces, and always distribute them to the nodes in the same order. Even simpler, I could fix nr.simulations (different) seed values and again pass them to the nodes in a fixed order. This could be done with some kind of mapping that each node uses to read its appropriate seed value from a table. Is there a way to do it?

  2. When I run the script several times, it sometimes happens (not always, but from time to time) that the resulting sequence is reordered even though I do not change any of the parameters (I just source the file again and again). This looks like a bug to me, as either .inorder or clusterSetupRNGstream seems to fail. Or am I missing something?

[1] "nr.cores = 4; seed = 12345; time =2017-09-08 19:00:24"
[1]  1.327091137 -1.800244293 -1.163391460  0.005980001  0.957521136  1.641354433 -1.219033091
[8] -0.238129356 -0.225193384  1.457018576

[1] "nr.cores = 4; seed = 12345; time =2017-09-08 19:00:28"
[1]  1.327091137 -1.800244293 -1.163391460  0.005980001 -0.238129356  0.957521136  1.641354433
[8] -1.219033091  0.870269174 -0.225193384

Recommended answer

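1st Q: The following seemed to work for me. Each task sets its own seed from a table, so the result depends on the task index rather than on the worker that runs it: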

library(parallel)
library(doParallel)
cl <- makeCluster(5)
registerDoParallel(cl)
seedlist <- c(100, 200, 300, 400, 500)
clusterExport(cl, 'seedlist')
foreach(I=1:5) %dopar% {set.seed(seedlist[I]); runif(1)}

[[1]]
[1] 0.3077661

[[2]]
[1] 0.5337724

[[3]]
[1] 0.9152467

[[4]]
[1] 0.1499731

[[5]]
[1] 0.8336


set.seed(100)
runif(1)
[1] 0.3077661
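
To check that this approach really is independent of the number of cores, here is a minimal sketch that runs the same ten tasks on two and on three workers and compares the results. It uses only the base `parallel` package; the names `one_task`, `run_sim`, and `seeds`, and the smaller sample size, are hypothetical choices for illustration. Note that this reproduces a sequential loop that calls set.seed with the same per-task seed, not a sequential run seeded once at the start.

```r
library(parallel)

# Hypothetical task: seed from the task index, then draw, as in the question.
one_task <- function(i, seeds, sample_size) {
  set.seed(seeds[i])            # seed depends on the task, not on the worker
  tmp <- rnorm(sample_size)
  tmp[sample_size]
}

# Hypothetical helper: run all tasks on a cluster with n_workers workers.
run_sim <- function(n_workers, seeds, sample_size = 1000) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl))      # always shut the workers down
  unlist(parLapply(cl, seq_along(seeds), one_task,
                   seeds = seeds, sample_size = sample_size))
}

seeds <- 12345 + seq_len(10)    # one arbitrary seed per simulation

r2 <- run_sim(2, seeds)
r3 <- run_sim(3, seeds)

# Sequential reference using the same per-task seeds.
r_seq <- vapply(seq_along(seeds), one_task, numeric(1),
                seeds = seeds, sample_size = 1000)

print(identical(r2, r3))
print(identical(r2, r_seq))
```

Both comparisons should print TRUE as long as the master and the workers use the same R version and default RNG settings. Consecutive integer seeds are convenient but are not guaranteed to give statistically independent streams.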

2nd Q: This seems like a bug, but maybe someone else has a better clue.
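
One possible way to sidestep the issue, whatever its cause, is to attach an RNG stream to each task instead of to each worker. If, as seems plausible, the streams set up by clusterSetupRNGstream belong to workers while tasks are handed to whichever worker is free, then the value a task produces depends on scheduling. The sketch below derives one L'Ecuyer-CMRG stream per task with `nextRNGStream` from the base `parallel` package; the names `stream_task` and `run_streams` are hypothetical, and this is an illustration under those assumptions rather than a confirmed diagnosis.

```r
library(parallel)

n_sim <- 10

# Build one stream state per task from a single master seed.
# Note: this switches the master session to the L'Ecuyer-CMRG generator.
RNGkind("L'Ecuyer-CMRG")
set.seed(12345)
s <- .Random.seed
streams <- vector("list", n_sim)
for (i in seq_len(n_sim)) {
  streams[[i]] <- s
  s <- nextRNGStream(s)         # jump ahead to the next independent stream
}

# Hypothetical task: install the task's stream on the worker, then draw.
stream_task <- function(stream, sample_size) {
  assign(".Random.seed", stream, envir = .GlobalEnv)
  tmp <- rnorm(sample_size)
  tmp[sample_size]
}

# Hypothetical helper: run all tasks on a cluster with n_workers workers.
run_streams <- function(n_workers, streams, sample_size = 1000) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl))
  unlist(parLapply(cl, streams, stream_task, sample_size = sample_size))
}

s2 <- run_streams(2, streams)
s3 <- run_streams(3, streams)
print(identical(s2, s3))
```

Because each task carries its own stream state, the result should not depend on how many workers exist or on which worker picks up which task.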
