doMC in R and foreach loop not working


Problem description

I am trying to get the foreach package for parallel processing in R working and I am having a couple of issues:

The doMC package that is required to make foreach work does not exist on CRAN for Windows. Some blogs suggest that doSNOW instead should do the same job. However, when I run the foreach command with doSNOW, %dopar% does not seem to work faster than %do%. In fact it is much slower. My CPU is an Intel i7 860 @ 2.80GHz with 8 GB of RAM. Below is my code:

##Run example in 1 core 
require(foreach)
require(doSNOW)
x= iris[which(iris[,5] != "setosa"),c(1,5)]
trials = 10000
system.time({
r= foreach(icount(trials), .combine=cbind) %do% {
ind=sample(100,100,replace=TRUE)
results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit))
coefficients(results1)
}
})[3]
#  elapsed 
#  37.28 

# Same example in 2 cores
registerDoSNOW(makeCluster(2,type="SOCK"))
getDoParWorkers()
trials = 10000
system.time({
r= foreach(icount(trials), .combine=cbind) %dopar% {
ind=sample(100,100,replace=TRUE)
results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit))
coefficients(results1)
}
})[3]
# elapsed 
#  108.14 

I reinstalled all the required packages, but the problem persists. Here is the sessionInfo() output:

sessionInfo()

#R version 2.15.1 (2012-06-22) 
#Platform: i386-pc-mingw32/i386 (32-bit)

#locale:
#[1] LC_COLLATE=English_United States.1252 
#[2] LC_CTYPE=English_United States.1252   
#[3] LC_MONETARY=English_United States.1252
#[4] LC_NUMERIC=C                          
#[5] LC_TIME=English_United States.1252    

#attached base packages:
#[1] parallel  stats     graphics  grDevices datasets  utils     methods  
#[8] base     

#other attached packages:
#[1] doParallel_1.0.1 codetools_0.2-8  doSNOW_1.0.6     snow_0.3-10     
#[5] iterators_1.0.6  foreach_1.4.0    rcom_2.2-5       rscproxy_2.0-5  

#loaded via a namespace (and not attached):
#[1] compiler_2.15.1 tools_2.15.1   

Solution

On Windows, you are better off using the doParallel package:

require(foreach)
require(doParallel)
cl <- makeCluster(6) # use 6 cores, e.g. on an 8-core machine
registerDoParallel(cl)

Then run your foreach() %dopar% {} loop as before.
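
For reference, the full register, run, and release cycle can be sketched as follows. The worker count of 2 and the toy squaring task are illustrative assumptions, not part of the answer's code. makeCluster() and stopCluster() come from the parallel package, which doParallel attaches.

```r
library(foreach)
library(doParallel)

# Start a small cluster; 2 workers is an arbitrary choice for illustration.
cl <- makeCluster(2)
registerDoParallel(cl)

# Each iteration returns one number; .combine = c collects them into a vector.
squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}
print(squares)

# Release the worker processes when finished.
stopCluster(cl)
```

Calling stopCluster() at the end matters on Windows in particular, because each worker is a separate R process that otherwise stays alive until the session ends.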

EDIT: The OP mentioned still seeing the problem, so I am including my exact code. It was run on a 4-core Windows 7 VM with 32-bit R 2.15.1, allowing doParallel to use only 3 of the cores:

require(foreach)
require(doParallel)
cl <- makeCluster(3)
registerDoParallel(cl)

x= iris[which(iris[,5] != "setosa"),c(1,5)]

trials = 1000 
system.time( 
  foreach(icount(trials), .combine=cbind) %do% 
  {  
    ind=sample(100,100,replace=TRUE) 
    results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit)) 
    results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit)) 
    results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit)) 
    results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit)) 
    coefficients(results1) 
  })[3] 

system.time( 
  foreach(icount(trials), .combine=cbind) %dopar% 
  {  
    ind=sample(100,100,replace=TRUE) 
    results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit)) 
    results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit)) 
    results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit)) 
    results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit)) 
    coefficients(results1) 
  })[3] 

In my case, I get 17.6 seconds for %do% and 14.8 seconds for %dopar%. Watching the tasks execute, it appears that much of the execution time is spent in cbind, which is a common issue when running in parallel. In my own simulations, I have written custom code that saves the detailed results as part of the parallel task rather than returning them through foreach, which removes that part of the overhead. YMMV.
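
One related way to cut the per-task overhead is to hand each worker a chunk of trials instead of a single trial, so that foreach only has to combine one matrix per worker. This is a minimal sketch of that chunking idea, not the custom result-saving code described above. The names chunk_sizes and n_chunks and the worker count of 2 are assumptions made for illustration.

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)  # worker count is an arbitrary choice
registerDoParallel(cl)

x <- iris[which(iris[, 5] != "setosa"), c(1, 5)]
trials <- 1000

# Split the trials into one chunk per registered worker.
n_chunks <- getDoParWorkers()
chunk_sizes <- rep(trials %/% n_chunks, n_chunks)
chunk_sizes[1] <- chunk_sizes[1] + trials %% n_chunks

r <- foreach(n = chunk_sizes, .combine = cbind) %dopar% {
  # Each task fits n models locally and returns a single 2 x n matrix,
  # so cbind is called once per chunk rather than once per trial.
  out <- matrix(NA_real_, nrow = 2, ncol = n)
  for (k in seq_len(n)) {
    ind <- sample(100, 100, replace = TRUE)
    fit <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
    out[, k] <- coefficients(fit)
  }
  out
}

stopCluster(cl)
dim(r)
```

With one task per worker, the fixed cost of sending a task and returning its result is paid only a few times in total instead of once per trial. Whether this beats %do% still depends on how expensive each model fit is relative to that cost.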
