Trying to get started with doParallel and foreach but no improvement


Problem Description


I am trying to use the doParallel and foreach packages, but I'm getting a reduction in performance using the bootstrapping example from the guide found on the CRAN page.

library(doParallel)
library(foreach)
registerDoParallel(3)
x <- iris[which(iris[,5] != "setosa"), c(1,5)]
trials <- 10000
ptime <- system.time({
  r <- foreach(icount(trials), .combine=cbind) %dopar% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})[3]
ptime

This example returns 56.87.

When I change the dopar to just do to run it sequentially instead of in parallel, it returns 36.65.
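For reference, that sequential timing comes from the same loop with %dopar% swapped for %do%; a self-contained sketch (the variable name stime is my own):

```r
library(doParallel)
library(foreach)

# Same setup as the parallel example
x <- iris[which(iris[,5] != "setosa"), c(1,5)]
trials <- 10000

# Identical loop body, but %do% runs everything sequentially
# in the master process, with no workers and no attach overhead
stime <- system.time({
  r <- foreach(icount(trials), .combine=cbind) %do% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})[3]
stime
```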

If I do registerDoParallel(6) it gets the parallel time down to 42.11, but that is still slower than sequential. registerDoParallel(8) gets 40.31, still worse than sequential.

If I increase trials to 100,000 then the sequential run takes 417.16 and the parallel run with 3 workers takes 597.31. With 6 workers in parallel it takes 425.85.

My system is

  • Dell Optiplex 990

  • Windows 7 Professional 64-bit

  • 16GB RAM

  • Intel i-7-2600 3.6GHz Quad-core with hyperthreading

Am I doing something wrong here? If I do the most contrived thing I can think of (replacing the computational code with Sys.sleep(1)), then I get an actual reduction closely proportional to the number of workers. I'm left wondering why the example in the guide decreases performance for me while for them it sped things up?
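The Sys.sleep(1) experiment mentioned above can be reproduced with a sketch like this (the trial count of 12 is my own choice, kept small since each task sleeps a full second):

```r
library(doParallel)
registerDoParallel(3)

trials <- 12
ptime_sleep <- system.time({
  # Each task does no real computation, just sleeps one second,
  # so per-task scheduling overhead is negligible by comparison
  foreach(icount(trials)) %dopar% Sys.sleep(1)
})[3]
ptime_sleep
# With 3 workers this should take roughly trials/3 = 4 seconds,
# versus ~12 seconds with %do% - near-linear speedup
```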

Solution

The underlying problem is that doParallel executes attach for every task execution on the workers of the PSOCK cluster in order to add the exported variables to the package search path. This resolves various scoping issues, but can hurt performance significantly, particularly with short duration tasks and large amounts of exported data. This doesn't happen on Linux and Mac OS X with your example, since they will use mclapply, rather than clusterApplyLB, but it will happen on all platforms if you explicitly register a PSOCK cluster.
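On Windows, registerDoParallel(3) creates a PSOCK cluster behind the scenes; registering one explicitly, which forces the PSOCK code path on every platform (including Linux and Mac OS X), looks like this:

```r
library(doParallel)

# makeCluster() creates a PSOCK cluster by default; registering it
# explicitly makes foreach use clusterApplyLB on all platforms,
# so the per-task attach cost described above applies everywhere
cl <- makeCluster(3)
registerDoParallel(cl)

# ... run foreach(...) %dopar% { ... } here ...

stopCluster(cl)
```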

I believe that I've figured out how to resolve the task scoping problems in a different way that doesn't hurt performance, and I'm working with Revolution Analytics to get the fix into the next release of doParallel and doSNOW, which also has the same problem.

You can work around this problem by using task chunking:

ptime2 <- system.time({
  chunks <- getDoParWorkers()
  r <- foreach(n=idiv(trials, chunks=chunks), .combine='cbind') %dopar% {
    y <- lapply(seq_len(n), function(i) {
      ind <- sample(100, 100, replace=TRUE)
      result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
      coefficients(result1)
    })
    do.call('cbind', y)
  }
})[3]

This results in only one task per worker, so each worker only executes attach once, rather than trials / 3 times. It also results in fewer but larger socket operations, which can be performed more efficiently on most systems, but in this case, the critical issue is attach.
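The chunking relies on iterators::idiv, which splits the iteration count into a fixed number of near-equal pieces so that each worker receives a single large task; a quick illustration:

```r
library(iterators)

# idiv(10000, chunks=3) yields three chunk sizes summing to 10000
it <- idiv(10000, chunks=3)
sizes <- sapply(1:3, function(i) nextElem(it))
sizes       # three near-equal counts, e.g. 3334 3333 3333
sum(sizes)  # 10000
```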
