Is there an efficient way to parallelize mapply?


Problem description

I have many rows, and on every row I compute the uniroot of a non-linear function. I have a quad-core Ubuntu machine which hasn't stopped running my code for two days now. Not surprisingly, I'm looking for ways to speed things up ;-)

After some research, I noticed that only one core is currently used and that parallelization is the thing to do. Digging deeper, I came to the conclusion (maybe incorrectly?) that the package foreach isn't really meant for my problem, because too much overhead is produced (see, for example, SO). A good alternative seems to be multicore for Unix machines. In particular, the pvec function seems to be the most efficient one, after I checked the help page.

However, if I understand it correctly, this function only takes one vector and splits it up accordingly. I need a function that can be parallelized, but that takes multiple vectors (or a data.frame instead), just like the mapply function does. Is there anything out there that I missed?
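
For reference, here is a minimal sketch of the interface pvec expects: a single vector plus a function that is already vectorized over it (written against parallel::pvec; the original multicore version has the same shape, and sqrt is just a stand-in workload):

library(parallel)

# pvec splits the one input vector into per-core chunks, applies the
# vectorized FUN to each chunk, and concatenates the results
y <- pvec(seq_len(1e6), sqrt, mc.cores = 4L)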

Here is a small example of what I want to do: (Note that I include a plyr example here because it can be an alternative to the base mapply function and it has a parallelization option. However, it is slower in my implementation, and internally it calls foreach to parallelize, so I think it won't help. Is that correct?)

library(plyr)
library(foreach)
n <- 10000
df <- data.frame(P   = rnorm(n, mean=100, sd=10),
                 B0  = rnorm(n, mean=40,  sd=5),
                 CF1 = rnorm(n, mean=30,  sd=10),
                 CF2 = rnorm(n, mean=30,  sd=5),
                 CF3 = rnorm(n, mean=90,  sd=8))

get_uniroot <- function(P, B0, CF1, CF2, CF3) {

  uniroot(function(x) {-P + B0 + CF1/x + CF2/x^2 + CF3/x^3}, 
          lower = 1,
          upper = 10,
          tol   = 0.00001)$root

}

system.time(x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3))
   #user  system elapsed 
   #0.91    0.00    0.90 
system.time(x2 <- mdply(df, get_uniroot))
   #user  system elapsed 
   #5.85    0.00    5.85
system.time(x3 <- foreach(P=df$P, B0=df$B0, CF1=df$CF1, CF2=df$CF2, CF3=df$CF3, .combine = "c") %do% {
    get_uniroot(P, B0, CF1, CF2, CF3)})
   #user  system elapsed 
   #10.30    0.00   10.36
all.equal(x1, x2$V1) #TRUE
all.equal(x1, x3)    #TRUE

Also, I tried to implement Ryan Thompson's function chunkapply from the SO link above (I only got rid of the doMC part because I couldn't install it; his example still works, though, even after adjusting his function), but I didn't get it to work. However, since it uses foreach, I thought the same arguments mentioned above apply, so I didn't try for too long.

#chunkapply(get_uniroot, list(P=df$P, B0=df$B0, CF1=df$CF1, CF2=df$CF2, CF3=df$CF3))
#Error in { : task 1 failed - "invalid function value in 'zeroin'"

PS: I know that I could just increase tol to reduce the number of steps necessary to find the uniroot. However, I have already set tol as large as possible.
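
For illustration, a small hypothetical spot check (uniroot returns the iteration count as iter, so the effect of tol is directly visible; the coefficients below are just the means used to generate df):

# hypothetical spot check: a looser tol means uniroot needs fewer steps
f <- function(x) -100 + 40 + 30/x + 30/x^2 + 90/x^3
uniroot(f, lower = 1, upper = 10, tol = 1e-5)$iter
uniroot(f, lower = 1, upper = 10, tol = 1e-2)$iter  # typically fewer iterations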

Recommended answer

I'd use the parallel package that's built into R 2.14 and work with matrices. You could then simply use mclapply like this:

library(parallel)

dfm <- as.matrix(df)
# apply get_uniroot to each row of the matrix, with forked workers on 4 cores
result <- mclapply(seq_len(nrow(dfm)),
                   function(x) do.call(get_uniroot, as.list(dfm[x, ])),
                   mc.cores = 4L)
unlist(result)

This is basically doing the same thing mapply does, but in a parallel way.
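
As a side note, the same parallel package also ships mcmapply, a multicore drop-in for mapply, so the row loop above can be written more directly (a minimal sketch; like mclapply, it relies on forking and is therefore Unix-only):

library(parallel)

# mcmapply is mapply with forked workers
x4 <- mcmapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3,
               mc.cores = 4L)
all.equal(x1, x4)  # should be TRUE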

However...

Mind you, parallelization always comes with some overhead as well. As I explained in the question you link to, going parallel only pays off if your inner function takes significantly longer to compute than the overhead involved. In your case, your uniroot function works pretty fast. You might then consider cutting your data frame into bigger chunks and combining mapply and mclapply. A possible way to do this is:

ncores <- 4
# chunk boundaries: ncores+1 row indices running from nrow(df) down to 0
id <- floor(quantile(0:nrow(df), 1 - (0:ncores)/ncores))
# pair up consecutive boundaries: each row of idm delimits one chunk
idm <- embed(id, 2)

mapply_uniroot <- function(id) {
  # rows (id[1]+1):id[2] form one chunk; run plain mapply over it
  tmp <- df[(id[1] + 1):id[2], ]
  mapply(get_uniroot, tmp$P, tmp$B0, tmp$CF1, tmp$CF2, tmp$CF3)
}
# reverse the chunk order so unlist() restores the original row order
result <- mclapply(nrow(idm):1,
                   function(x) mapply_uniroot(idm[x, ]),
                   mc.cores = ncores)
final <- unlist(result)

This might need some tweaking, but it essentially breaks your df into exactly as many bits as there are cores and runs the mapply on every core. To show this works:

> x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3)
> all.equal(final,x1)
[1] TRUE
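
For completeness, a minimal alternative sketch of the same chunking idea, built on splitIndices from the same parallel package (result2 and final2 are just illustrative names):

library(parallel)

# splitIndices(n, k) returns k roughly equal, ordered blocks of 1:n
chunks <- splitIndices(nrow(df), ncores)
result2 <- mclapply(chunks, function(idx) {
  tmp <- df[idx, ]
  mapply(get_uniroot, tmp$P, tmp$B0, tmp$CF1, tmp$CF2, tmp$CF3)
}, mc.cores = ncores)
final2 <- unlist(result2)  # blocks come back in order, so this matches x1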
