Is there an efficient way to parallelize mapply?

Question

I have many rows and on every row I compute the uniroot of a non-linear function. I have a quad-core Ubuntu machine which hasn't stopped running my code for two days now. Not surprisingly, I'm looking for ways to speed things up ;-)

After some research, I noticed that only one core is currently used and parallelization is the thing to do. Digging deeper, I came to the conclusion (maybe incorrectly?) that the package foreach isn't really meant for my problem because too much overhead is produced (see, for example, SO). A good alternative seems to be multicore for Unix machines. In particular, the pvec function seems to be the most efficient one after I checked the help page.
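
For illustration, here is a minimal sketch of how pvec is called (the toy vector and workload are made up; pvec shipped with multicore and, from R 2.14 on, with parallel). It accepts exactly one vector and expects FUN itself to be vectorized, which is exactly the limitation discussed next:

library(parallel)  # pvec lives in multicore / parallel (R >= 2.14), Unix only
v <- rnorm(1e6)
# pvec splits v into mc.cores chunks, applies the vectorized FUN to
# each chunk in a forked worker, and stitches the results back together
res <- pvec(v, function(x) sqrt(abs(x)), mc.cores = 2L)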

However, if I understand it correctly, this function only takes one vector and splits it up accordingly. I need a function that can be parallelized, but takes multiple vectors (or a data.frame instead), just like the mapply function does. Is there anything out there that I missed?

Here is a small example of what I want to do: (Note that I include a plyr example here because it can be an alternative to the base mapply function and it has a parallelization option. However, it is slower in my implementation, and internally it calls foreach to parallelize, so I think it won't help. Is that correct?)

library(plyr)
library(foreach)
n <- 10000
df <- data.frame(P   = rnorm(n, mean=100, sd=10),
                 B0  = rnorm(n, mean=40,  sd=5),
                 CF1 = rnorm(n, mean=30,  sd=10),
                 CF2 = rnorm(n, mean=30,  sd=5),
                 CF3 = rnorm(n, mean=90,  sd=8))

get_uniroot <- function(P, B0, CF1, CF2, CF3) {

  uniroot(function(x) {-P + B0 + CF1/x + CF2/x^2 + CF3/x^3}, 
          lower = 1,
          upper = 10,
          tol   = 0.00001)$root

}

system.time(x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3))
   #user  system elapsed 
   #0.91    0.00    0.90 
system.time(x2 <- mdply(df, get_uniroot))
   #user  system elapsed 
   #5.85    0.00    5.85
system.time(x3 <- foreach(P=df$P, B0=df$B0, CF1=df$CF1, CF2=df$CF2, CF3=df$CF3, .combine = "c") %do% {
    get_uniroot(P, B0, CF1, CF2, CF3)})
   #user  system elapsed 
   #10.30    0.00   10.36
all.equal(x1, x2$V1) #TRUE
all.equal(x1, x3)    #TRUE

Also, I tried to implement Ryan Thompson's function chunkapply from the SO link above (I only got rid of the doMC part, because I couldn't install it; his example works, though, even after adjusting his function), but I didn't get it to work. However, since it uses foreach, I figured the same arguments as above apply, so I didn't try for too long.

#chunkapply(get_uniroot, list(P=df$P, B0=df$B0, CF1=df$CF1, CF2=df$CF2, CF3=df$CF3))
#Error in { : task 1 failed - "invalid function value in 'zeroin'"

PS: I know that I could just increase tol to reduce the number of steps needed to find a uniroot. However, I have already set tol as large as possible.
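
To make the tol trade-off concrete, a small sketch (using the mean parameter values of df above; the exact iteration counts will vary): uniroot reports its iteration count in $iter, and a looser tol stops the root search earlier at the cost of precision:

# f built from the mean parameters of df (P=100, B0=40, CF1=30, CF2=30, CF3=90)
f <- function(x) -100 + 40 + 30/x + 30/x^2 + 90/x^3
uniroot(f, lower = 1, upper = 10, tol = 1e-5)$iter  # more iterations, tighter root
uniroot(f, lower = 1, upper = 10, tol = 1e-2)$iter  # fewer iterations, coarser root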

Answer

I'd use the parallel package that's built into R 2.14 and work with matrices. You could then simply use mclapply like this:

library(parallel)  # mclapply is part of the parallel package (R >= 2.14)

dfm <- as.matrix(df)
result <- mclapply(seq_len(nrow(dfm)),
          function(x) do.call(get_uniroot, as.list(dfm[x, ])),
          mc.cores = 4L
          )
unlist(result)

This is basically doing the same thing mapply does, but in a parallel way.
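
Note that the same parallel package also provides mcmapply, a multicore analogue of mapply, which would express this even more directly; a minimal sketch (the same Unix-only forking caveats apply):

library(parallel)
# drop-in parallel replacement for the mapply call from the question
x_mc <- mcmapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3,
                 mc.cores = 4L)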

But...

Mind you, parallelization always comes with some overhead as well. As I explained in the question you link to, going parallel only pays off if your inner function takes significantly longer to compute than the overhead involved. In your case, the uniroot function runs pretty fast. You might then consider cutting your data frame into bigger chunks, combining mapply and mclapply. A possible way to do this is:

ncores <- 4
# ncores + 1 cut points, running from nrow(df) down to 0
id <- floor(
        quantile(0:nrow(df),
                 1 - (0:ncores)/ncores
        )
      )
idm <- embed(id, 2)  # each row holds the (lower, upper) row bounds of one chunk

mapply_uniroot <- function(id){
  tmp <- df[(id[1] + 1):id[2], ]
  mapply(get_uniroot, tmp$P, tmp$B0, tmp$CF1, tmp$CF2, tmp$CF3)
}
# loop in reverse so the chunks come back in the original row order
result <- mclapply(nrow(idm):1,
                   function(x) mapply_uniroot(idm[x, ]),
                   mc.cores = ncores)
final <- unlist(result)

This might need some tweaking, but it essentially breaks your df into exactly as many bits as there are cores and runs the mapply on every core. To show this works:

> x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3)
> all.equal(final,x1)
[1] TRUE
