Parallelized optimization in R

Question

I'm running R on a Linux box that has 8 multicore processors, and have an optimization problem I'd like to speed up by parallelizing the optimization routine itself. Importantly, this problem involves (1) multiple parameters, and (2) inherently slow model runs. A fairly common problem!

Anyone know of a parallelized optimizer for such occasions?

More specifically, solvers like nlm() run multiple model evaluations (two per parameter value) each time the algorithm takes a step in parameter space, so parallelizing that instance of multiple model runs would greatly speed things up in these situations when more than a few parameter values are being fit.
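
To make that concrete, here is a minimal sketch (an illustration of the finite-difference idea only, not nlm()'s actual internals): a central-difference gradient needs 2 * length(p) objective evaluations per step, and each evaluation is independent of the others, which is exactly what makes them candidates for concurrent execution.

central_diff_grad <- function(f, p, h = 1e-6, ...) {
  ## Sketch only: each of the 2 * length(p) calls to f() below is
  ## independent of the others, so they could run on separate cores.
  sapply(seq_along(p), function(i) {
    e <- replace(numeric(length(p)), i, h)     # perturb only the i-th parameter
    (f(p + e, ...) - f(p - e, ...)) / (2 * h)  # central difference
  })
}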

It seems like code that makes use of the package parallel could be written in a way that the user would have to do minimal code modification to move from using nlm() or optim() to this parallelized optimization routine. That is, it seems one could rewrite these routines basically with no changes, except that the step of calling the model multiple times, as is common in gradient-based methods, would be done in parallel.

Ideally, something like nlmPara() would take code that looks like

fit <- nlm(MyObjFunc, params0);

and require only minor modifications, e.g.,

fit <- nlmPara(MyObjFunc, params0, ncores=6);

Thoughts/suggestions?

PS: I've taken steps to speed up those model runs, but they're slow for a variety of reasons (i.e. I don't need advice on speeding up the model runs! ;-) ).

Solution

Here is a rough solution, that at least has some promise. Big thanks to Ben Bolker for pointing out that many/most optimization routines allow user-specified gradient functions.
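
For reference, optim() exposes that hook through its gr argument, so only the gradient function needs to be swapped out; a one-line sketch using hypothetical names MyObjFunc and MyParallelGrad:

fit <- optim(params0, MyObjFunc, gr = MyParallelGrad, method = "BFGS")  # MyParallelGrad: hypothetical parallel gradient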

A test problem with more parameter values might show more significant improvements, but on an 8 core machine the run using the parallelized gradient function takes about 70% as long as the serial version. Note the crude gradient approximation used here seems to slow convergence and thus adds some time to the process.

## Set up the cluster
require("parallel");
.nlocalcores = NULL; # Default to "Cores available - 1" if NULL.
if(is.null(.nlocalcores)) { .nlocalcores = detectCores() - 1; }
if(.nlocalcores < 1) { stop("Multiple cores unavailable! See code!!"); }  # stop() rather than return(), which fails at top level
print(paste("Using ",.nlocalcores,"cores for parallelized gradient computation."))
.cl=makeCluster(.nlocalcores);
print(.cl)


# Now define a gradient function: both in serial and in parallel
mygr <- function(.params, ...) {
  dp = cbind(rep(0,length(.params)),diag(.params * 1e-8)); # TINY finite difference
  Fout = apply(dp,2, function(x) fn(.params + x,...));     # Serial 
  return((Fout[-1]-Fout[1])/diag(dp[,-1]));                # finite difference 
}

mypgr <- function(.params, ...) { # Now use the cluster 
  dp = cbind(rep(0,length(.params)),diag(.params * 1e-8));   
  Fout = parCapply(.cl, dp, function(x) fn(.params + x,...)); # Parallel 
  return((Fout[-1]-Fout[1])/diag(dp[,-1]));                  # finite difference
}


## Lets try it out!
fr <- function(x, slow=FALSE) { ## Rosenbrock Banana function from optim() documentation.
  if(slow) { Sys.sleep(0.1); }   ## Modified to be a little slow, if needed.
  x1 <- x[1]
  x2 <- x[2]
  100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}

grr <- function(x, slow=FALSE) { ## Gradient of 'fr'
  if(slow) { Sys.sleep(0.1); }   ## Modified to be a little slow, if needed.
  x1 <- x[1]
  x2 <- x[2]
  c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
    200 *      (x2 - x1 * x1))
}

## Make sure the nodes can see these functions & other objects as called by the optimizer
fn <- fr;  # A bit of a hack
clusterExport(.cl, "fn");

# First, test our gradient approximation function mypgr
print( mypgr(c(-1.2,1)) - grr(c(-1.2,1)))

## Some test calls, following the examples in the optim() documentation
tic = Sys.time();
fit1 = optim(c(-1.2,1), fr, slow=FALSE);                          toc1=Sys.time()-tic
fit2 = optim(c(-1.2,1), fr, gr=grr, slow=FALSE, method="BFGS");   toc2=Sys.time()-tic-toc1
fit3 = optim(c(-1.2,1), fr, gr=mygr, slow=FALSE, method="BFGS");  toc3=Sys.time()-tic-toc1-toc2
fit4 = optim(c(-1.2,1), fr, gr=mypgr, slow=FALSE, method="BFGS"); toc4=Sys.time()-tic-toc1-toc2-toc3


## Now slow it down a bit
tic = Sys.time();
fit5 = optim(c(-1.2,1), fr, slow=TRUE);                           toc5=Sys.time()-tic
fit6 = optim(c(-1.2,1), fr, gr=grr, slow=TRUE, method="BFGS");    toc6=Sys.time()-tic-toc5
fit7 = optim(c(-1.2,1), fr, gr=mygr, slow=TRUE, method="BFGS");   toc7=Sys.time()-tic-toc5-toc6
fit8 = optim(c(-1.2,1), fr, gr=mypgr, slow=TRUE, method="BFGS");  toc8=Sys.time()-tic-toc5-toc6-toc7

print(cbind(fast=c(default=toc1,exact.gr=toc2,serial.gr=toc3,parallel.gr=toc4),
            slow=c(toc5,toc6,toc7,toc8)))
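
Since the question was phrased in terms of nlm(): nlm() picks up an analytic gradient from a "gradient" attribute on the objective's return value, so the same parallel gradient can be reused there. A minimal sketch reusing the fn, mypgr and .cl objects defined above (MyObjWithGrad and fit9 are names introduced here for illustration), plus the cluster cleanup the example otherwise omits:

## Sketch: reuse the parallel gradient with nlm() via the "gradient" attribute
MyObjWithGrad <- function(.params, ...) {
  val <- fn(.params, ...)
  attr(val, "gradient") <- mypgr(.params, ...)
  val
}
fit9 <- nlm(MyObjWithGrad, c(-1.2, 1))

## Shut the workers down once the fits are done
stopCluster(.cl)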
