Is there any way to break out of a foreach loop?
Question
I am using the R package foreach() with %dopar% to do long (~days) calculations in parallel. I would like the ability to stop the entire set of calculations in the event that one of them produces an error. However, I have not found a way to achieve this, and from the documentation and various forums I have found no indication that this is possible. In particular, break() does not work and stop() only stops the current calculation, not the whole foreach loop.

Note that I cannot use a simple for loop, because ultimately I want to parallelize this using the doRNG package.
Here is a simplified, reproducible version of what I am attempting (shown here in serial with %do%, but I have the same problem when using doRNG and %dopar%). Note that in reality I want to run all of the elements of this loop (here 10) in parallel.

    library(foreach)
    myfunc <- function() {
      x <- foreach(k = 1:10, .combine="cbind", .errorhandling="stop") %do% {
        cat("Element ", k, "\n")
        Sys.sleep(0.5) # just to show that stop does not cause exit from foreach
        if(is.element(k, 2:6)) {
          cat("Should stop\n")
          stop("Has stopped")
        }
        k
      }
      return(x)
    }
    x <- myfunc()
    # stop() halts the processing of k=2:6, but it does not stop the foreach loop itself.
    # x is not returned. The execution produces the error message
    # Error in { : task 2 failed - "Has stopped"
What I would like to achieve is that the entire foreach loop can be exited immediately upon some condition (here, when the stop() is encountered).

I have found no way to achieve this with foreach. It seems that I would need a way to send a message to all the other processes to make them stop too.

If not possible with foreach, does anyone know of alternatives? I have also tried to achieve this with parallel::mclapply, but that does not work either.

    > sessionInfo()
    R version 3.0.0 (2013-04-03)
    Platform: x86_64-apple-darwin10.8.0 (64-bit)

    locale:
    [1] C/UTF-8/C/C/C/C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] foreach_1.4.0

    loaded via a namespace (and not attached):
    [1] codetools_0.2-8 compiler_3.0.0  iterators_1.0.6
Solution

It sounds like you want an impatient version of the "stop" error handling. You could implement that by writing a custom combine function, and arranging for foreach to call it as soon as each result is returned. To do that you need to:

- Use a backend that supports calling combine on-the-fly, like doMPI or doRedis
- Don't enable .multicombine
- Set .inorder to FALSE
- Set .init to something (like NULL)
Here's an example that does that:

    library(foreach)
    parfun <- function(errval, n) {
      abortable <- function(errfun) {
        comb <- function(x, y) {
          if (inherits(y, 'error')) {
            warning('This will leave your parallel backend in an inconsistent state')
            errfun(y)
          }
          c(x, y)
        }
        foreach(i=seq_len(n), .errorhandling='pass', .export='errval',
                .combine='comb', .inorder=FALSE, .init=NULL) %dopar% {
          if (i == errval)
            stop('testing abort')
          Sys.sleep(10)
          i
        }
      }
      callCC(abortable)
    }
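One ingredient of this example is .errorhandling='pass', which makes foreach deliver a failed task as a condition object among the results instead of raising it. The following is a minimal sequential sketch of that behavior only, not part of the original answer; the names res and failed and the message "task failed" are illustrative assumptions, and %do% is used so that no parallel backend is needed.

```r
library(foreach)

# Sequential %do% keeps this sketch independent of any parallel backend.
res <- foreach(i = 1:3, .errorhandling = "pass") %do% {
  if (i == 2) stop("task failed")  # illustrative failure in one task
  i * 10
}

# With "pass", the failed task appears as a condition object in the
# result list, so it can be detected with inherits(), as comb() does.
failed <- vapply(res, inherits, logical(1), what = "error")
print(which(failed))
print(conditionMessage(res[[2]]))
```

This is the same inherits(y, 'error') check that the combine function above applies to each incoming result.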
Note that I also set the error handling to "pass" so foreach will call the combine function with an error object. The callCC function is used to return from the foreach loop regardless of the error handling used within foreach and the backend. In this case, callCC will call the abortable function, passing it a function object that is used to force callCC to immediately return. By calling that function from the combine function we can escape from the foreach loop when we detect an error object, and have callCC return that object. See ?callCC for more information.

You can actually use parfun without a parallel backend registered and verify that the foreach loop "breaks" as soon as it executes a task that throws an error, but that could take a while since the tasks are executed sequentially. For example, this takes 20 seconds to execute if no backend is registered:

    print(system.time(parfun(3, 4)))
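The callCC escape mechanism described above can also be tried in isolation. This is a small base R sketch, independent of foreach, in which an ordinary for loop stands in for the stream of results; first_failure, escape, hit and miss are hypothetical names chosen for illustration.

```r
# Base R only: callCC() hands its argument an "escape" function.
# Calling that function abandons the remaining work and makes
# callCC() return the value that was passed to it.
first_failure <- function(values) {
  callCC(function(escape) {
    for (v in values) {
      if (v < 0) {
        escape(v)  # leaves the loop and callCC() immediately
      }
    }
    NULL  # reached only when no negative value was found
  })
}

hit  <- first_failure(c(4, 7, -2, 9, -5))  # escapes at -2, never reaches -5
miss <- first_failure(c(1, 2, 3))          # runs to the end and yields NULL
print(hit)
print(miss)
```

In parfun, errfun plays the role of escape and the error object plays the role of the negative value.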
When executing parfun in parallel, we need to do more than simply break out of the foreach loop: we also need to stop the workers, otherwise they will continue to compute their assigned tasks. With doMPI, the workers can be stopped using mpi.abort:

    library(doMPI)
    cl <- startMPIcluster()
    registerDoMPI(cl)
    r <- parfun(getDoParWorkers(), getDoParWorkers())
    if (inherits(r, 'error')) {
      cat(sprintf('Caught error: %s\n', conditionMessage(r)))
      mpi.abort(cl$comm)
    }

Note that the cluster object can't be used after the loop aborts, because things weren't properly cleaned up, which is why the normal "stop" error handling doesn't work this way.