Is there any way to break out of a foreach loop?

Problem description

I am using the R package foreach() with %dopar% to do long (~days) calculations in parallel. I would like the ability to stop the entire set of calculations in the event that one of them produces an error. However, I have not found a way to achieve this, and from the documentation and various forums I have found no indication that this is possible. In particular, break() does not work and stop() only stops the current calculation, not the whole foreach loop.

Note that I cannot use a simple for loop, because ultimately I want to parallelize this using the doRNG package.

Here is a simplified, reproducible version of what I am attempting (shown here in serial with %do%, but I have the same problem when using doRNG and %dopar%). Note that in reality I want to run all of the elements of this loop (here 10) in parallel.

library(foreach)
myfunc <- function() {
  x <- foreach(k = 1:10, .combine="cbind", .errorhandling="stop") %do% {
    cat("Element ", k, "\n")
    Sys.sleep(0.5) # just to show that stop does not cause exit from foreach
    if(is.element(k, 2:6)) {
      cat("Should stop\n")
      stop("Has stopped")
    }
    k
  }
  return(x)
}
x <- myfunc()
# stop() aborts the individual tasks k = 2:6, but the foreach loop itself keeps
# running the remaining tasks. x is never returned; instead, the call ends with
# Error in { : task 2 failed - "Has stopped"

What I would like to achieve is that the entire foreach loop can be exited immediately upon some condition (here, when the stop() is encountered).

I have found no way to achieve this with foreach. It seems that I would need a way to send a message to all the other processes to make them stop too.

If not possible with foreach, does anyone know of alternatives? I have also tried to achieve this with parallel::mclapply, but that does not work either.

> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] C/UTF-8/C/C/C/C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods base

other attached packages:
[1] foreach_1.4.0

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_3.0.0  iterators_1.0.6

Recommended answer

It sounds like you want an impatient version of the "stop" error handling. You could implement that by writing a custom combine function, and arranging for foreach to call it as soon as each result is returned. To do that you need to:

  • Use a backend that supports calling combine on-the-fly, like doMPI or doRedis
  • Don't enable .multicombine
  • Set .inorder to FALSE
  • Set .init to something (like NULL)
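
The `.init` requirement can be illustrated with base R's `Reduce`, used here purely as a stand-in for foreach's incremental combine (an assumption for illustration; this is not foreach's internal code). Without an initial value, the first result becomes the starting accumulator and is never passed to the combine function as `y`, so an error check on `y` would miss an error in the first result to arrive. The names `results`, `comb`, `saw_error`, `no_init` and `with_init` are hypothetical.

```r
# Hypothetical per-task results, as a combine function would see them under
# .errorhandling = "pass": here the first result to arrive is an error.
results <- list(simpleError("task failed"), 2L, 3L)

# Combine function that only inspects its second argument for errors
comb <- function(x, y) {
  if (inherits(y, "error")) saw_error <<- TRUE
  c(x, y)
}

# No initial value: results[[1]] becomes the starting accumulator
saw_error <- FALSE
invisible(Reduce(comb, results))
no_init <- saw_error

# Initial value NULL (analogous to .init = NULL): every result reaches comb as y
saw_error <- FALSE
invisible(Reduce(comb, results, NULL))
with_init <- saw_error

print(c(no_init = no_init, with_init = with_init))
```

Here `no_init` ends up `FALSE` and `with_init` ends up `TRUE`.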

Here is an example:

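library(foreach)
parfun <- function(errval, n) {
  abortable <- function(errfun) {
    # Called for each task result as it arrives (needs .inorder=FALSE,
    # .init set, no .multicombine, and a backend that combines on the fly)
    comb <- function(x, y) {
      if (inherits(y, 'error')) {
        warning('This will leave your parallel backend in an inconsistent state')
        errfun(y)  # makes callCC(abortable) return y immediately
      }
      c(x, y)
    }
    foreach(i=seq_len(n), .errorhandling='pass', .export='errval',
            .combine='comb', .inorder=FALSE, .init=NULL) %dopar% {
      if (i == errval)
        stop('testing abort')  # task number errval fails right away
      Sys.sleep(10)            # every other task takes 10 seconds
      i
    }
  }
  callCC(abortable)
}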

Note that I also set the error handling to "pass" so foreach will call the combine function with an error object. The callCC function is used to return from the foreach loop regardless of the error handling used within foreach and the backend. In this case, callCC will call the abortable function, passing it a function object that is used to force callCC to return immediately. By calling that function from the combine function, we can escape from the foreach loop as soon as we detect an error object and have callCC return that object. See ?callCC for more information.
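
The `callCC` escape can be tried in isolation with a minimal base-R sketch. As an assumption for illustration, the foreach loop is replaced by `Reduce` over a hypothetical list of task results; the combine function calls the escape function as soon as it sees an error object, `callCC` returns that object, and the remaining elements are never combined. The names `results`, `calls`, `escape` and `out` are hypothetical.

```r
# Hypothetical task results; the third one is an error object
results <- list(1L, 2L, simpleError("testing abort"), 4L, 5L)
calls <- 0L  # counts how many times the combine function runs

out <- callCC(function(escape) {
  comb <- function(x, y) {
    calls <<- calls + 1L
    if (inherits(y, "error"))
      escape(y)  # non-local exit: callCC returns y right away
    c(x, y)
  }
  Reduce(comb, results, NULL)
})

print(conditionMessage(out))  # "testing abort"
print(calls)                  # 3: elements 4 and 5 were never combined
```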

You can actually use parfun without a parallel backend registered and verify that the foreach loop "breaks" as soon as it executes a task that throws an error, but that could take a while since the tasks are then executed sequentially. For example, this takes 20 seconds to execute if no backend is registered, because tasks 1 and 2 each sleep for 10 seconds before task 3 throws the error:

print(system.time(parfun(3, 4)))
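
A faster check of the sequential behavior drops the sleeps and records which tasks start. This variant is not from the original answer: it uses `%do%` instead of `%dopar%`, assuming that `%do%` runs through the same sequential backend that `%dopar%` falls back to when no backend is registered (which also avoids the corresponding warning). The variables `ran` and `res` are hypothetical.

```r
library(foreach)

ran <- integer(0)  # ids of the tasks that actually started

res <- callCC(function(abort) {
  foreach(i = 1:5, .errorhandling = "pass", .inorder = FALSE, .init = NULL,
          .combine = function(x, y) {
            if (inherits(y, "error"))
              abort(y)  # leave the loop as soon as an error result arrives
            c(x, y)
          }) %do% {
    ran <<- c(ran, i)
    if (i == 3) stop("testing abort")
    i
  }
})

print(ran)                    # tasks 4 and 5 should never have started
print(conditionMessage(res))
```

If the loop really breaks on the error, `ran` holds only tasks 1 to 3.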

When executing parfun in parallel, we need to do more than simply break out of the foreach loop: we also need to stop the workers, otherwise they will continue to compute their assigned tasks. With doMPI, the workers can be stopped using mpi.abort (from Rmpi, which doMPI is built on):

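library(doMPI)
cl <- startMPIcluster()
registerDoMPI(cl)
# Make the last task fail while all the others sleep for 10 seconds
r <- parfun(getDoParWorkers(), getDoParWorkers())
if (inherits(r, 'error')) {
  cat(sprintf('Caught error: %s\n', conditionMessage(r)))
  mpi.abort(cl$comm)  # kill the workers that are still computing
}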

Note that the cluster object can't be used after the loop aborts, because the workers aren't properly cleaned up. That is why the normal "stop" error handling doesn't work this way.
