mclapply returns NULL randomly


Problem description

When I use mclapply, it occasionally (and seemingly at random) returns incorrect results. The problem is described quite thoroughly in other posts on the Internet, e.g. http://r.789695.n4.nabble.com/Bug-in-mclapply-td4652743.html. However, no solution is provided there. Does anyone know how to fix this problem? Thank you!

Recommended answer

The problem reported by Winston Chang that you cite appears to have been fixed in R 2.15.3. There was a bug in mccollect in the line that assigns each worker's result to the result list:

if (is.raw(r)) res[[which(pid == pids)]] <- unserialize(r)

This fails if unserialize(r) returns NULL, because assigning NULL to a list element with [[<- deletes that element from the list. R 2.15.3 changed the line to:

if (is.raw(r)) # unserialize(r) might be null
    res[which(pid == pids)] <- list(unserialize(r))

which is a safe way to assign a value that may be NULL to a list element.
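The difference between the two assignment forms can be seen without mclapply at all. The following is a minimal sketch in base R; the variable names are illustrative only and do not come from the parallel package source.

```r
# A result list with three slots, standing in for mccollect's `res`.
res <- list("a", "b", "c")

# Unsafe form: assigning NULL with [[<- deletes the element,
# so the list shrinks and later positions shift down.
unsafe <- res
unsafe[[2]] <- NULL
length(unsafe)   # 2

# Safe form: wrapping the value in list() and using [<- keeps the slot,
# even when the value being stored is NULL.
safe <- res
safe[2] <- list(NULL)
length(safe)     # 3
is.null(safe[[2]])   # TRUE
```

With the unsafe form, a worker that legitimately returned NULL would shorten the result list, so results from other workers would end up at the wrong positions.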

So if you're using R <= 2.15.2, the solution is to upgrade to R >= 2.15.3. If you still have a problem with R >= 2.15.3, then presumably it's a different problem than the one reported by Winston Chang.

I also read over the issues discussed in the R-help thread started by Elizabeth Purdom. Without a specific test case, my guess is that the problem is not caused by a bug in mclapply, because I can reproduce the same symptoms with the following function:

work <- function(i, poison) {
  if (i == poison) quit(save='no')
  i
}

If a worker started by mclapply dies while executing a task for any reason (receiving a signal, segfaulting, exiting), mclapply returns NULL for all of the tasks that were assigned to that worker:

> library(parallel)
> mclapply(1:4, work, 3, mc.cores=2)
[[1]]
NULL

[[2]]
[1] 2

[[3]]
NULL

[[4]]
[1] 4

In this case, NULLs were returned for tasks 1 and 3 because of prescheduling, even though only task 3 actually failed: both tasks were assigned to the same worker.
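To see why tasks 1 and 3 share a fate, it helps to look at how prescheduling hands out tasks. The sketch below approximates the round-robin split that the mclapply documentation describes for mc.preschedule = TRUE (first value to core 1, second to core 2, and so on); preschedule_split is a hypothetical helper written for illustration, not a function from the parallel package.

```r
# Hypothetical helper: approximate which task indices each worker receives
# when mclapply preschedules tasks in round-robin order.
preschedule_split <- function(n_tasks, cores) {
  cores <- min(cores, n_tasks)   # never more workers than tasks
  lapply(seq_len(cores), function(i) seq(i, n_tasks, by = cores))
}

assignment <- preschedule_split(4, 2)
# worker 1 gets tasks 1 and 3; worker 2 gets tasks 2 and 4
print(assignment)
```

If losing a whole batch is a concern, mclapply also accepts mc.preschedule = FALSE, which forks a separate process for each task, so a dying worker should only take its own task with it, at the cost of more fork overhead.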

If a worker dies when using a function such as parLapply or clusterApply, an error is reported instead:

> cl <- makePSOCKcluster(3)
> parLapply(cl, 1:4, work, 3)
Error in unserialize(node$con) : error reading from connection

I've seen many such reports, and I think they tend to happen in large programs that use lots of packages, which makes them hard to turn into reproducible test cases.

Of course, in this example, you'll also get an error when using lapply, although the error won't be hidden as it is with mclapply. If the problem doesn't seem to happen with lapply, it may be because the problem occurs so rarely that it only shows up in very large runs, which are the ones executed in parallel with mclapply. But it is also possible that the error occurs not because the tasks are executed in parallel, but because they are executed by forked processes. For example, various graphics operations fail when executed in a forked process.
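Because mclapply hides a dead worker behind NULL results, one option is to check the returned list before using it. The following is a minimal sketch; check_results is a hypothetical helper, and it assumes that no task legitimately returns NULL, since such a result cannot be told apart from a lost one. It also flags "try-error" objects, which is how mclapply reports ordinary R errors raised inside a task.

```r
# Hypothetical helper: stop loudly if any task produced no result (NULL)
# or an R error (an object of class "try-error").
check_results <- function(results) {
  missing <- which(vapply(results, is.null, logical(1)))
  errored <- which(vapply(results, inherits, logical(1), what = "try-error"))
  failed <- sort(c(missing, errored))
  if (length(failed) > 0) {
    stop("no usable result for task(s): ", paste(failed, collapse = ", "))
  }
  results
}

# Example with the shape of the output shown earlier: tasks 1 and 3 are NULL.
msg <- tryCatch(
  check_results(list(NULL, 2, NULL, 4)),
  error = function(e) conditionMessage(e)
)
print(msg)
```

In real use the call would look something like check_results(mclapply(X, FUN, mc.cores = 2)), so that a silent failure becomes an explicit error naming the affected tasks.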
