mclapply returns NULL randomly


Question

When I use mclapply, it occasionally (and apparently at random) gives incorrect results. The problem is described quite thoroughly in other posts on the Internet (e.g. http://r.789695.n4.nabble.com/Bug-in-mclapply-td4652743.html). However, no solution is provided. Does anyone know how to fix this problem? Thank you!

Recommended answer

The problem reported by Winston Chang that you cite appears to have been fixed in R 2.15.3. There was a bug in mccollect that occurred when assigning the worker results to the result list:

if (is.raw(r)) res[[which(pid == pids)]] <- unserialize(r)

This fails if unserialize(r) returns NULL, since assigning NULL to a list element in this way deletes that element from the list. This was changed in R 2.15.3 to:

if (is.raw(r)) # unserialize(r) might be null
    res[which(pid == pids)] <- list(unserialize(r))

which is a safe way to assign a possibly NULL value to a list element.
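The difference between the two assignment forms can be seen in isolation. The following is a minimal sketch using plain lists rather than the actual mccollect internals; the variable names `dropped` and `kept` are only illustrative:

```r
# Assigning NULL with [[<- removes the element, so the list shrinks.
dropped <- list("a", "b", "c")
dropped[[2]] <- NULL
length(dropped)      # 2

# Assigning a one-element list with [<- keeps the slot and stores NULL in it.
kept <- list("a", "b", "c")
kept[2] <- list(NULL)
length(kept)         # 3
is.null(kept[[2]])   # TRUE
```

In the first form the results of later workers would shift position in the list, which is why the fix wraps the value in list().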

So if you're using R <= 2.15.2, the solution is to upgrade to R >= 2.15.3. If you have a problem using R >= 2.15.3, then presumably it's a different problem than the one reported by Winston Chang.

I also read over the issues discussed in the R-help thread started by Elizabeth Purdom. Without a specific test case, my guess is that the problem is not due to a bug in mclapply, because I can reproduce the same symptoms with the following function:

work <- function(i, poison) {
  if (i == poison) quit(save='no')
  i
}

If a worker started by mclapply dies for any reason while executing a task (receiving a signal, segfaulting, exiting), mclapply will return NULL for all of the tasks that were assigned to that worker:

> library(parallel)
> mclapply(1:4, work, 3, mc.cores=2)
[[1]]
NULL

[[2]]
[1] 2

[[3]]
NULL

[[4]]
[1] 4

In this case, NULLs were returned for tasks 1 and 3 because of prescheduling, even though only task 3 actually failed: with two cores, tasks 1 and 3 were both assigned to the same worker up front.
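One way to avoid silently consuming such a result is to check it for NULL entries before using it. The following is a minimal sketch that assumes the worker function never legitimately returns NULL; `failed_tasks` is a hypothetical helper, not part of the parallel package, and the result list is written out by hand to match the output shown above:

```r
# Hypothetical helper: indices of the tasks whose result is NULL.
failed_tasks <- function(results) {
  which(vapply(results, is.null, logical(1)))
}

# The result list from the mclapply example, reproduced by hand.
results <- list(NULL, 2L, NULL, 4L)

failed <- failed_tasks(results)
if (length(failed) > 0) {
  message("no result for tasks: ", paste(failed, collapse = ", "))
}
```

This check cannot tell which task killed the worker, only which tasks were lost. Passing mc.preschedule = FALSE to mclapply runs each task in its own forked process, which should limit the loss to the task that actually died, at the cost of more forks.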

If a worker dies when using a function such as parLapply or clusterApply, an error is reported:

> cl <- makePSOCKcluster(3)
> parLapply(cl, 1:4, work, 3)
Error in unserialize(node$con) : error reading from connection
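Because the failure surfaces as an ordinary R error here, it can be caught and handled. The following is a minimal sketch, assuming a local two-worker PSOCK cluster and the `work` function defined earlier in this answer (repeated so the block is self-contained); the cleanup strategy is only one possible choice:

```r
library(parallel)

# Same worker function as above: the R process exits on the poison value.
work <- function(i, poison) {
  if (i == poison) quit(save = "no")
  i
}

cl <- makePSOCKcluster(2)

# Catch the connection error raised when a worker process disappears.
res <- tryCatch(
  parLapply(cl, 1:4, work, 3),
  error = function(e) e
)

# One worker is gone, so shutting the cluster down may itself complain.
try(stopCluster(cl), silent = TRUE)

if (inherits(res, "error")) {
  message("worker failure: ", conditionMessage(res))
}
```

After such a failure the cluster object should not be reused; creating a fresh cluster is the simpler option.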

I've seen many such reports, and I think they tend to occur in large programs that use many packages, which makes them hard to reduce to reproducible test cases.

Of course, in this example, you'll also get an error when using lapply, although the error won't be hidden as it is with mclapply. If the problem doesn't seem to happen when using lapply, it may be because the problem occurs so rarely that it only shows up in very large runs, which are the ones executed in parallel using mclapply. But it is also possible that the error occurs not because the tasks are executed in parallel, but because they are executed by forked processes. For example, various graphics operations will fail when executed in a forked process.
