Error handling within parApply (in R, using parallel package)


Question

I am trying to troubleshoot the following message I get when trying to use the parApply function from the parallel package:

Error in unserialize(node$con) : error reading from connection

The following is a mockup of what I'm doing:

library(parallel)
library(survival)
c0 <- makeCluster(16, outfile = '')
clusterEvalQ(c0, library(survival))   # load survival on every worker
aa <- array(rexp(1e4), c(100, 50, 2))
bb <- parApply(c0, aa, 1, function(ii) {
  # fit a Cox model on each 100-value slice; fall back to NAs on error
  oo <- try(summary(coxph(Surv(c(ii)) ~ gl(2, 50)))$coef[1, ])
  if (class(oo)[1] == 'try-error') rep(NA, 5) else oo
})

... except that it doesn't produce the error. The actual function I call from inside parApply is a huge one I wrote myself that is too long to try to post here. But I'm not trying to get someone to debug my function. I'm trying to find out where to look for more detailed debugging information and who/what I have to strangle to get try() to accomplish its stated purpose.

The function does work with standard apply() and with aaply(..., .parallel=FALSE), but not with aaply(..., .parallel=TRUE).

The only thing I see on the screen log (besides normal warning messages that accompany the loading of the packages I use) is Execution halted.

When I do stopCluster(c0) I get the following additional output:

Error in serialize(data, node$con) : ignoring SIGPIPE signal

Does anybody know where else to look? I am running R 2.15.1 on CentOS release 5.4 (Final). Are there types of errors that can propagate upward despite my attempt to catch them with try()? Is there maybe some timeout option in parallel I can set to make the worker nodes more patient?

Update: First, I started using makeCluster(16, outfile='', type='FORK') instead of the default SOCK type of cluster. This got a hell of a lot more stable, because FORK clones the entire environment without my having to remember to manually export every dependency, and/or because (I'm not sure here) FORK doesn't have to send serialized data through a loopback port.
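
As a rough illustration of the difference (the cluster size and the helper_const object below are made up for the sketch, not part of my real code):

library(parallel)
library(survival)

helper_const <- 5   # stands in for whatever objects the worker function needs

# PSOCK (the default): workers start as empty R sessions, so every package
# and object has to be shipped over explicitly.
cl_sock <- makeCluster(4, outfile = '')
clusterEvalQ(cl_sock, library(survival))
clusterExport(cl_sock, 'helper_const')
stopCluster(cl_sock)

# FORK (Unix-alikes only): each worker is a copy of the master process, so
# loaded packages and objects such as helper_const are already available.
cl_fork <- makeCluster(4, outfile = '', type = 'FORK')
stopCluster(cl_fork)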

Anyway, under some circumstances the error reading from connection would come back. I got distracted by the unfamiliar problem domain and vague error messages and forgot that the same troubleshooting heuristics apply here as always:

  • Does the same data always produce the problem? For me, yes, and it always happened in the same region of the dataset.
  • What are the minimum features of that dataset needed to reproduce the problem? Successive subdivision of the input data revealed the exact column that caused the problem. Calling the objective function on that vector directly also triggered the problem, this time in the normal R environment, and stepping through the objective function line by line revealed where it failed (see the sketch after this list).
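
A rough sketch of that subdivision step, run serially in a normal R session so errors surface directly. Here objective_fn is just a stand-in for the real, much larger function, and aa is the array from the mockup above:

library(survival)

objective_fn <- function(ii) {
  summary(coxph(Surv(c(ii)) ~ gl(2, 50)))$coef[1, ]
}

# Try each slice on its own and flag the ones that either throw an error or
# come back with an unexpected shape.
bad_rows <- which(vapply(seq_len(dim(aa)[1]), function(i) {
  res <- try(objective_fn(aa[i, , ]), silent = TRUE)
  inherits(res, 'try-error') || !is.numeric(res) || length(res) != 5
}, logical(1)))

bad_rows   # slices worth re-running interactively with traceback()/debug()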

Turns out, as the answerer implied, try() only catches errors. An unexpected result that's the wrong data type or the wrong size or is NULL will pass right through try() and tryCatch() and crash whatever is trying to fit the result back into an array!
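
One way to guard against that, sketched against the mockup above (safe_fit is a made-up name, and c0/aa are the cluster and array from the question): check the shape of the result inside the worker and coerce anything unexpected to the NA vector, so every slice returns exactly five numbers.

safe_fit <- function(ii) {
  oo <- try(summary(coxph(Surv(c(ii)) ~ gl(2, 50)))$coef[1, ], silent = TRUE)
  # Reject errors *and* results of the wrong type or length before they reach
  # the array-assembly step on the master.
  if (inherits(oo, 'try-error') || !is.numeric(oo) || length(oo) != 5)
    rep(NA_real_, 5)
  else
    oo
}

bb <- parApply(c0, aa, 1, safe_fit)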

Thank god it wasn't some crazy non-deterministic race condition or something. Woot. Thanks for reading, hope my experience helps someone else.

Answer

There may be nothing wrong with your use of the try function. It may be that your function is causing a worker process to exit. In that case, the master process will get an error reading from the socket connection to that worker, resulting in the error message:

Error in unserialize(node$con) : error reading from connection

parApply doesn't catch this error, but propagates it, causing your script to exit with the message "Execution halted".

I can reproduce this scenario with:

library(parallel)
cl <- makePSOCKcluster(4)
clusterApply(cl, 1:10, function(i) {
  tryCatch({
    quit(save='no', status=1)  # terminates the worker's R process, so tryCatch never sees an error
  },
  error=function(e) {
    NULL
  })
})

When this is executed, I get the output:

Error in unserialize(node$con) : error reading from connection
Calls: clusterApply ... FUN -> recvData -> recvData.SOCKnode -> unserialize
Execution halted

Unfortunately, this tells us nothing about what is causing a worker process to exit, but I think that's where you should focus your efforts, rather than struggling with the try function.
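
To get at the "why", one option (a sketch against the mockup data; the cat() call is added purely for diagnostics) is to lean on outfile='': for local workers their stdout and stderr then show up in the terminal the master was started from, so printing a marker before each fit shows roughly what a worker was doing right before it died.

library(parallel)
library(survival)

cl <- makeCluster(4, outfile = '')       # worker output echoed to this terminal
clusterEvalQ(cl, library(survival))

aa <- array(rexp(1e4), c(100, 50, 2))

res <- parApply(cl, aa, 1, function(ii) {
  cat(sprintf("worker %d starting a slice\n", Sys.getpid()))
  try(summary(coxph(Surv(c(ii)) ~ gl(2, 50)))$coef[1, ], silent = TRUE)
})

stopCluster(cl)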
