Results of workers not returned properly - snow - debug
Question
I'm using the snow package in R to execute a function on a SOCK cluster of three machines running Linux. I tried to run the code with both parLapply and clusterApply.
When an error occurs at the worker level, the results of the worker nodes are not returned properly to the master, which makes debugging very hard. I'm currently logging every heartbeat of the worker nodes independently using futile.logger, and the results appear to be computed correctly. But when I try to print the result on the master node (after receiving the output from the workers), I get this error:
Error in checkForRemoteErrors(val): 8 nodes produced errors; first error: missing value where TRUE/FALSE needed
Is there any way to debug the results of the workers more deeply?
Recommended answer
The checkForRemoteErrors function is called by parLapply and clusterApply to check for task errors, and it throws an error if any of the tasks failed. Unfortunately, although it displays the error message, it doesn't say which worker code caused the error. But if you modify your worker/task function to catch errors, you can retain some extra information that may help determine where the error occurred.
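As a rough illustration of that idea, the sketch below wraps a task function so that a failure is rethrown with the offending input and the failing call added to the message. The names with_context and toy_task are hypothetical and not part of snow, and the check runs locally without a cluster. Since checkForRemoteErrors reports the first error message, an enriched message like this would presumably show up there too.

```r
# Hypothetical helper (not part of snow): wraps a task function so that any
# error is rethrown with the failing input and the failing call in its message.
with_context <- function(f) {
  force(f)  # evaluate f now so the returned closure holds the function itself
  function(x) {
    tryCatch(
      f(x),
      error = function(e) {
        failing_call <- paste(deparse(conditionCall(e)), collapse = " ")
        stop(sprintf("input %s failed in `%s`: %s",
                     paste(deparse(x), collapse = " "),
                     failing_call,
                     conditionMessage(e)),
             call. = FALSE)
      }
    )
  }
}

# Local check without a cluster: a toy task that fails on negative input.
toy_task <- function(x) {
  if (x < 0) stop("negative input")
  sqrt(x)
}

# Capture the enriched message instead of letting the error propagate.
msg <- tryCatch(with_context(toy_task)(-4),
                error = function(e) conditionMessage(e))
print(msg)
```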
For example, here's a simple snow program that fails. Note that it uses outfile='' when creating the cluster so that output from the workers is displayed, which by itself is a very useful debugging technique:
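library(snow)
# outfile='' leaves worker output unredirected so it is displayed
cl <- makeSOCKcluster(2, outfile='')
problem <- function(i) {
  # if (NA) fails with "missing value where TRUE/FALSE needed"
  if (NA)
    j <- 999
  else
    j <- i
  2 * j
}
r <- parLapply(cl, 1:2, problem)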
When you execute this, you see the error message from checkForRemoteErrors and some other messages, but nothing that tells you that the if statement caused the error. To catch errors when calling problem, we define workerfun:
workerfun <- function(i) {
  tryCatch({
    problem(i)
  },
  error=function(e) {
    print(e)  # shows the failing call in the worker's output
    stop(e)   # rethrow so the task is still reported as failed
  })
}
Now we execute workerfun with parLapply instead of problem, first exporting problem to the workers:
clusterExport(cl, c('problem'))
r <- parLapply(cl, 1:2, workerfun)
Among the other messages, we now see
<simpleError in if (NA) j <- 999 else j <- i: missing value where TRUE/FALSE needed>
which includes the actual if statement that generated the error. Of course, it doesn't tell you the file name and line number of the expression, but it's often enough to let you solve the problem.
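A possible variation, sketched here under assumptions, is to return the error as a value instead of rethrowing it, so that every task yields something the master can inspect. The wrapper safe_task and the fields ok, value, message, call and stack are hypothetical names chosen for illustration. It uses base R's withCallingHandlers and sys.calls to record the call stack at the point of failure. The demonstration runs locally with lapply; with snow it would presumably be passed to parLapply in the same way as workerfun.

```r
# Hypothetical wrapper (not part of snow): returns a function that never
# throws. On failure it returns a list describing the error.
safe_task <- function(f) {
  force(f)  # evaluate f now so the returned closure holds the function itself
  function(...) {
    calls <- NULL
    tryCatch(
      withCallingHandlers(
        list(ok = TRUE, value = f(...)),
        error = function(e) {
          # Runs before the stack unwinds, so the failing frames are visible.
          calls <<- sys.calls()
        }
      ),
      error = function(e) {
        list(ok = FALSE,
             message = conditionMessage(e),
             call = paste(deparse(conditionCall(e)), collapse = " "),
             stack = vapply(as.list(calls),
                            function(cl) deparse(cl)[1],
                            character(1)))
      }
    )
  }
}

# Same failing task as in the answer above.
problem <- function(i) {
  if (NA)
    j <- 999
  else
    j <- i
  2 * j
}

# Local stand-in for parLapply(cl, 1:2, safe_task(problem)).
results <- lapply(1:2, safe_task(problem))

# Pick out the failed tasks and show which call failed in each.
failed <- Filter(function(r) !r$ok, results)
for (r in failed) cat(r$call, "\n")
```

Because no task throws, checkForRemoteErrors should have nothing to report, and the master decides what to do with each failed entry.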