Error calling serialize R function

Problem description

I am loading the following packages into R:

library(foreach)
library(doParallel)
library(iterators)

I have been "parallelizing" code for a long time, but lately my code intermittently stops while it is running. The error is:

Error in serialize(data, node$con) : error writing to connection

My educated guess is that the connection I open with the commands below may have expired:

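## Register Cluster
##
cores <- 8
cl <- makeCluster(cores)   # PSOCK cluster by default: one R worker process per core
registerDoParallel(cl)     # make foreach's %dopar% run on these workers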
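One way to rule out a stale cluster is to create a fresh one for each run and always shut it down afterwards, even when a task fails. The sketch below uses only the base `parallel` package, which doParallel builds on. `with_cluster` is a hypothetical helper, not part of either package. It also passes `outfile = ""` to `makeCluster()`. By default worker output goes to the null device, so this setting makes worker-side messages and errors appear on the master's console.

```r
library(parallel)

# Hypothetical helper: run `fun(cl)` on a fresh PSOCK cluster and always
# shut the workers down afterwards, even if `fun` throws an error.
with_cluster <- function(n_workers, fun, outfile = "") {
  # outfile = "" disables the default redirection of worker stdout/stderr
  # to the null device, so worker-side errors are printed on the master.
  cl <- makeCluster(n_workers, outfile = outfile)
  on.exit(stopCluster(cl), add = TRUE)
  fun(cl)
}
```

With foreach, you would call `registerDoParallel(cl)` inside `fun` before running `foreach(...) %dopar% {...}`. Because `on.exit()` stops the cluster, no connections from an earlier run are left behind.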

Looking at the makeCluster man page, I see that by default the connections expire only after 30 days! I could set options(error=recover) to check, on the fly, whether the connection is still open when the code halts, but I decided to post this general question first.

Important notes:

1) The error is really intermittent: sometimes I re-run the same code and get no errors.
2) I run everything on the same multi-core machine (Intel, 8 cores), so it is not a communication (network) problem between cluster nodes.
3) I am a heavy user of CPU and GPU parallelization on my laptop and desktop (64 cores). Unfortunately, this is the first time I am getting this type of error.

Is anybody else getting the same type of error?

As requested, here is my sessionInfo():

> sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TTR_0.22-0       xts_0.9-3        doParallel_1.0.1 iterators_1.0.6  foreach_1.4.0    zoo_1.7-9        Revobase_6.2.0   RevoMods_6.2.0  

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.3 grid_2.15.3     lattice_0.20-13 tools_2.15.3   

@SteveWeston, below is the error from one of the calls (again, it is intermittent):

starting worker pid=8808 on localhost:10187 at 15:21:52.232
starting worker pid=5492 on localhost:10187 at 15:21:53.624
starting worker pid=8804 on localhost:10187 at 15:21:54.997
starting worker pid=8540 on localhost:10187 at 15:21:56.360
starting worker pid=6308 on localhost:10187 at 15:21:57.721
starting worker pid=8164 on localhost:10187 at 15:21:59.137
starting worker pid=8064 on localhost:10187 at 15:22:00.491
starting worker pid=8528 on localhost:10187 at 15:22:01.855
Error in unserialize(node$con) : 
  ReadItem: unknown type 0, perhaps written by later version of R
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted

Adding a bit more information: I set options(error=recover), and it provided the following:

Error in serialize(data, node$con) : error writing to connection

Enter a frame number, or 0 to exit   

1: #51: parallelize(FUN = "ensemble.prism", arg = list(prism = iis.long, instances = oos.instances), vectorize.arg = c("prism", "instances"), cores = cores, .export 
2: parallelize.R#58: foreach.bind(idx = i) %dopar% pFUN(idx)
3: e$fun(obj, substitute(ex), parent.frame(), e$data)
4: clusterCall(cl, workerInit, c.expr, exportenv, obj$packages)
5: sendCall(cl[[i]], fun, list(...))
6: postNode(con, "EXEC", list(fun = fun, args = args, return = return, tag = tag))
7: sendData(con, list(type = type, data = value, tag = tag))
8: sendData.SOCKnode(con, list(type = type, data = value, tag = tag))
9: serialize(data, node$con)

Selection: 9

I checked whether the connections were still available, and they are:

Browse[1]> showConnections()
   description                class      mode  text     isopen   can read can write
3  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
4  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
5  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
6  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
7  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
8  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
9  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
10 "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
Browse[1]> 

Since the connections are open, and the "unknown type 0" error points to the R version (as pointed out by @SteveWeston), I really can't figure out what is happening here.
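An "opened" entry in showConnections() describes only the master's end of each socket. It does not by itself show that the worker process on the other end is still alive. One possibility is that a worker died while the master's socket object stayed open. The minimal sketch below tests that directly by requiring every worker to complete a real round trip. `workers_responding` is a hypothetical helper built on the real `parallel::clusterCall()`.

```r
library(parallel)

# Hypothetical helper: TRUE only if every worker in `cl` receives a call,
# evaluates it and sends back its process id. A dead worker or a broken
# socket makes clusterCall() fail, which is reported and turned into FALSE.
workers_responding <- function(cl) {
  tryCatch({
    pids <- unlist(clusterCall(cl, Sys.getpid))
    length(pids) == length(cl)
  }, error = function(e) {
    message("Worker check failed: ", conditionMessage(e))
    FALSE
  })
}
```

Calling it right after `makeCluster()` and again after a failed run can help separate a worker that has actually died from a socket that is merely still listed as open.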

The problem I encountered

The code is fine in terms of the arguments passed to the function, so the answer provided by @MichaelFilosi didn't help much. In any case, many thanks for your answer!

I couldn't find exactly what was wrong with the call, but at least I could work around the problem.

The trick was to break the arguments of the function call for each parallel worker into smaller blocks.

Magically, the error disappeared.

Let me know if the same works for you!
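The workaround of splitting each worker's arguments into smaller blocks can be sketched as follows. The asker's actual code is not shown, so this is an assumption-based illustration using base `parallel` rather than foreach. `make_chunks` and `chunked_cluster_lapply` are hypothetical helpers. `clusterApplyLB()` sends each block as its own message, so every `serialize()` call on a worker socket carries less data.

```r
library(parallel)

# Hypothetical helper: split a vector or list into consecutive blocks of
# at most `chunk_size` elements.
make_chunks <- function(x, chunk_size) {
  stopifnot(chunk_size >= 1)
  unname(split(x, ceiling(seq_along(x) / chunk_size)))
}

# Hypothetical helper: apply `f` to every element of `x` on cluster `cl`,
# dispatching one small block per message instead of one large argument
# per worker.
chunked_cluster_lapply <- function(cl, x, f, chunk_size) {
  chunks <- make_chunks(x, chunk_size)
  apply_to_chunk <- function(chunk, f) lapply(chunk, f)
  # Detach the helper from this function's environment; otherwise `x` and
  # `chunks` would be serialized and shipped along with every call.
  environment(apply_to_chunk) <- globalenv()
  # clusterApplyLB() hands each chunk to the next free worker as a separate
  # message (parLapply() by default regroups work into one message per worker).
  res <- clusterApplyLB(cl, chunks, apply_to_chunk, f = f)
  do.call(c, res)
}
```

With foreach, the equivalent idea is to iterate over the blocks, e.g. `foreach(chunk = make_chunks(x, 100), .combine = c) %dopar% lapply(chunk, f)`. Note that `f` itself is still serialized with each message, so it should not capture large objects in its environment.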

Recommended answer

This is most likely due to running out of memory (see my blog post for details). Here's an example of how you can cause this error:

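library(parallel)
a <- matrix(1, ncol = 10^4 * 2.1, nrow = 10^4)  # about 1.68 GB of doubles
cl <- makeCluster(8, type = "FORK")              # FORK clusters are not available on Windows
parSapply(cl, 1:8, function(x) {
  b <- a + 1  # every worker allocates a new matrix of the same size
  mean(b)
})
## Error in unserialize(node$con) : error reading from connection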
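The blog post itself is not reproduced here, so the following is only a rough sketch of the memory arithmetic behind the example. It assumes 8 bytes per double and ignores R's small per-object header. The matrix `a` holds 10^4 × 2.1·10^4 = 2.1·10^8 doubles. On a FORK cluster the workers share `a` copy-on-write, but each `a + 1` allocates a new matrix of the same size in every worker. If the operating system kills a worker that runs out of memory, the master's next read on that worker's socket fails. (On Windows, where FORK is unavailable, PSOCK workers would additionally need their own serialized copy of any exported data.)

```r
# Back-of-the-envelope memory estimate for the example above.
# Assumption: 8 bytes per double; R's per-object header is ignored.
dense_double_gb <- function(nrow, ncol) nrow * ncol * 8 / 1e9

a_gb <- dense_double_gb(1e4, 2.1e4)  # the shared matrix `a`
n_workers <- 8
b_gb <- n_workers * a_gb             # one new `b <- a + 1` per worker

cat(sprintf("a: %.2f GB, b across %d workers: %.2f GB\n", a_gb, n_workers, b_gb))
# prints: a: 1.68 GB, b across 8 workers: 13.44 GB

# Cross-check the 8-bytes-per-double assumption on a small matrix:
# object.size() should be just above 100 * 100 * 8 = 80000 bytes.
print(object.size(matrix(1, nrow = 100, ncol = 100)))
```

Roughly 13 GB of extra allocations on top of `a` is enough to exhaust many desktop machines, which fits the "most likely running out of memory" explanation.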
