Error calling serialize R function

Problem description

I am loading the following packages into R:

library(foreach)
library(doParallel)
library(iterators)

I have been parallelizing code for a long time, but lately I have been getting intermittent stops while the code is running. The error is:

Error in serialize(data, node$con) : error writing to connection

My educated guess is that the connection I open with the commands below may have expired:

## Register Cluster
##
cores<-8
cl <- makeCluster(cores)
registerDoParallel(cl)

Looking at the makeCluster man page, I see that by default the connections expire only after 30 days. I could set options(error=recover) to check on the fly whether the connection is still open when the code halts, but I decided to post this general question first.
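One way to test the expired-connection guess without waiting for a crash is to ask every worker for a trivial reply and see whether the round trip succeeds. The sketch below is only an illustration: it assumes a small local cluster of two workers, and `workers_alive` is a hypothetical helper name, not a function from the parallel package.

```r
library(parallel)

# Hypothetical helper: TRUE if every worker answers a trivial call,
# FALSE if sending to or reading from any worker raises an error.
workers_alive <- function(cl) {
  tryCatch({
    pids <- clusterCall(cl, Sys.getpid)  # one round trip per worker
    length(pids) == length(cl)
  }, error = function(e) FALSE)
}

cl <- makeCluster(2)                # small cluster, for illustration only
alive_before <- workers_alive(cl)   # checked while the cluster is running
stopCluster(cl)
```

Calling such a check right before a long foreach loop would at least show whether the workers were reachable at that moment; it says nothing about why a worker dies later.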

Important notes:

1) The error really is intermittent; sometimes I re-run the same code and get no errors.
2) I run everything on the same multi-core machine (Intel, 8 cores), so it is not a communication (network) problem between cluster nodes.
3) I am a heavy user of CPU and GPU parallelization on my laptop and desktop (64 cores), and unfortunately this is the first time I am getting this type of error.

Has anybody else run into the same type of error?

As requested, here is my sessionInfo():

> sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TTR_0.22-0       xts_0.9-3        doParallel_1.0.1 iterators_1.0.6  foreach_1.4.0    zoo_1.7-9        Revobase_6.2.0   RevoMods_6.2.0  

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.3 grid_2.15.3     lattice_0.20-13 tools_2.15.3   

@SteveWeston, below is the error from one of the calls (again, it is intermittent):

starting worker pid=8808 on localhost:10187 at 15:21:52.232
starting worker pid=5492 on localhost:10187 at 15:21:53.624
starting worker pid=8804 on localhost:10187 at 15:21:54.997
starting worker pid=8540 on localhost:10187 at 15:21:56.360
starting worker pid=6308 on localhost:10187 at 15:21:57.721
starting worker pid=8164 on localhost:10187 at 15:21:59.137
starting worker pid=8064 on localhost:10187 at 15:22:00.491
starting worker pid=8528 on localhost:10187 at 15:22:01.855
Error in unserialize(node$con) : 
  ReadItem: unknown type 0, perhaps written by later version of R
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted

Adding a bit more information: I set options(error=recover), and it provided the following:

Error in serialize(data, node$con) : error writing to connection

Enter a frame number, or 0 to exit   

1: #51: parallelize(FUN = "ensemble.prism", arg = list(prism = iis.long, instances = oos.instances), vectorize.arg = c("prism", "instances"), cores = cores, .export 
2: parallelize.R#58: foreach.bind(idx = i) %dopar% pFUN(idx)
3: e$fun(obj, substitute(ex), parent.frame(), e$data)
4: clusterCall(cl, workerInit, c.expr, exportenv, obj$packages)
5: sendCall(cl[[i]], fun, list(...))
6: postNode(con, "EXEC", list(fun = fun, args = args, return = return, tag = tag))
7: sendData(con, list(type = type, data = value, tag = tag))
8: sendData.SOCKnode(con, list(type = type, data = value, tag = tag))
9: serialize(data, node$con)

Selection: 9

I tried to check whether the connections were still available, and they are:

Browse[1]> showConnections()
   description                class      mode  text     isopen   can read can write
3  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
4  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
5  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
6  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
7  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
8  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
9  "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
10 "<-www.007guard.com:10187" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
Browse[1]> 

Since the connections are open and the "unknown type 0" message refers to the R version (as pointed out by @SteveWeston), I really can't figure out what is happening here.

EDIT 1:

How I worked around the problem

The code is fine in terms of the arguments passed to the function, so the answer provided by @MichaelFilosi did not help much. In any case, many thanks for your answer!

I couldn't find exactly what was wrong with the call, but at least I could work around the problem.

The trick was to break the arguments of the function call, for each parallel thread, into smaller blocks.

Magically, the error disappeared.

Let me know if the same approach works for you!
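The original code behind this workaround is not shown, so the following is only one possible reading of "smaller blocks": instead of sending all task arguments in a single parallel call, send them a few at a time. It uses the base parallel package rather than foreach to stay self-contained; `split_into_blocks`, `tasks`, and the block size of 3 are hypothetical choices for illustration.

```r
library(parallel)

# Hypothetical helper: split a vector of task indices into
# consecutive blocks of at most `block_size` elements.
split_into_blocks <- function(idx, block_size) {
  split(idx, ceiling(seq_along(idx) / block_size))
}

tasks  <- 1:10
blocks <- split_into_blocks(tasks, block_size = 3)

cl <- makeCluster(2)   # small cluster, for illustration only
results <- list()
for (block in blocks) {
  # Each call serializes only one small block of arguments to the workers.
  results <- c(results, parLapply(cl, block, function(i) i^2))
}
stopCluster(cl)
```

The same pattern should carry over to a foreach loop by iterating over `blocks` on the outside and running `%dopar%` within each block, at the cost of some extra scheduling overhead per block.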

Recommended answer

This is most likely due to running out of memory (see my blog post for details). Here is an example of how you can cause this error:

> a <- matrix(1, ncol=10^4*2.1, nrow=10^4)
> cl <- makeCluster(8, type = "FORK")
> parSapply(cl, 1:8, function(x) {
+   b <- a + 1
+   mean(b)
+   })
Error in unserialize(node$con) : error reading from connection
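A rough back-of-the-envelope check shows why this example can exhaust memory. The sketch below assumes that each of the 8 workers allocates its own full copy of `b` at the same time and ignores any other overhead; the variable names are introduced here purely for illustration.

```r
# Dimensions of the matrix `a` from the example above.
n_row <- 10^4
n_col <- 10^4 * 2.1

# A numeric (double) matrix needs 8 bytes per element.
bytes_per_copy <- n_row * n_col * 8

# Assumption: every worker holds its own copy of `b <- a + 1` simultaneously.
workers   <- 8
total_gib <- workers * bytes_per_copy / 1024^3

cat(sprintf("one copy: %.2f GiB, all workers: %.2f GiB\n",
            bytes_per_copy / 1024^3, total_gib))
```

If the total is close to or above the machine's physical memory, a worker being killed mid-task would be a plausible cause of the connection error.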
