makeCluster function in R snow hangs indefinitely


Question

I am using the makeCluster function from the R package snow on a Linux machine to start a SOCK cluster on a remote Linux machine. Everything seems to be in place for the two machines to communicate successfully (I am able to establish ssh connections between the two). But:

makeCluster("192.168.128.24",type="SOCK")

does not return anything or raise an error; it just hangs indefinitely.

What am I doing wrong?

Many thanks.

Recommended answer

Unfortunately, many things can go wrong when creating a snow (or parallel) cluster object, and the most common failure mode is to hang indefinitely. The problem is that makeSOCKcluster launches the cluster workers one by one, and each worker (if successfully started) must make a socket connection back to the master before the master proceeds to launch the next worker. If any worker fails to connect back to the master, makeSOCKcluster will hang without any error message. The worker may issue an error message, but by default any error message is redirected to /dev/null.

In addition to ssh problems, makeSOCKcluster could hang because:

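  • R isn't installed on the worker machine
  • snow isn't installed on the worker machine
  • R or snow isn't installed in the same location as on the local machine
  • The current user doesn't exist on the worker machine
  • Networking problem
  • Firewall problem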

There are many more possibilities.
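Several of the causes listed above can be checked from the master before creating the cluster. The following is a minimal sketch, not part of the original answer: the helper name worker_check_commands is hypothetical, and it assumes ssh access to the worker and an sh-compatible shell on it. The function only builds the command lines; nothing is executed until you run them yourself.

```r
# Hypothetical helper: build ssh command lines that probe a worker machine.
# The function returns the commands as strings and does not execute them.
worker_check_commands <- function(host) {
  c(
    # Is Rscript on the PATH of a non-interactive ssh session?
    rscript = sprintf("ssh %s 'which Rscript'", host),
    # Which R version is installed on the worker?
    version = sprintf("ssh %s 'Rscript --version'", host),
    # Can the worker load snow? An error is printed if it cannot.
    snow = sprintf("ssh %s 'Rscript -e \"library(snow)\"'", host)
  )
}

cmds <- worker_check_commands("192.168.128.24")
print(cmds)

# Uncomment to run the checks; this requires working ssh access.
# for (cmd in cmds) system(cmd)
```

If the first command prints nothing, Rscript is probably not on the PATH that ssh sees, which corresponds to the "not installed" and "installed in a different location" cases above.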

In other words, no one can diagnose this problem without further information, so you have to do some troubleshooting in order to get that information.

In my experience, the single most useful troubleshooting technique is manual mode, which you enable by specifying manual=TRUE when creating the cluster object. It's also a good idea to set outfile="" so that error messages from the workers aren't redirected to /dev/null:

cl <- makeSOCKcluster("192.168.128.24", manual=TRUE, outfile="")

makeSOCKcluster will display an Rscript command to execute in a terminal on the specified machine, and then it will wait for you to execute that command. In other words, makeSOCKcluster will hang until you manually start the worker on the specified host (192.168.128.24 in your case). Remember that this is a troubleshooting technique, not a solution to the problem: the hope is that starting the workers manually will give you more information about why they aren't starting.
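Once a worker does connect, it is worth confirming that the cluster actually works and then shutting it down. The sketch below is an illustration rather than part of the original answer. It uses the parallel package that ships with R, whose makePSOCKcluster also accepts manual and outfile; snow provides clusterCall and stopCluster under the same names. The helper names worker_info and describe_workers are hypothetical.

```r
library(parallel)

# Runs on each worker: report the host name and R version.
worker_info <- function() {
  list(host = Sys.info()[["nodename"]], r_version = R.version.string)
}

# Hypothetical helper: collect worker_info() from every worker in the cluster.
describe_workers <- function(cl) {
  clusterCall(cl, worker_info)
}

if (interactive()) {
  # Blocks until the printed Rscript command has been run on the worker.
  cl <- makePSOCKcluster("192.168.128.24", manual = TRUE, outfile = "")
  print(describe_workers(cl))
  stopCluster(cl)  # shut the workers down when finished
}
```

The remote call is guarded by interactive() so that sourcing the file in a batch job does not block while waiting for a worker.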

Obviously, the use of manual mode bypasses any ssh issues (since you're not using ssh), so if you can create a SOCK cluster successfully in manual mode, then ssh is probably your problem. If the Rscript command isn't found, then either R isn't installed, or it's installed in a different location. But hopefully you'll get some error message that will lead you to the solution.

If makeSOCKcluster still just hangs after you've executed the specified Rscript command on the specified machine, then you probably have a networking or firewall issue.
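One way to narrow down a networking or firewall issue is to test, from the worker machine, whether a plain TCP connection to the master can be opened at all. This is a rough sketch and not part of the original answer: can_reach is a hypothetical helper built on base R's socketConnection, and the master host and port in the usage comment are placeholders that you should take from the Rscript command printed in manual mode.

```r
# Hypothetical helper: TRUE if a TCP connection to host:port can be opened
# within `timeout` seconds, FALSE otherwise.
can_reach <- function(host, port, timeout = 5) {
  con <- suppressWarnings(tryCatch(
    socketConnection(host = host, port = port, open = "a+b",
                     blocking = TRUE, timeout = timeout),
    error = function(e) NULL  # refused, filtered, or timed out
  ))
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

# Run on the worker while makeSOCKcluster is waiting on the master.
# "master-host" and 10187 are placeholders, not values from the question.
# can_reach("master-host", 10187)
```

A FALSE result while the master is waiting suggests that the worker cannot resolve or reach the master, or that a firewall is blocking the port. Note that a successful probe connects to the waiting master, so restart makeSOCKcluster afterwards.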

For more troubleshooting advice, see my answer to the question "making cluster in doParallel / snowfall hangs".
