makeCluster function in R snow hangs indefinitely

Problem description

I am using the makeCluster function from the R package snow on a Linux machine to start a SOCK cluster on a remote Linux machine. Everything seems to be set up for the two machines to communicate successfully (I am able to establish ssh connections between them). But:

makeCluster("192.168.128.24",type="SOCK")

returns nothing and raises no error; it just hangs indefinitely.

What am I doing wrong?

Many thanks

Recommended answer

Unfortunately, there are a lot of things that can go wrong when creating a snow (or parallel) cluster object, and the most common failure mode is to hang indefinitely. The problem is that makeSOCKcluster launches the cluster workers one by one, and each worker (if successfully started) must make a socket connection back to the master before the master proceeds to launch the next worker. If any of the workers fail to connect back to the master, makeSOCKcluster will hang without any error message. The worker may issue an error message, but by default any error message is redirected to /dev/null.
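Because the master blocks until each worker connects back, one way to turn a silent hang into a visible error is to shorten how long the master is willing to wait. The sketch below assumes that snow applies its `timeout` cluster option (in seconds) to the socket on which the master waits for a worker to connect back; check this against the documentation for your snow version. `start_cluster` is a hypothetical helper, and 60 seconds is an arbitrary choice.

```r
# Hypothetical helper: start a SOCK cluster, but give up after `timeout`
# seconds per worker instead of waiting for a very long time.
# Assumption: snow uses the `timeout` option for the socket on which the
# master waits for each worker to connect back.
start_cluster <- function(host, timeout = 60) {
  tryCatch(
    # outfile = "" keeps worker output on the terminal instead of /dev/null
    snow::makeSOCKcluster(host, outfile = "", timeout = timeout),
    error = function(e) {
      message("Cluster start failed: ", conditionMessage(e))
      NULL  # report failure to the caller instead of hanging
    }
  )
}

# Example (not run automatically, since it contacts the remote host):
# cl <- start_cluster("192.168.128.24")
# if (!is.null(cl)) snow::stopCluster(cl)
```

A timeout only tells you *that* a worker failed to connect back, not *why*. The error message is still worth having, though, because it confirms you are dealing with the connect-back problem described above.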

In addition to ssh problems, makeSOCKcluster could hang because:

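  • R isn't installed on the worker machine
  • snow isn't installed on the worker machine
  • R or snow isn't installed in the same location as on the local machine
  • the current user doesn't exist on the worker machine
  • networking problems
  • firewall problems

There are many more possibilities.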
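Several of these causes come down to the worker machine not matching the master. By default (`homogeneous = TRUE`), snow assumes that workers use the same Rscript path and snow library location as the master. A quick check is to run the same small function in R on both machines and compare the output. `worker_report` is a hypothetical name, and the bare `Rscript` file name assumes Linux, as in the question.

```r
# Hypothetical helper: report what a snow worker needs on this machine.
# Run it in R on the master AND on 192.168.128.24, then compare.
worker_report <- function() {
  rscript  <- file.path(R.home("bin"), "Rscript")  # "Rscript.exe" on Windows
  snow_dir <- system.file(package = "snow")         # "" if snow is missing
  list(
    user           = Sys.info()[["user"]],
    rscript        = rscript,
    rscript_exists = file.exists(rscript),
    snow_installed = nzchar(snow_dir),
    snow_lib       = if (nzchar(snow_dir)) dirname(snow_dir) else NA_character_
  )
}

str(worker_report())
```

Any of the following can be enough to explain the hang: `snow_installed` is FALSE on the worker, the `rscript` or `snow_lib` paths differ between the two machines, or the user names don't match.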

In other words, no one can diagnose this problem without further information, so you have to do some troubleshooting in order to get that information.

In my experience, the single most useful troubleshooting technique is manual mode, which you enable by specifying manual=TRUE when creating the cluster object. It's also a good idea to set outfile="" so that error messages from the workers aren't redirected to /dev/null:

cl <- makeSOCKcluster("192.168.128.24", manual=TRUE, outfile="")

makeSOCKcluster will display an Rscript command to execute in a terminal on the specified machine, and then it will wait for you to execute that command. In other words, makeSOCKcluster will hang until you manually start the worker on host 192.168.128.24, in your case. Remember that this is a troubleshooting technique, not a solution to the problem, and the hope is to get more information about why the workers aren't starting by trying to start them manually.

Obviously, the use of manual mode bypasses any ssh issues (since you're not using ssh), so if you can create a SOCK cluster successfully in manual mode, then probably ssh is your problem. If the Rscript command isn't found, then either R isn't installed, or it's installed in a different location. But hopefully you'll get some error message that will lead you to the solution.
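To test the ssh path on its own, you can ask ssh to run the master's Rscript non-interactively on the worker and load snow there, which is roughly what snow does when it starts a worker. A non-interactive ssh session can have a different PATH and environment from an interactive login, so this can fail even when `ssh 192.168.128.24` works. `remote_rscript_args` is a hypothetical helper. The sketch assumes a Unix-alike master with `ssh` on its PATH and password-less (key-based) login; `BatchMode=yes` makes ssh fail rather than prompt.

```r
# Hypothetical helper: build ssh arguments that run the master's Rscript
# path on `host` and try to load snow there.
remote_rscript_args <- function(host,
                                rscript = file.path(R.home("bin"), "Rscript")) {
  # Command as the remote shell should see it
  remote_cmd <- paste(shQuote(rscript), "-e",
                      shQuote("library(snow); cat(R.version.string)"))
  # Quote the whole remote command once more for the local shell
  c(host, shQuote(remote_cmd))
}

ssh_args <- remote_rscript_args("192.168.128.24")
# Uncomment to run. A "not found" error or "there is no package called
# 'snow'" points at the installation causes listed earlier.
# system2("ssh", c("-o", "BatchMode=yes", ssh_args))
```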

If makeSOCKcluster still just hangs after you've executed the specified Rscript command on the specified machine, then you probably have a networking or firewall issue.
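To separate a networking or firewall problem from everything else, you can test the connect-back by hand with a throwaway listener. This sketch makes several assumptions. It uses an arbitrary port (11000), which you would then also pass to makeSOCKcluster via its `port` option, assuming your snow version accepts that option. `can_reach` is a hypothetical helper. First, on the master, start a listener that waits up to 60 seconds: `srv <- socketConnection(port = 11000, server = TRUE, blocking = TRUE, open = "r+b", timeout = 60)`. While it waits, run the following on the worker:

```r
# Hypothetical helper, run on the WORKER: can it open a TCP connection
# back to `host`:`port`? Returns TRUE or FALSE instead of hanging.
can_reach <- function(host, port, timeout = 5) {
  con <- tryCatch(
    socketConnection(host = host, port = port, blocking = TRUE,
                     open = "r+b", timeout = timeout),
    error   = function(e) NULL,  # refused, unreachable, or timed out
    warning = function(w) NULL   # failed opens usually also raise a warning
  )
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

# On the worker, while the listener is waiting on the master:
# can_reach("<master address>", 11000)
```

If this returns FALSE while the listener is waiting, something between the two machines is blocking the connect-back, such as a firewall on the master or an address the worker cannot resolve or route to. That is consistent with the hang in the question.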

For more troubleshooting advice, see my answer to "making cluster in doParallel / snowfall hangs".
