Spark Standalone Cluster - Slave not connecting to Master


Problem Description

I am trying to setup a Spark standalone cluster following the official documentation.

My master is on a local VM running Ubuntu, and I also have one worker running on the same machine. It is connecting, and I am able to see its status in the WebUI of the master.

[Screenshot of the master's WebUI showing the registered worker - image omitted]

But when I try to connect a slave from another machine, I am not able to do it.

This is the log message I get on the worker when I start it from another machine. I have tried using start-slaves.sh from the master after updating conf/slaves, and also start-slave.sh spark://spark:7077 from the slave.
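For reference, the conf/slaves file mentioned above (renamed conf/workers in newer Spark releases) is just a plain-text list of worker hostnames, one per line. With the hostnames used in this question (one worker on the master host, one on the remote machine) it would contain something like:

```
spark
worker
```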

[Master hostname - spark; Worker hostname - worker]

15/07/01 11:54:16 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@spark:7077] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://sparkMaster@spark:7077]].
15/07/01 11:54:59 ERROR Worker: All masters are unresponsive! Giving up.
15/07/01 11:54:59 INFO Utils: Shutdown hook called

When I try to telnet from the slave to the master, this is what I get -

root@worker:~# telnet spark 7077
Trying 10.xx.xx.xx...
Connected to spark.
Escape character is '^]'.
Connection closed by foreign host.

Telnet seems to work, but the connection is closed as soon as it is established. Could this have something to do with the problem?
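The same TCP reachability check can be scripted without telnet. A minimal sketch using bash's /dev/tcp pseudo-device (the hostname spark and port 7077 come from the question; `timeout` is from GNU coreutils):

```shell
# Returns 0 if a TCP connection to $1:$2 can be opened, nonzero otherwise.
check_port() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

if check_port spark 7077; then
  echo "port open"
else
  echo "port closed or unreachable"
fi
```

Note that this only verifies the TCP handshake: the master's Akka endpoint closing the connection immediately afterwards (as in the telnet output above) still counts as "open", so a passing check does not mean worker registration will succeed.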

I have added the master and slave IP addresses to /etc/hosts on both machines. I followed all the solutions given at "SPARK + Standalone Cluster: Cannot start worker from another machine", but they have not worked for me.

I have the following config set in spark-env.sh on both machines -

export SPARK_MASTER_IP=spark

export SPARK_WORKER_PORT=44444

Any help is much appreciated.

Recommended Answer

I encountered the exact same problem as you and just figured out how to get it to work.

The problem is that your Spark master is listening on a hostname (spark in your example), which allows the worker on the same host to register successfully but causes registration from another machine with the command start-slave.sh spark://spark:7077 to fail.

The solution is to make sure the value SPARK_MASTER_IP is specified as an IP address in the file conf/spark-env.sh

    SPARK_MASTER_IP=<your host ip>

on your master node, then start your Spark master as normal. You can open the web GUI to make sure your Spark master appears as spark://YOUR_HOST_IP:7077 after the start. Then, on another machine, the command start-slave.sh spark://<your host ip>:7077 should start the worker and register it with the master successfully.
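The steps above can be sketched as a small script. The IP 192.168.1.10 and the install path /opt/spark are placeholders, not values from the question; the start commands are shown as comments because they depend on your Spark installation:

```shell
# Builds the master URL from an IP, mirroring what the WebUI should
# display once SPARK_MASTER_IP holds an address instead of a hostname.
master_url() { echo "spark://$1:7077"; }

MASTER_IP=192.168.1.10   # placeholder: your master's actual IP

# On the master node:
#   echo "export SPARK_MASTER_IP=${MASTER_IP}" >> /opt/spark/conf/spark-env.sh
#   /opt/spark/sbin/start-master.sh
#
# On each worker machine:
#   /opt/spark/sbin/start-slave.sh "$(master_url "${MASTER_IP}")"

master_url "${MASTER_IP}"   # → spark://192.168.1.10:7077
```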

Hope it helps.

