Adding new Spark workers on AWS EC2 - access error
Problem description
I have an existing, operational Spark cluster that was launched with the spark-ec2
script. I'm trying to add a new slave by following these steps:
- Stop the cluster
- In the AWS console, launch a new instance from one of the slaves using "Launch more like this"
- Start the cluster
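With the spark-ec2 script, that stop/start cycle looks roughly like this (the cluster name, keypair name, and identity-file path are placeholders, not taken from the original post):

```shell
# Stop the running cluster (placeholders: my-keypair, my-cluster)
./spark-ec2 -k my-keypair -i /path/to/my-keypair.pem stop my-cluster

# ... in the AWS console, clone one of the slaves with "Launch more like this" ...

# Bring the cluster, now including the new instance, back up
./spark-ec2 -k my-keypair -i /path/to/my-keypair.pem start my-cluster
```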
Although the new instance is added to the same security group and I can successfully SSH to it with the same private key, the spark-ec2 ... start
call can't access this machine for some reason:
Running setup-slave on all cluster nodes to mount filesystems, etc...
[1] 00:59:59 [FAILURE] xxx.compute.amazonaws.com
Exited with error code 255 Stderr: Permission denied (publickey).
This is, obviously, followed by tons of other errors while trying to deploy Spark stuff on this instance.
The reason is that the Spark master machine doesn't have rsync
access to this new slave, even though port 22 is open...
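One way to confirm this from the master is a non-interactive SSH attempt (the hostname below is a placeholder); since rsync runs over SSH, a failure here means rsync will fail too:

```shell
# BatchMode=yes disables interactive prompts, so a missing key fails fast
# with "Permission denied (publickey)" instead of hanging at a prompt
ssh -o BatchMode=yes root@ec2-xx-xx-xx.compute.amazonaws.com true
```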
Answer
The issue was that the SSH key generated on the Spark master was not transferred to this new slave. The spark-ec2 script's start
command omits this step. The solution is to use the launch
command with the --resume
option. The SSH key is then transferred to the new slave and everything goes smoothly.
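A sketch of that invocation (the keypair name, identity file, and cluster name are placeholders):

```shell
# launch --resume reuses the already-running instances instead of
# creating new ones, but redoes cluster setup, which includes pushing
# the master's SSH key out to all slaves
./spark-ec2 -k my-keypair -i /path/to/my-keypair.pem launch my-cluster --resume
```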
Yet another solution is to add the master's public key (~/.ssh/id_rsa.pub) to the newly added slave's ~/.ssh/authorized_keys. (Got this advice on the Spark mailing list.)
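That manual fix can be sketched as follows; to keep the commands checkable without a real master and slave, the snippet uses local stand-in files (the real paths are ~/.ssh/id_rsa.pub on the master and ~/.ssh/authorized_keys on the slave):

```shell
# Stand-ins for the master's public key and the slave's .ssh directory
mkdir -p /tmp/demo_slave/.ssh
echo "ssh-rsa AAAAB3...fakekey spark-master" > /tmp/demo_master_id_rsa.pub

# The actual fix: append the master's public key to authorized_keys,
# then tighten permissions so sshd will accept the file
cat /tmp/demo_master_id_rsa.pub >> /tmp/demo_slave/.ssh/authorized_keys
chmod 600 /tmp/demo_slave/.ssh/authorized_keys
```

In practice you would first copy the master's id_rsa.pub to the slave (for example with scp, which works here because plain SSH with the EC2 keypair already succeeds) and then append it on the slave.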