Spark EC2 SSH connection error: SSH return code 255

Question

Every time I try to start a Spark cluster on AWS via the Spark ec2/spark_ec2.py file I get an SSH connection error that eventually gets resolved but wastes a lot of time.

Before you mark this as a duplicate: I'm aware there are quite a few similar questions, but there are two key distinctions: a) my connection always completes (eventually) and I end up with a healthy Spark cluster, and b) the "answers" to the other questions are generally centered around previous Spark versions (e.g., 1.2, 1.3, etc.). I have experienced this issue consistently, from 12 months ago with 1.3 through today with 1.6.1.

Thanks!

Terminal output:

Launched master in us-east-1e, regid = r-a1b2c3d4
Waiting for AWS to propagate instance metadata...
Waiting for cluster to enter 'ssh-ready' state...........

Warning: SSH connection error. (This could be temporary.)
Host: ec2-xx-xx-xx-xxx.compute-1.amazonaws.com
SSH return code: 255
SSH output: ssh: connect to host ec2-xx-xx-xx-xxx.compute-1.amazonaws.com port 22: Connection refused

.

Warning: SSH connection error. (This could be temporary.)
Host: ec2-xx-xx-xx-xxx.compute-1.amazonaws.com
SSH return code: 255
SSH output: ssh: connect to host ec2-xx-xx-xx-xxx.compute-1.amazonaws.com port 22: Connection refused

.

Warning: SSH connection error. (This could be temporary.)
Host: ec2-xx-xx-xx-xxx.compute-1.amazonaws.com
SSH return code: 255
SSH output: ssh: connect to host ec2-xx-xx-xx-xxx.compute-1.amazonaws.com port 22: Connection refused

.

Warning: SSH connection error. (This could be temporary.)
Host: ec2-xx-xx-xx-xxx.compute-1.amazonaws.com
SSH return code: 255
SSH output: ssh: connect to host ec2-xx-xx-xx-xxx.compute-1.amazonaws.com port 22: Connection refused

.

Warning: SSH connection error. (This could be temporary.)
Host: ec2-xx-xx-xx-xxx.compute-1.amazonaws.com
SSH return code: 255
SSH output: ssh: connect to host ec2-xx-xx-xx-xxx.compute-1.amazonaws.com port 22: Connection refused

.

Warning: SSH connection error. (This could be temporary.)
Host: ec2-xx-xx-xx-xxx.compute-1.amazonaws.com
SSH return code: 255
SSH output: ssh: connect to host ec2-xx-xx-xx-xxx.compute-1.amazonaws.com port 22: Connection refused

.

Warning: SSH connection error. (This could be temporary.)
Host: ec2-xx-xx-xx-xxx.compute-1.amazonaws.com
SSH return code: 255
SSH output: ssh: connect to host ec2-xx-xx-xx-xxx.compute-1.amazonaws.com port 22: Connection refused

.
Cluster is now in 'ssh-ready' state. Waited 833 seconds.
Generating cluster's SSH key on master...


Recommended answer

The spark-ec2 scripts build AMIs based on the Amazon Linux base AMI:

# Creates an AMI for the Spark EC2 scripts starting with a stock Amazon 
# Linux AMI.
# This has only been tested with Amazon Linux AMI 2014.03.2 

I therefore believe that the delay in SSH connectivity / slow start up is due to the EC2 instance applying (or attempting to and timing out, depending on VPC configuration) critical patches / security updates on creation, as detailed in the Amazon Linux AMI FAQ:


On first boot, the Amazon Linux AMI installs from the package repositories any user space security updates that are rated critical or important, and it does so before services, such as SSH, start.

If the AMI cannot access the yum repositories, it will timeout and retry multiple times before completing the boot procedure. Possible reasons for this are restrictive firewall settings or VPC settings, which prevent access to the Amazon Linux AMI package repositories.

If this is indeed the case, then creating your own AMI from a VM that has all of the relevant updates applied and calling the script with the --ami option should resolve the problem (this can be automated to keep on top of everything).
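As a rough sketch of that last step, the launch command with a custom AMI could be assembled as below. The flag names (--key-pair, --identity-file, --slaves, --ami) are as I recall them from spark_ec2.py in the 1.x line, so check them against ./spark-ec2 --help. The function name, cluster name, key names and AMI ID are all hypothetical placeholders.

```python
from typing import List


def build_launch_command(cluster_name: str, key_pair: str, identity_file: str,
                         ami_id: str, slaves: int = 1,
                         script: str = "./spark-ec2") -> List[str]:
    """Return the argv for launching a cluster from a pre-patched AMI.

    Hypothetical helper: it only builds the argument list, it does not run it.
    """
    # Cheap sanity check so a typo does not silently fall back to a stock AMI.
    if not ami_id.startswith("ami-"):
        raise ValueError("expected an AMI ID such as 'ami-...'")
    return [
        script,
        "--key-pair", key_pair,
        "--identity-file", identity_file,
        "--slaves", str(slaves),
        "--ami", ami_id,          # your own AMI with updates already applied
        "launch", cluster_name,
    ]
```

The resulting list can be passed to subprocess.run once the placeholders are replaced with real values.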

According to the FAQ:


To disable the security update on boot from the AWS EC2 Console:

On the "Advanced Instance Options" page in the Request Instances Wizard, there is a text field for sending the Amazon Linux AMI user-data. This data can be entered as text, or uploaded as a file. In either case, the data should be:

#cloud-config
repo_upgrade: none

To disable the security update on boot from the command line:

Create a text file with the preceding user-data, and pass it to aws ec2 run-instances with the --user-data file://<filename> flag (this can also be done with ec2-run-instances -f).
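A minimal sketch of the command-line route, assuming the AWS CLI is installed: it writes the two-line user-data file quoted above and builds (but does not execute) the aws ec2 run-instances argument list. The helper names, the file name, and the image ID and instance type you pass in are hypothetical.

```python
import os
import tempfile
from typing import List, Optional

# The exact user-data quoted from the FAQ.
USER_DATA = "#cloud-config\nrepo_upgrade: none\n"


def write_user_data(directory: Optional[str] = None,
                    filename: str = "user-data.txt") -> str:
    """Write the cloud-config user-data to a file and return its path."""
    if directory is None:
        directory = tempfile.mkdtemp()
    path = os.path.join(directory, filename)
    with open(path, "w") as handle:
        handle.write(USER_DATA)
    return path


def build_run_instances_command(image_id: str, instance_type: str,
                                user_data_path: str) -> List[str]:
    """Return the argv for 'aws ec2 run-instances' with the user-data file."""
    return [
        "aws", "ec2", "run-instances",
        "--image-id", image_id,
        "--instance-type", instance_type,
        "--user-data", "file://" + user_data_path,
    ]
```

Keeping the command as a list avoids shell quoting problems if it is later passed to subprocess.run.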

To disable the security update on boot when rebundling the Amazon Linux AMI:

Modify /etc/cloud/cloud.cfg and change repo_upgrade: security to repo_upgrade: none.
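If the rebundling is scripted, the cloud.cfg change could be made along these lines. This is only a sketch: it assumes the setting sits on a single line of the form "repo_upgrade: <value>", and it works on the file's text rather than touching /etc/cloud/cloud.cfg itself. The function name is hypothetical.

```python
import re

# Matches a whole 'repo_upgrade: <value>' line, keeping indentation and
# anything that follows the value (such as a trailing comment).
_REPO_UPGRADE = re.compile(r"^([ \t]*repo_upgrade[ \t]*:[ \t]*)\S+(.*)$",
                           re.MULTILINE)


def disable_repo_upgrade(cfg_text: str) -> str:
    """Return cfg_text with the repo_upgrade value replaced by 'none'."""
    return _REPO_UPGRADE.sub(r"\g<1>none\g<2>", cfg_text)
```

Read the file, pass its contents through the function, and write the result back as root on the instance you are about to rebundle.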
