How to bind a public IP to Spark nodes in Amazon EC2?

Question

I am trying to create a Spark cluster between two instances in two different regions. As they are not in the same VPC/security group, I am having trouble connecting the master in one region to the slave in the other region (and vice versa). So far I have done the following:

  1. Edited the /etc/hosts file to add the public IPs of both the master and the slave:

54.208.204.190 master
13.113.105.113 slave01

  2. Added slave01 to the $SPARK_HOME/conf/slaves file

  3. Added the following to $SPARK_HOME/conf/spark-env.sh:

export JAVA_HOME=/home/ubuntu/jdk1.8.0_151
export SPARK_WORKER_CORES=8
export SPARK_MASTER_HOST=ec2-54-208-204-190.compute-1.amazonaws.com

I assigned the public DNS name of the master to SPARK_MASTER_HOST because assigning the master's public IP did not work. It showed me the following error:

'MasterUI' could not bind on port 8080.

With the above configuration, slave01 registered successfully with the master, and the Spark WebUI showed one worker as intended. But when I tried to run the SparkPi example, it could not add an executor. In the logs from slave01 I found the following:

`Caused by: java.io.IOException: Failed to connect to /172-31-23-69:48441`

172-31-23-69 is the private IP of the master. As I understand it, slave01 tried to connect to the master using this private IP, but because they are not in the same VPC, the connection fails. I am not sure why slave01 would use the master's private IP in the first place, because I have given both the public DNS name and the public IP of the master in spark-env.sh and the hosts file. How slave01 came to know the master's private IP is another interesting question.
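One way to confirm which addresses are actually reachable from slave01 is a plain TCP connection test. The following is a minimal sketch, not part of Spark: `can_connect` is a hypothetical helper name, and the 3-second timeout is an arbitrary assumption.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical diagnostic helper; it only checks that the TCP handshake
    completes and says nothing about the Spark protocol itself.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable networks
        # and host names that cannot be resolved.
        return False


# Example call, using the master's public DNS name from the question and
# the default standalone master port 7077 (an assumption about this setup):
#   can_connect("ec2-54-208-204-190.compute-1.amazonaws.com", 7077)
```

Running this on slave01 against both the master's public name and the address from the log line would show which of the two is reachable across regions.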

I have also tried setting the SPARK_LOCAL_IP variable to the public IP on each instance, but that does not work either. If anyone can point me in the right direction, I will be very grateful. Thanks in advance.

Recommended answer

When an EC2 instance has a public IPv4 address associated with it, you can't bind a socket to the public IP address, because of the way public IP addresses are handled in EC2.

The public IP is statically NAT-ed to the private IP by the Internet Gateway -- the instance itself is not aware of the public IP address.

(See the output from ifconfig -- the public IP is not there, and isn't supposed to be there -- only the private IP).
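The same thing can be observed from code by trying to bind a socket to an address. This is an illustrative sketch only: `can_bind` is a hypothetical helper, not part of Spark or any AWS tooling.

```python
import socket


def can_bind(address: str) -> bool:
    """Return True if a TCP socket can be bound to `address` on an ephemeral port.

    Binding only works for addresses configured on a local interface.
    For any other address the operating system rejects the call
    (typically with EADDRNOTAVAIL, "Cannot assign requested address").
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, 0))  # port 0 lets the OS choose a free port
            return True
        except OSError:
            return False


# On the master from the question, the expectation would be:
#   can_bind(<private IP shown by ifconfig>)  -> True
#   can_bind("54.208.204.190")                -> False, the public IP is
#                                                not on any local interface
```

This is consistent with the "could not bind" error seen when SPARK_MASTER_HOST was set to the public IP.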

VPC peering allows you to interconnect the networks of multiple VPCs together, giving instances access to each other across account boundaries and even AWS region boundaries.

There may be an alternate solution specific to what you're doing, but keeping the traffic all within the bounds of private IP space seems like a good workaround and best practice.

Note that interconnected VPCs must have unique, non-overlapping CIDR blocks. Peering isn't transitive, so peering VPC A to B and then peering VPC B to C does not allow VPCs A and C to communicate. Any two VPCs that have instances needing to communicate must be directly peered.
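Both constraints can be checked before setting anything up. The sketch below uses Python's standard `ipaddress` module. The function names, the VPC labels and the CIDR values are assumptions made for illustration, not values taken from the question.

```python
from ipaddress import ip_network


def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR blocks share at least one address."""
    return ip_network(cidr_a).overlaps(ip_network(cidr_b))


def can_communicate(vpc_a: str, vpc_b: str, peerings: set) -> bool:
    """Return True only if the two VPCs are directly peered.

    `peerings` is a set of frozensets, one per peering connection.
    No path search is done, because peering is not transitive.
    """
    return frozenset((vpc_a, vpc_b)) in peerings


# Two VPCs that both use 172.31.0.0/16 cannot be peered as they are.
print(cidrs_overlap("172.31.0.0/16", "172.31.0.0/16"))  # True
print(cidrs_overlap("172.31.0.0/16", "10.0.0.0/16"))    # False

# A-B and B-C are peered, but that does not connect A and C.
peerings = {frozenset(("A", "B")), frozenset(("B", "C"))}
print(can_communicate("A", "B", peerings))  # True
print(can_communicate("A", "C", peerings))  # False
```

If both instances live in default VPCs, it is worth checking their CIDR blocks first, since default VPCs normally use the same 172.31.0.0/16 range and would therefore overlap.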

https://docs.aws.amazon.com/AmazonVPC/latest/PeeringGuide/Welcome.html
