How to use Zeppelin to access aws spark-ec2 cluster and s3 buckets


Question


I have an aws ec2 cluster setup by the spark-ec2 script.


I would like to configure Zeppelin so that I can write scala code locally on Zeppelin and run it on the cluster (via master). Furthermore I would like to be able to access my s3 buckets.


I followed this guide and this other one, however I cannot seem to run Scala code from Zeppelin on my cluster.

I installed Zeppelin locally with:

mvn install -DskipTests -Dspark.version=1.4.1 -Dhadoop.version=2.7.1


My security groups were set to both AmazonEC2FullAccess and AmazonS3FullAccess.


I edited the Spark interpreter properties in the Zeppelin web app, changing master from local[*] to spark://.us-west-2.compute.amazonaws.com:7077.


  1. When I test out

sc


in the interpreter, I receive this error:

java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
    at


  • When I try to edit "conf/zeppelin-site.xml" to change my port to 8082, it makes no difference.
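For reference, the server port is controlled by the zeppelin.server.port property; a minimal sketch of the relevant entry in conf/zeppelin-site.xml (the value shown matches the 8082 mentioned above):

```xml
<property>
  <name>zeppelin.server.port</name>
  <value>8082</value>
  <description>Server port. (This only changes the Zeppelin UI port, not how Zeppelin reaches the Spark master.)</description>
</property>
```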


    NOTE: I eventually would also want to access my s3 buckets with something like:

    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxx")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey","xxx")
    val file = "s3n://<<bucket>>/<<file>>"
    val data = sc.textFile(file)
    data.first
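As a side note, with Hadoop 2.7.x the newer s3a connector is an alternative to s3n. A hedged sketch (it assumes the hadoop-aws module and a matching AWS SDK jar are on the classpath, which is an assumption, not something stated in the question):

```scala
// Sketch only: requires the hadoop-aws module and AWS SDK jar on the
// classpath (an assumption). Property names are the standard s3a keys.
sc.hadoopConfiguration.set("fs.s3a.access.key", "xxx")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "xxx")
val file = "s3a://<<bucket>>/<<file>>"
val data = sc.textFile(file)
data.first
```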
    


    If any benevolent users have any advice (that wasn't already posted on StackOverflow), please let me know!

    Recommended answer


    Most likely your IP address is blocked from connecting to your Spark cluster. You can test this by launching spark-shell pointed at that endpoint (or even just telnetting to it). To fix it, log into your AWS account and change the firewall settings. It's also possible that it isn't pointed at the correct host (I'm assuming you removed the specific box from spark://.us-west-2.compute.amazonaws.com:7077, but if not, there should be a hostname before the .us-west-2). You can also try ssh'ing to that machine and running netstat --tcp -l -n to see if it's listening (or even just ps aux | grep java to see if Spark is running).
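The checks described above can be sketched as shell commands (the hostname and key path below are placeholders, not values from the question):

```shell
# Placeholders -- substitute your master's public DNS and your EC2 key pair.
MASTER_HOST="ec2-xx-xx-xx-xx.us-west-2.compute.amazonaws.com"
KEY="mykey.pem"

# 1. Is the master port reachable from this machine at all?
nc -zv "$MASTER_HOST" 7077

# 2. Does a plain spark-shell connect to the same endpoint?
spark-shell --master "spark://$MASTER_HOST:7077"

# 3. On the master itself: is anything listening on 7077, and is Spark up?
ssh -i "$KEY" root@"$MASTER_HOST" 'netstat --tcp -l -n | grep 7077'
ssh -i "$KEY" root@"$MASTER_HOST" 'ps aux | grep [j]ava'
```

If step 1 fails but step 3 shows a listener, the problem is the firewall (security group) rather than Spark itself.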

