How to use Zeppelin to access aws spark-ec2 cluster and s3 buckets

Problem description

I have an AWS EC2 cluster set up by the spark-ec2 script.

I would like to configure Zeppelin so that I can write Scala code locally in Zeppelin and run it on the cluster (via the master). Furthermore, I would like to be able to access my S3 buckets.

I followed this guide and this other one; however, I cannot seem to run Scala code from Zeppelin on my cluster.

I installed Zeppelin locally with

mvn install -DskipTests -Dspark.version=1.4.1 -Dhadoop.version=2.7.1

My security groups were set to both AmazonEC2FullAccess and AmazonS3FullAccess.

I edited the Spark interpreter properties in the Zeppelin web app, changing the master from local[*] to spark://.us-west-2.compute.amazonaws.com:7077
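As an alternative to editing the interpreter in the web UI, the master can also be set in Zeppelin's environment file. This is a minimal sketch; `<master-public-dns>` is a placeholder you would replace with your cluster master's public DNS name:

```shell
# conf/zeppelin-env.sh
# Point Zeppelin's embedded Spark at the standalone master instead of local[*].
# <master-public-dns> is a placeholder for the spark-ec2 master's public DNS name.
export MASTER=spark://<master-public-dns>:7077
```

Restart the Zeppelin daemon after editing this file so the interpreter picks up the new master URL.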

  1. When I test out

sc

in the interpreter, I receive this error:

    java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
        at ...

  • When I try to edit "conf/zeppelin-site.xml" to change the port to 8082, it makes no difference.

    NOTE: I would eventually also want to access my S3 buckets with something like:

    // substitute real AWS credentials for "xxx"
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxx")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "xxx")
    // <<bucket>> and <<file>> are placeholders for a bucket name and object key
    val file = "s3n://<<bucket>>/<<file>>"
    val data = sc.textFile(file)
    data.first
    

    If any benevolent users have any advice (that wasn't already posted on StackOverflow), please let me know!

    Answer

    Most likely your IP address is blocked from connecting to your Spark cluster. You can test this by launching spark-shell pointed at that endpoint (or even just telnetting to it). To fix it, you can log into your AWS account and change the firewall settings. It's also possible that it isn't pointed at the correct host (I'm assuming you removed the specific box from spark://.us-west-2.compute.amazonaws.com:7077, but if not, there should be a bit before the .us-west-2). You can try ssh'ing to that machine and running netstat --tcp -l -n to see if it's listening (or even just ps aux | grep java to see if Spark is running).
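    The "can I even reach the master port?" check suggested above can also be done with a few lines of Python from the machine running Zeppelin. This is a sketch, not part of the original answer; the hostname below is a deliberately unresolvable placeholder that you would replace with your master's public DNS name:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts alike.
        return False

# Replace the placeholder with your spark-ec2 master's public DNS name.
# A False result here mirrors the ConnectException Zeppelin reports, and
# usually means the security group is not allowing your IP on port 7077.
print(can_connect("master-public-dns.invalid", 7077))
```

    If this returns False from your machine but True from inside the cluster (e.g. over ssh), the security group's inbound rules are the likely culprit.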
