Why is the Zeppelin notebook not able to connect to S3?
Question
I have installed Zeppelin on my AWS EC2 machine to connect to my Spark cluster.
Spark version (standalone): spark-1.2.1-bin-hadoop1.tgz
I am able to connect to the Spark cluster, but I get the following error when trying to access a file in S3 in my use case.
Code:
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_KEY_ID")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey","YOUR_SEC_KEY")
val file = "s3n://<bucket>/<key>"
val data = sc.textFile(file)
data.count
file: String = s3n://<bucket>/<key>
data: org.apache.spark.rdd.RDD[String] = s3n://<bucket>/<key> MappedRDD[1] at textFile at <console>:21
java.lang.NoSuchMethodError: org.jets3t.service.impl.rest.httpclient.RestS3Service.<init>(Lorg/jets3t/service/security/AWSCredentials;)V
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:55)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
I built Zeppelin with the following command:
mvn clean package -Pspark-1.2.1 -Dhadoop.version=1.0.4 -DskipTests
When I try to build with the Hadoop profile "-Phadoop-1.0.4", it warns that the profile does not exist.
I have also tried -Phadoop-1, as mentioned on the Spark website, but got the same error.
Please let me know what I am missing here.
Answer
The following installation worked for me (I also spent many days on this problem):
Spark 1.3.1, prebuilt for Hadoop 2.3, set up on an EC2 cluster
git clone https://github.com/apache/incubator-zeppelin.git (date: 25.07.2015)
Installed Zeppelin via the following command (following the instructions at https://github.com/apache/incubator-zeppelin):
mvn clean package -Pspark-1.3 -Dhadoop.version=2.3.0 -Phadoop-2.3 -DskipTests
Changed the port to 8082 via "conf/zeppelin-site.xml" (Spark uses port 8080).
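The port change can be made with an entry like the following in conf/zeppelin-site.xml (a minimal sketch based on the property name in Zeppelin's default configuration template):

```xml
<property>
  <name>zeppelin.server.port</name>
  <value>8082</value>
  <description>Server port (moved off 8080 to avoid colliding with the Spark UI).</description>
</property>
```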
After these installation steps, my notebook worked with S3 files:
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxx")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey","xxx")
val file = "s3n://<<bucket>>/<<file>>"
val data = sc.textFile(file)
data.first
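As a side note, rather than pasting keys directly into a paragraph, a hedged variant reads them from the conventional AWS environment variables. This is a notebook fragment, not standalone code: `sc` is the SparkContext that Zeppelin injects, and the variable names are the usual AWS conventions, so adjust if your setup differs.

```scala
// Notebook fragment: `sc` is the SparkContext provided by Zeppelin.
// AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are the conventional AWS
// environment variable names (an assumption about your environment).
val accessKey = sys.env.getOrElse("AWS_ACCESS_KEY_ID", "")
val secretKey = sys.env.getOrElse("AWS_SECRET_ACCESS_KEY", "")
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", accessKey)
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secretKey)
```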
I think the S3 problem is not completely resolved in Zeppelin version 0.5.0, so cloning the current git repo did it for me. (The NoSuchMethodError above is the typical signature of a jets3t version mismatch on the classpath: Hadoop 1.x expects the old RestS3Service(AWSCredentials) constructor, which newer jets3t releases removed.)
Important information: the job only worked for me with the Zeppelin Spark interpreter setting master=local[*] (instead of using spark://master:7777).
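The master setting lives in the Spark interpreter's properties, editable in the Zeppelin UI under Interpreter or in conf/interpreter.json once that file has been generated. A minimal fragment of the relevant entry might look like this (surrounding JSON and other default properties omitted; this is a sketch, not the full file):

```json
"properties": {
  "master": "local[*]"
}
```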