How can I access S3/S3n from a local Hadoop 2.6 installation?
Question
I am trying to reproduce an Amazon EMR cluster on my local machine. For that purpose, I have installed the latest stable version of Hadoop as of this writing, 2.6.0. Now I would like to access an S3 bucket, just as I do inside the EMR cluster.
I have added the AWS credentials in core-site.xml:
<property>
<name>fs.s3.awsAccessKeyId</name>
<value>some id</value>
</property>
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>some id</value>
</property>
<property>
<name>fs.s3.awsSecretAccessKey</name>
<value>some key</value>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>some key</value>
</property>
Note: since the secret access key contains slashes, I have escaped them with %2F.
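As an aside, the same %2F escaping can be produced with a standard URL-quoting routine instead of by hand (a sketch; the key below is made up, not a real credential):

```python
from urllib.parse import quote

# Made-up secret key containing slashes (not a real credential):
secret = "abc/def/ghi"

# safe="" makes quote() escape '/' as %2F too, matching the manual
# escaping described above (by default quote() leaves '/' alone).
escaped = quote(secret, safe="")
print(escaped)  # abc%2Fdef%2Fghi
```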
If I try to list the contents of the bucket:
hadoop fs -ls s3://some-url/bucket/
I get this error:
ls: No FileSystem for scheme: s3
I edited core-site.xml again, and added information related to the fs:
<property>
<name>fs.s3.impl</name>
<value>org.apache.hadoop.fs.s3.S3FileSystem</value>
</property>
<property>
<name>fs.s3n.impl</name>
<value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
</property>
This time I get a different error:
-ls: Fatal internal error
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3.S3FileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
Somehow I suspect the Yarn distribution does not have the necessary jars to be able to read S3, but I have no idea where to get those. Any pointers in this direction would be greatly appreciated.
For some reason, the jar hadoop-aws-[version].jar, which contains the implementation of NativeS3FileSystem, is not on Hadoop's classpath by default in versions 2.6 and 2.7. So try adding it to the classpath by adding the following line to hadoop-env.sh, which is located at $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*
Assuming you are using Apache Hadoop 2.6 or 2.7
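As a quick sanity check of what that classpath entry pulls in, a short script can list the hadoop-aws jars under the tools directory (a sketch; find_aws_jars is a hypothetical helper, and the /opt/hadoop default is an assumption about where your install lives):

```python
import glob
import os

def find_aws_jars(hadoop_home):
    """List hadoop-aws-*.jar files under share/hadoop/tools/lib, the
    directory the HADOOP_CLASSPATH line above adds (hypothetical helper)."""
    pattern = os.path.join(hadoop_home, "share", "hadoop",
                           "tools", "lib", "hadoop-aws-*.jar")
    return sorted(glob.glob(pattern))

# Point this at your actual installation:
print(find_aws_jars(os.environ.get("HADOOP_HOME", "/opt/hadoop")))
```

An empty list here would mean the glob in the export line has nothing to match, so the classpath change cannot help.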
By the way, you could check the classpath of Hadoop using:
bin/hadoop classpath
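If the ClassNotFoundException persists, it may also be worth confirming that the jar actually contains the class named in fs.s3n.impl. Jars are ordinary zip archives, so a small helper can check (a sketch; jar_has_class and the example path are hypothetical):

```python
import zipfile

def jar_has_class(jar_path, class_name):
    """Return True if the jar contains the given fully qualified class
    (hypothetical helper; jar files are ordinary zip archives)."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as zf:
        return entry in zf.namelist()

# Example, assuming a stock Hadoop 2.6 layout under /opt/hadoop:
# jar_has_class("/opt/hadoop/share/hadoop/tools/lib/hadoop-aws-2.6.0.jar",
#               "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
```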