How can I access S3/S3n from a local Hadoop 2.6 installation?


Problem description

I am trying to reproduce an Amazon EMR cluster on my local machine. For that purpose, I have installed the latest stable version of Hadoop as of this writing, 2.6.0. Now I would like to access an S3 bucket the same way I do inside the EMR cluster.

I have added the aws credentials in core-site.xml:

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>some id</value>
</property>

<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>some id</value>
</property>

<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>some key</value>
</property>

<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>some key</value>
</property>

Note: since there are some slashes in the key, I have escaped them with %2F.
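For illustration, this is how a slash would be encoded (the key below is made up); one way to produce the escaped form on the command line, assuming python3 is available:

# Hypothetical key: wJalr/UtnFEMI  ->  wJalr%2FUtnFEMI
python3 -c 'import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1], safe=""))' 'wJalr/UtnFEMI'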

If I try to list the contents of the bucket:

hadoop fs -ls s3://some-url/bucket/

I get this error:

ls: No FileSystem for scheme: s3

I edited core-site.xml again, and added information related to the fs:

<property>
  <name>fs.s3.impl</name>
  <value>org.apache.hadoop.fs.s3.S3FileSystem</value>
</property>

<property>
  <name>fs.s3n.impl</name>
  <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
</property>

This time I get a different error:

-ls: Fatal internal error
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3.S3FileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)

Somehow I suspect the Yarn distribution does not have the necessary jars to be able to read S3, but I have no idea where to get those. Any pointers in this direction would be greatly appreciated.

Solution

For some reason, the jar hadoop-aws-[version].jar, which contains the implementation of NativeS3FileSystem, is not present on Hadoop's classpath by default in versions 2.6 and 2.7. So, try adding it to the classpath by adding the following line to hadoop-env.sh, located at $HADOOP_HOME/etc/hadoop/hadoop-env.sh:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*

Assuming you are using Apache Hadoop 2.6 or 2.7
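If you would rather not put the entire tools/lib directory on the classpath, a narrower variant is to reference the relevant jars directly. The version numbers below are illustrative and must match your installation; in Hadoop 2.6/2.7 the hadoop-aws jar relies on the AWS SDK jar bundled alongside it, so both are listed:

# Narrower alternative (jar versions are illustrative - check your tools/lib directory):
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/hadoop-aws-2.6.0.jar:$HADOOP_HOME/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar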

By the way, you can check Hadoop's classpath using:

bin/hadoop classpath
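If the export took effect, the tools jars should show up in that output. As a rough sketch, you could filter for them and confirm the missing class actually ships in the hadoop-aws jar (paths and the version number are illustrative; the second command assumes a JDK's jar tool is installed):

# List classpath entries one per line and look for tools/lib (paths vary by install):
bin/hadoop classpath | tr ':' '\n' | grep tools/lib
# Confirm the class is in the jar (version number is illustrative):
jar tf $HADOOP_HOME/share/hadoop/tools/lib/hadoop-aws-2.6.0.jar | grep NativeS3FileSystem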
