Problem with copying local data onto HDFS on a Hadoop cluster using Amazon EC2/S3


Question

I have set up a Hadoop cluster containing 5 nodes on Amazon EC2. Now, when I log into the master node and submit the following command

bin/hadoop jar <program>.jar <arg1> <arg2> <path/to/input/file/on/S3>

It throws the following errors (not at the same time). The first error is thrown when I don't replace the slashes in the secret key with '%2F', and the second is thrown when I do replace them:

1) Java.lang.IllegalArgumentException: Invalid hostname in URI S3://<ID>:<SECRETKEY>@<BUCKET>/<path-to-inputfile>
2) org.apache.hadoop.fs.S3.S3Exception: org.jets3t.service.S3ServiceException: S3 PUT failed for '/' XML Error Message: The request signature we calculated does not match the signature you provided. check your key and signing method.
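
One workaround often suggested for these old jets3t-backed s3/s3n filesystems (a sketch, not part of the original question) is to keep the credentials out of the URI entirely and put them in conf/core-site.xml on every node, so the '/' characters in the secret key never need URL-encoding. The fs.s3.* property names below are the standard ones for the s3:// block filesystem; the bucket and path are placeholders:

<!-- inside the <configuration> element of conf/core-site.xml, on every node -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>

The job can then be submitted with a plain S3 URI, with no key material in it:

bin/hadoop jar <program>.jar <arg1> <arg2> s3://<BUCKET>/<path-to-inputfile>

(The equivalent properties for the native s3n:// filesystem are fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey.)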

Note:

1) When I submitted jps to see what tasks were running on the master, it just showed

1116 NameNode
1699 Jps
1180 JobTracker

leaving out DataNode and TaskTracker.
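
(On a default master/slave layout the DataNode and TaskTracker daemons run on the slave nodes rather than on the master, so their absence from jps on the master is expected. A quick way to check whether they actually came up, sketched with the old bin/hadoop commands and a placeholder hostname:)

bin/hadoop dfsadmin -report     # asks the NameNode how many DataNodes are live
ssh <slave-hostname> jps        # should list DataNode and TaskTracker on a worker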

2) My secret key contains two '/' (forward slashes), and I replace them with '%2F' in the S3 URI.

PS: The program runs fine on EC2 when run on a single node. It's only when I launch a cluster that I run into issues related to copying data to/from S3 and HDFS. Also, what does distcp do? Do I need to distribute the data even after copying it from S3 to HDFS? (I thought HDFS took care of that internally.)
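
(For context: distcp is itself a MapReduce job that copies files in parallel between filesystems, for example from S3 into HDFS. Once files are in HDFS, their blocks are already spread and replicated across the DataNodes, so no extra distribution step is needed. A sketch, with placeholder bucket, NameNode host, and port:)

bin/hadoop distcp s3n://<BUCKET>/<path-to-inputfile> hdfs://<namenode-host>:<port>/user/<username>/input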

If you could direct me to a link that explains how to run Map/Reduce programs on a Hadoop cluster using Amazon EC2/S3, that would be great.

Regards,

Deepak.

Answer

You can also use Apache Whirr for this workflow. Check the Quick Start Guide and the 5-minute guide for more info.
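
As a rough sketch of what the Whirr workflow looks like (the property names follow the Whirr quick start; the cluster layout, file name, and credential placeholders below are illustrative and may differ between Whirr versions):

# hadoop.properties
whirr.cluster-name=myhadoopcluster
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,5 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=<YOUR_AWS_ACCESS_KEY_ID>
whirr.credential=<YOUR_AWS_SECRET_ACCESS_KEY>

bin/whirr launch-cluster --config hadoop.properties    # provisions the cluster on EC2
bin/whirr destroy-cluster --config hadoop.properties   # shuts it down when finished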

Disclaimer: I'm one of the committers.

