Problem with copying local data onto HDFS on a Hadoop cluster using Amazon EC2/S3

Problem description

I have set up a Hadoop cluster containing 5 nodes on Amazon EC2. Now, when I log into the master node and submit the following command:

bin/hadoop jar <program>.jar <arg1> <arg2> <path/to/input/file/on/S3>

It throws the following errors (not at the same time). The first error is thrown when I don't replace the slashes in the secret key with '%2F', and the second when I do replace them:

1) Java.lang.IllegalArgumentException: Invalid hostname in URI S3://<ID>:<SECRETKEY>@<BUCKET>/<path-to-inputfile>
2) org.apache.hadoop.fs.S3.S3Exception: org.jets3t.service.S3ServiceException: S3 PUT failed for '/' XML Error Message: The request signature we calculated does not match the signature you provided. check your key and signing method.
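
One commonly suggested workaround for a secret key containing '/' is to keep the credentials out of the URI entirely and supply them as Hadoop configuration properties, so nothing has to be percent-escaped. A minimal sketch, assuming the jets3t-backed s3n (native) filesystem; the block-based s3 filesystem uses fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey instead, and <ID>, <SECRET-KEY> and <BUCKET> are placeholders:

# Verify access to the input path; the credentials are passed as -D
# configuration properties, so the '/' characters in the secret key
# never need to be escaped.
bin/hadoop fs \
  -D fs.s3n.awsAccessKeyId=<ID> \
  -D fs.s3n.awsSecretAccessKey=<SECRET-KEY> \
  -ls s3n://<BUCKET>/<path-to-inputfile>

The same properties can also be set once in the cluster's Hadoop configuration files, so that plain s3n:// URIs work without embedding credentials at all.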

Please note:

1) When I submitted jps to see what tasks were running on the master, it just showed:

1116 NameNode
1699 Jps
1180 JobTracker

leaving out the DataNode and the TaskTracker (one way to check the worker daemons is sketched after these notes).

2) My secret key contains two '/' (forward slashes), and I replace them with '%2F' in the S3 URI.
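
jps only lists the JVMs on the machine where it is run; in a typical multi-node layout the DataNode and TaskTracker daemons run on the slave nodes rather than the master. A sketch of how to check whether the DataNodes actually registered with the NameNode (<slave-host> is a placeholder):

# Ask the NameNode for a summary of live and dead DataNodes.
bin/hadoop dfsadmin -report

# Or inspect a worker node directly.
ssh <slave-host> jps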

PS: The program runs fine on EC2 when run on a single node. It's only when I launch a cluster that I run into issues related to copying data to/from S3 from/to HDFS. Also, what does distcp do? Do I need to distribute the data even after I copy it from S3 to HDFS? (I thought HDFS took care of that internally.)
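
For reference, distcp is Hadoop's distributed copy tool: it runs a MapReduce job whose map tasks copy files in parallel, which is useful for moving large datasets between S3 and HDFS. Once the data is in HDFS, block placement and replication across the DataNodes are handled by HDFS itself. A sketch of an S3-to-HDFS copy, with the bucket, paths and NameNode address as placeholders and the credentials supplied as configuration properties as in the earlier sketch:

# Copy the input data from S3 into HDFS as a parallel MapReduce job.
bin/hadoop distcp \
  -D fs.s3n.awsAccessKeyId=<ID> \
  -D fs.s3n.awsSecretAccessKey=<SECRET-KEY> \
  s3n://<BUCKET>/<path-to-inputfile> \
  hdfs://<master-node>:9000/user/hadoop/input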

If you could direct me to a link that explains running Map/Reduce programs on a Hadoop cluster using Amazon EC2/S3, that would be great.

Regards,

Deepak.

Answer

You can also use Apache Whirr for this workflow. Check the Quick Start Guide and the 5 minutes guide for more info.

Disclaimer: I'm one of the committers.
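
A rough sketch of the Whirr workflow described in the Quick Start Guide; hadoop.properties is a placeholder for your own cluster definition, which names the cloud provider, credentials and instance templates:

# Credentials are usually picked up from the environment or referenced
# from the properties file.
export AWS_ACCESS_KEY_ID=<ID>
export AWS_SECRET_ACCESS_KEY=<SECRET-KEY>

# Launch a Hadoop cluster on EC2 as described by the properties file.
bin/whirr launch-cluster --config hadoop.properties

# ... run your jobs against the cluster ...

# Tear the cluster down when finished.
bin/whirr destroy-cluster --config hadoop.properties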
