Problem with copying local data onto HDFS on a Hadoop cluster using Amazon EC2/S3
Question
I have set up a Hadoop cluster containing 5 nodes on Amazon EC2. Now, when I log in to the master node and submit the following command:
bin/hadoop jar <program>.jar <arg1> <arg2> <path/to/input/file/on/S3>
it throws one of the following errors (not both at the same time). The first error is thrown when I don't replace the slashes in my secret key with '%2F', and the second when I do:
1) java.lang.IllegalArgumentException: Invalid hostname in URI s3://<ID>:<SECRETKEY>@<BUCKET>/<path-to-inputfile>
2) org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 PUT failed for '/' XML Error Message: The request signature we calculated does not match the signature you provided. Check your key and signing method.
Note:
1) When I submitted jps to see what tasks were running on the master, it showed only:
1116 NameNode
1699 Jps
1180 JobTracker
leaving out DataNode and TaskTracker.
2) My secret key contains two '/' (forward slashes), which I replace with '%2F' in the S3 URI.
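As a side note on the encoding: percent-encoding the secret key by hand is error-prone, since characters other than '/' (a '+', for instance) also need escaping. A minimal sketch using Python's standard `urllib.parse.quote`, with a made-up key rather than a real credential, shows what a fully encoded key looks like:

```python
from urllib.parse import quote

# Hypothetical secret key (NOT a real credential) containing '/' and '+'.
secret_key = "abc/def/ghi+jkl"

# safe="" forces '/' (and '+') to be percent-encoded as well;
# by default quote() would leave '/' untouched.
encoded = quote(secret_key, safe="")
print(encoded)  # abc%2Fdef%2Fghi%2Bjkl
```

If the encoded key still fails signature validation, that usually points to the key being mangled somewhere else in the URI, which is why passing credentials via configuration properties is generally safer than embedding them in the URI.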
PS: The program runs fine on EC2 when run on a single node. It's only when I launch a cluster that I run into issues related to copying data between S3 and HDFS. Also, what does distcp do? Do I need to distribute the data even after copying it from S3 to HDFS? (I thought HDFS took care of that internally.)
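On the distcp question: distcp runs a MapReduce job that copies files in parallel, and once data is in HDFS you don't need to distribute it again, since HDFS replicates blocks across the cluster itself. A hedged sketch of pulling the input from S3 into HDFS with distcp, passing the credentials as `-D` configuration properties so the secret key never has to be percent-encoded inside the URI (`<ID>`, `<SECRETKEY>`, `<BUCKET>`, and the paths are placeholders; this assumes the jets3t-based s3n filesystem of that Hadoop era):

```shell
# Copy input data from S3 into HDFS in parallel with distcp.
# Credentials are passed as configuration properties instead of being
# embedded in the URI, avoiding the '%2F' escaping problem entirely.
bin/hadoop distcp \
  -D fs.s3n.awsAccessKeyId=<ID> \
  -D fs.s3n.awsSecretAccessKey=<SECRETKEY> \
  s3n://<BUCKET>/path/to/input \
  hdfs:///user/hadoop/input
```

The same properties can be set once in conf/core-site.xml instead of on every command line. This is only a sketch and needs a live cluster to run.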
If you could direct me to a link that explains how to run Map/Reduce programs on a Hadoop cluster using Amazon EC2/S3, that would be great.
Regards,
Deepak.
Answer
You can also use Apache Whirr for this workflow. Check the Quick Start Guide and the 5-minute guide for more info.
Disclaimer: I'm one of the committers.