hadoop copying from hdfs to S3


Question

I've successfully completed a Mahout vectorizing job on Amazon EMR (using Mahout on Elastic MapReduce as a reference). Now I want to copy the results from HDFS to S3 (to use them in future clustering).

For that I've used hadoop distcp:

den@aws:~$ elastic-mapreduce --jar s3://elasticmapreduce/samples/distcp/distcp.jar 
> --arg hdfs://my.bucket/prj1/seqfiles 
> --arg s3n://ACCESS_KEY:SECRET_KEY@my.bucket/prj1/seqfiles 
> -j $JOBID
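
For context, the two --arg values here are just the source and destination for Hadoop's distcp tool. Assuming the sample jar simply forwards its arguments to distcp, the equivalent invocation run directly on the cluster's master node would look roughly like this (a sketch only, with ACCESS_KEY/SECRET_KEY as placeholders):

den@aws:~$ hadoop distcp hdfs://my.bucket/prj1/seqfiles 's3n://ACCESS_KEY:SECRET_KEY@my.bucket/prj1/seqfiles'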

It failed. I found this suggestion: use s3distcp. I tried that as well:

elastic-mapreduce --jobflow $JOBID 
> --jar --arg s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar 
> --arg --s3Endpoint --arg 's3-eu-west-1.amazonaws.com' 
> --arg --src --arg 'hdfs://my.bucket/prj1/seqfiles' 
> --arg --dest --arg 's3://my.bucket/prj1/seqfiles'

In both cases I got the same error: java.net.UnknownHostException: unknown host: my.bucket
Below is the full error output for the second case.

2012-09-06 13:25:08,209 FATAL com.amazon.external.elasticmapreduce.s3distcp.S3DistCp (main): Failed to get source file system
java.net.UnknownHostException: unknown host: my.bucket
    at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:214)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1193)
    at org.apache.hadoop.ipc.Client.call(Client.java:1047)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at $Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:401)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:127)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:249)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:214)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1413)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:68)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1431)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:256)
    at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:431)
    at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:216)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at com.amazon.external.elasticmapreduce.s3distcp.Main.main(Main.java:12)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:187)

Answer

I found the mistake:

The main problem is not

java.net.UnknownHostException: unknown host: my.bucket

but rather:

2012-09-06 13:27:33,909 FATAL com.amazon.external.elasticmapreduce.s3distcp.S3DistCp (main): Failed to get source file system

After adding one more slash to the source path, the job started without problems. With hdfs://my.bucket/... Hadoop takes my.bucket to be the NameNode host (hence the UnknownHostException), whereas hdfs:///my.bucket/... has no host, so the default NameNode from the cluster configuration is used and /my.bucket/... is treated as an ordinary HDFS path. The correct command is:

elastic-mapreduce --jobflow $JOBID 
> --jar --arg s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar 
> --arg --s3Endpoint --arg 's3-eu-west-1.amazonaws.com' 
> --arg --src --arg 'hdfs:///my.bucket/prj1/seqfiles' 
> --arg --dest --arg 's3://my.bucket/prj1/seqfiles'
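
To see this URI behavior from the master node, standard hadoop fs commands show the difference (a sketch using the same paths as this job):

den@aws:~$ hadoop fs -ls hdfs:///my.bucket/prj1/seqfiles   # no host: resolved against the default NameNode, should list the seqfiles
den@aws:~$ hadoop fs -ls hdfs://my.bucket/prj1/seqfiles    # "my.bucket" is taken as a NameNode host and cannot be resolved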

P.S. It works: the job finished correctly, and I successfully copied a directory containing a 30 GB file.
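
As a quick sanity check after the copy, the destination can be listed with the same s3n:// style URI used in the first attempt (a sketch, assuming the same credentials; sizes can be compared with hadoop fs -du):

den@aws:~$ hadoop fs -ls 's3n://ACCESS_KEY:SECRET_KEY@my.bucket/prj1/seqfiles'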
