Using FileInputFormat.addInputPaths to recursively add HDFS path
Problem description
I've got an HDFS structure something like
a/b/file1.gz
a/b/file2.gz
a/c/file3.gz
a/c/file4.gz
I'm using the classic pattern of
FileInputFormat.addInputPaths(conf, args[0]);
to set the input path for my Java MapReduce job.
This works fine if I specify args[0] as a/b, but it fails if I specify just a (my intention being to process all 4 files).
the error being
Exception in thread "main" java.io.IOException: Not a file: hdfs://host:9000/user/hadoop/a
How do I recursively add everything under a ?
I must be missing something simple...
This is a bug in the current version of Hadoop; here is the JIRA for it, which is still open. Either make the change in the code and build the binaries yourself, or wait for it to be fixed in an upcoming release. Recursive processing of input files can be turned on/off; check the patch attached to the JIRA for details.
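In the meantime, a common workaround on affected versions is to walk the directory tree yourself and add each plain file as an input path, so FileInputFormat never sees a directory entry it cannot handle. A minimal sketch using the old (org.apache.hadoop.mapred) API, assuming HDFS is reachable via the job's default configuration:

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class RecursiveInputPaths {

    // Descend through 'dir', adding every regular file as a job input path
    // and recursing into every subdirectory.
    static void addInputPathsRecursively(JobConf conf, FileSystem fs, Path dir)
            throws IOException {
        for (FileStatus status : fs.listStatus(dir)) {
            if (status.isDir()) {
                addInputPathsRecursively(conf, fs, status.getPath());
            } else {
                FileInputFormat.addInputPath(conf, status.getPath());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(RecursiveInputPaths.class);
        FileSystem fs = FileSystem.get(conf);
        // e.g. args[0] = "a" picks up a/b/file1.gz ... a/c/file4.gz
        addInputPathsRecursively(conf, fs, new Path(args[0]));
        // ... set mapper/reducer classes and submit the job as usual
    }
}
```

Once the fix referenced in the JIRA is in your Hadoop release, the manual walk should no longer be necessary: recursive traversal can then be enabled declaratively with `conf.setBoolean("mapreduce.input.fileinputformat.input.dir.recursive", true)` (or `FileInputFormat.setInputDirRecursive(job, true)` in the new API).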