Can hadoop take input from multiple directories and files
Question
When I set the FileInputFormat input path for my Hadoop job to

arg[0] + "/*/*/*"

it reports that no files match. What I want is to read from multiple files laid out like this:
Directory1
  - Directory11
    - Directory111
      - f1.txt
      - f2.txt
  - Directory12
Directory2
  - Directory21
Is this possible in Hadoop? Thanks!
You can take input from multiple directories and files by using the `*` glob operator. Most likely the arg[0] argument is incorrect, so no matching files are found.
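A minimal driver sketch showing a glob input path (the class name is illustrative, and the glob depth assumes the directory layout from the question):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class GlobInputDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "glob input");

        // FileInputFormat expands glob patterns, so this single call
        // covers every path three levels below args[0], e.g.
        // Directory1/Directory11/Directory111.
        FileInputFormat.addInputPath(job, new Path(args[0] + "/*/*/*"));

        // ... set mapper, reducer, and output path as usual ...
    }
}
```

If the glob matches nothing, Hadoop throws an "Input path does not exist" / "matches 0 files" error, which is consistent with what the question describes.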
Alternatively, you can use FileInputFormat.addInputPath once per directory, or, if you need separate input formats or mappers, the MultipleInputs class.
Example of basic path adding:
FileInputFormat.addInputPath(job,myInputPath);
Here is an example of MultipleInputs:
MultipleInputs.addInputPath(job,inputPath1,TextInputFormat.class,MyMapper.class);
MultipleInputs.addInputPath(job,inputPath2,TextInputFormat.class,MyOtherMapper.class);
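To show how those two calls fit into a complete job, here is a driver sketch. MyMapper, MyOtherMapper, and MyReducer are the placeholder class names from the snippets above, and the key/value types are assumptions for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiInputDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "multiple inputs");
        job.setJarByClass(MultiInputDriver.class);

        // One mapper per input path; when MultipleInputs is used,
        // no separate FileInputFormat.addInputPath call is needed.
        MultipleInputs.addInputPath(job, new Path(args[0]),
                TextInputFormat.class, MyMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),
                TextInputFormat.class, MyOtherMapper.class);

        // Both mappers feed the same reducer.
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Both mappers must emit the same intermediate key/value types, since their output is merged before the reduce phase.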
This other question is also very similar and has good answers: Hadoop to reduce from multiple input formats.