Can Hadoop take input from multiple directories and files?


Question


When I set the FileInputFormat as the Hadoop input with arg[0] + "/*/*/*", it says the pattern matches no files.

What I want is to read from multiple files laid out like this:

Directory1
---Directory11
   ---Directory111
        --f1.txt
        --f2.txt
---Directory12
Directory2
---Directory21

Is it possible in Hadoop? Thanks!

Solution

You can take input from multiple directories and files by using glob patterns (the * wildcard). Most likely the arg[0] argument isn't what you expect, so the glob isn't finding any files.
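
To make the glob approach concrete, here is a minimal driver sketch. The class name, the identity mapper/reducer, and the output-path handling are placeholders for illustration, and it assumes args[0] is the directory that contains Directory1 and Directory2:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class GlobInputJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "glob input example");
        job.setJarByClass(GlobInputJob.class);

        // Identity mapper and reducer, just to keep the sketch self-contained.
        // The default TextInputFormat produces LongWritable offsets and Text lines.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // The glob matches entries three levels below args[0], e.g.
        // Directory1/Directory11/Directory111; a matched directory is then
        // listed one level deep, so f1.txt and f2.txt become the input.
        FileInputFormat.addInputPath(job, new Path(args[0] + "/*/*/*"));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}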

As an alternative, you can also use FileInputFormat.addInputPath, or, if you need separate input formats or mappers per path, the MultipleInputs class.

A basic example of adding a path:

FileInputFormat.addInputPath(job, myInputPath);
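
If the inputs are just a handful of known directories, you can also add each path explicitly, or pass them as one comma-separated string (the paths below are placeholders):

FileInputFormat.addInputPath(job, new Path("/data/Directory1"));
FileInputFormat.addInputPath(job, new Path("/data/Directory2"));
// or equivalently
FileInputFormat.setInputPaths(job, "/data/Directory1,/data/Directory2");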

Here is an example of MultipleInputs

MultipleInputs.addInputPath(job, inputPath1, TextInputFormat.class, MyMapper.class);
MultipleInputs.addInputPath(job, inputPath2, TextInputFormat.class, MyOtherMapper.class);

This other question is also very similar and has good answers: Hadoop to reduce from multiple input formats.
