Hadoop: Provide directory as input to MapReduce job


Problem Description

I'm using Cloudera Hadoop. I'm able to run a simple MapReduce program where I provide a file as input to the MapReduce program.

This file contains the names of all the other files to be processed by the mapper function.

But I'm stuck at one point.

/folder1
  - file1.txt
  - file2.txt
  - file3.txt

How can I specify the input path to the MapReduce program as "/folder1", so that it can start processing each file inside that directory?

Any ideas?

1) Initially, I provided inputFile.txt as input to the MapReduce program. It was working perfectly.

>inputFile.txt
file1.txt
file2.txt
file3.txt
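
As an aside, a mapper for this file-list approach might look like the sketch below. This is a hypothetical reconstruction (the question doesn't show the original mapper): each input value is one line of inputFile.txt, i.e. a file path, which the mapper opens itself through the Hadoop FileSystem API.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical sketch: each value is a line of inputFile.txt, i.e. a path.
public class FileListMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        Path path = new Path(value.toString());
        FileSystem fs = path.getFileSystem(context.getConfiguration());
        // Open the named file and emit each of its lines.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(path)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                context.write(new Text(line), NullWritable.get());
            }
        }
    }
}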

2) But now, instead of giving an input file, I want to provide an input directory as arg[0] on the command line:

hadoop jar ABC.jar /folder1 /output

Recommended Answer

The problem is that FileInputFormat doesn't read files recursively from the input path directory.

Solution: add the following line

FileInputFormat.setInputDirRecursive(job, true);

before this line in your MapReduce code:

FileInputFormat.addInputPath(job, new Path(args[0]));
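
For reference, here is a minimal driver sketch showing where the recursive flag sits relative to the input path. The class name, job name, and output types are placeholders, not from the original question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DirInputDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "dir-input-example");
        job.setJarByClass(DirInputDriver.class);
        // Set your Mapper/Reducer and output key/value classes here,
        // exactly as in your existing single-file job.

        // Make FileInputFormat descend into subdirectories of the input path.
        FileInputFormat.setInputDirRecursive(job, true);
        // args[0] can now be a directory such as /folder1.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

With this in place, hadoop jar ABC.jar /folder1 /output picks up file1.txt, file2.txt and file3.txt. Note that FileInputFormat already accepts a directory of plain files as an input path; setInputDirRecursive matters when /folder1 itself contains subdirectories.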

You can check here for the version in which it was fixed.

