Hadoop: Provide directory as input to MapReduce job


Problem Description



I'm using Cloudera Hadoop. I'm able to run a simple MapReduce program where I provide a file as input to the MapReduce program.

This file lists all the other files to be processed by the mapper function.

But I'm stuck at one point.

/folder1
  - file1.txt
  - file2.txt
  - file3.txt

How can I specify the input path to the MapReduce program as "/folder1", so that it starts processing each file inside that directory?

Any ideas?

EDIT:

1) Initially, I provided inputFile.txt as the input to the MapReduce program. It was working perfectly.

>inputFile.txt
file1.txt
file2.txt
file3.txt

2) But now, instead of giving an input file, I want to provide an input directory as arg[0] on the command line.

hadoop jar ABC.jar /folder1 /output

Solution

The problem is that FileInputFormat doesn't read files recursively from the input path directory.

Solution: use the following code. The setInputDirRecursive call goes before the addInputPath line in your MapReduce driver code:

FileInputFormat.setInputDirRecursive(job, true);
FileInputFormat.addInputPath(job, new Path(args[0]));

You can check here for the version in which this was fixed.
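
For context, here is a minimal, self-contained driver sketch showing where the setInputDirRecursive call sits relative to addInputPath. The DirInputDriver class name and the identity, map-only setup are assumptions made for illustration, not part of the original post:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver class; the name is illustrative only.
public class DirInputDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "directory input example");
    job.setJarByClass(DirInputDriver.class);

    // Identity mapper in a map-only job, just to keep the sketch self-contained;
    // substitute your own mapper/reducer classes here.
    job.setMapperClass(Mapper.class);
    job.setNumReduceTasks(0);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    // Make FileInputFormat descend into subdirectories of the input path.
    FileInputFormat.setInputDirRecursive(job, true);

    // args[0] = input directory (e.g. /folder1), args[1] = output directory.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The same switch is also exposed as a configuration property, mapreduce.input.fileinputformat.input.dir.recursive (the Hadoop 2.x name), so it can be set to true at job submission time, for example via -D if the driver goes through ToolRunner.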
