Hadoop, MapReduce - Multiple Input/Output Paths


Question


When making the Jar for my MapReduce job, I am using the Hadoop-local command for my input files. I wanted to know whether, instead of specifically specifying the path for each file in my input folder to be used in the MapReduce job, I could just pass all the files from my input folder. This is because the contents and number of files could change due to the nature of the MapReduce job I am trying to configure, and as I do not know the specific number of files, apart from just the contents of these files, is there a way to pass all files from the input folder into my MapReduce program and then iterate over each file to compute a certain function which would then send the results to the Reducer? I am only using one Map/Reduce program and I am coding in Java. I am able to use the hadoop-moonshot command, but I am working with hadoop-local at the moment.

Thanks.

Solution

You don't have to pass each individual file as input to the MapReduce job.

The FileInputFormat class already provides an API to accept a list of multiple files as input to a MapReduce program.

public static void setInputPaths(Job job,
                 Path... inputPaths)
                          throws IOException

Sets the given array of Paths as the list of inputs for the map-reduce job.

Parameters:

job - the Job to modify

inputPaths - the Paths of the input directories or files

Example code from Apache tutorial

Job job = Job.getInstance(conf, "word count");
FileInputFormat.addInputPath(job, new Path(args[0]));
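Since the question is specifically about passing a whole folder: giving FileInputFormat a directory path is enough, because a directory is expanded to the files inside it when splits are computed (hidden files, i.e. names starting with `_` or `.`, are skipped by the default filter). A minimal driver sketch, with hypothetical HDFS paths (`/user/me/input`, `/user/me/output`) and assuming the Mapper/Reducer classes are configured elsewhere in the project:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FolderInputDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(FolderInputDriver.class);

        // A directory path makes the job read every (non-hidden) file in it.
        FileInputFormat.setInputPaths(job, new Path("/user/me/input"));

        // setInputPaths is varargs, so several paths can also be passed at once:
        // FileInputFormat.setInputPaths(job, pathA, pathB, pathC);

        FileOutputFormat.setOutputPath(job, new Path("/user/me/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```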

MultipleInputs provides the following API.

public static void addInputPath(Job job,
                Path path,
                Class<? extends InputFormat> inputFormatClass,
                Class<? extends Mapper> mapperClass)

Add a Path with a custom InputFormat and Mapper to the list of inputs for the map-reduce job.
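As a sketch of how MultipleInputs could be wired up, assuming two hypothetical mapper classes (`TextLogMapper`, `CsvMapper`) and input directories exist in the project:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Each path gets its own InputFormat and Mapper; both mappers must emit
// the same intermediate key/value types so one Reducer can consume them.
MultipleInputs.addInputPath(job, new Path("/data/logs"),
        TextInputFormat.class, TextLogMapper.class);
MultipleInputs.addInputPath(job, new Path("/data/csv"),
        KeyValueTextInputFormat.class, CsvMapper.class);
```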

Related SE question:

Can hadoop take input from multiple directories and files

Refer to MultipleOutputs API regarding your second query on multiple output paths.

FileOutputFormat.setOutputPath(job, outDir);

// Defines additional single text based output 'text' for the job
MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class,
        LongWritable.class, Text.class);

// Defines additional sequence-file based output 'seq' for the job
MultipleOutputs.addNamedOutput(job, "seq", SequenceFileOutputFormat.class,
        LongWritable.class, Text.class);

Have a look at related SE questions regarding multiple output files.

Writing to multiple folders in hadoop?

hadoop method to send output to multiple directories
