Hadoop MapReduce: Possible to define two mappers and reducers in one Hadoop job class?


Problem description

I have two separate Java classes for two different MapReduce jobs, and I can run each of them independently. Both jobs operate on the same input files. So my question is whether it is possible to define two mappers and two reducers in one Java class, like

mapper1.class
mapper2.class
reducer1.class
reducer2.class

and then set them like this:

job.setMapperClass(mapper1.class);
job.setMapperClass(mapper2.class);
job.setCombinerClass(reducer1.class);
job.setCombinerClass(reducer2.class);
job.setReducerClass(reducer1.class);
job.setReducerClass(reducer2.class);

Do these set methods actually override the previous ones, or do they add new ones? I tried the code, but it executes only the most recently set classes, which makes me think each call overrides the previous one. But there must be a way to do this, right?
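The observed behavior is consistent with how these setters work: in Hadoop, `Job.setMapperClass`, `setCombinerClass`, and `setReducerClass` each store a single class in the job configuration, so a later call replaces the earlier one rather than adding to it. The toy model below illustrates this in plain Java; `ToyJob` and its string-valued setters are hypothetical stand-ins, not the Hadoop `Job` API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a job configuration; NOT org.apache.hadoop.mapreduce.Job.
class ToyJob {
    // One slot per role, like a single configuration property per setting.
    private final Map<String, String> conf = new HashMap<>();

    // Each setter overwrites the previous value for its key instead of appending.
    void setMapperClass(String className)   { conf.put("mapper", className); }
    void setCombinerClass(String className) { conf.put("combiner", className); }
    void setReducerClass(String className)  { conf.put("reducer", className); }

    String get(String role) { return conf.get(role); }

    // Replays the sequence of calls from the question.
    static ToyJob configureLikeTheQuestion() {
        ToyJob job = new ToyJob();
        job.setMapperClass("mapper1");
        job.setMapperClass("mapper2");
        job.setCombinerClass("reducer1");
        job.setCombinerClass("reducer2");
        job.setReducerClass("reducer1");
        job.setReducerClass("reducer2");
        return job;
    }
}
```

Calling `ToyJob.configureLikeTheQuestion().get("mapper")` returns `"mapper2"`, matching the "only the latest classes run" symptom.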

I am asking because I would like to read the input files only once (a single I/O pass) and then run both MapReduce computations on them. I would also like to know how to write the output files into two different folders. At the moment, the two jobs are separate, and each requires its own input and output directory.

Solution

You can have multiple mappers, but a single job can only have one reducer class. The features you need are MultipleInputs, MultipleOutputs, and GenericWritable.
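Because a job has a single reducer class, one common workaround (a swapped-in technique, not necessarily the approach in the posts mentioned here) is to let the map side emit records for both computations in one pass, tag each intermediate key with the computation it belongs to, and have the one reducer dispatch on that tag. The sketch below models the map, shuffle, and reduce phases in plain Java so it runs without Hadoop; the class `TaggedSinglePass`, the `A:`/`B:` tags, and the two example computations (word count and longest word per first letter) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of one MapReduce job serving two computations.
// Intermediate keys are tagged "A:" (word count) or "B:" (longest word per first letter).
class TaggedSinglePass {

    // Map phase: each input line is read once and feeds BOTH computations.
    static void map(String line, Map<String, List<Integer>> shuffle) {
        for (String word : line.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) continue;
            emit(shuffle, "A:" + word, 1);                        // computation A
            emit(shuffle, "B:" + word.charAt(0), word.length()); // computation B
        }
    }

    // Shuffle: group emitted values under their tagged key.
    private static void emit(Map<String, List<Integer>> shuffle, String key, int value) {
        shuffle.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Reduce phase: a single reducer that dispatches on the tag prefix.
    static int reduce(String taggedKey, List<Integer> values) {
        if (taggedKey.startsWith("A:")) {
            int sum = 0;
            for (int v : values) sum += v;
            return sum;
        }
        int max = 0;
        for (int v : values) max = Math.max(max, v);
        return max;
    }

    static Map<String, Integer> run(List<String> lines) {
        // TreeMap keeps keys sorted, loosely mirroring Hadoop's sorted reduce input.
        Map<String, List<Integer>> shuffle = new TreeMap<>();
        for (String line : lines) map(line, shuffle);
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```

For the input lines `the cat` and `The tiger`, `run` returns `A:the=2`, `A:cat=1`, `A:tiger=1`, `B:c=3`, and `B:t=5`, with every line read only once.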

Using MultipleInputs, you can set the mapper and the corresponding InputFormat for each input path. Here is my post about how to use it.
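`MultipleInputs.addInputPath(job, path, inputFormatClass, mapperClass)` registers an InputFormat and a mapper class for one input path, and each input split is then processed by the mapper registered for the path it came from. The plain-Java model below mimics only that per-path dispatch; `PerPathMappers`, the paths, and the mapper functions are hypothetical. Because the registry is keyed by path, registering the same path twice keeps only the last mapper in this model. As far as I know, Hadoop's `MultipleInputs` also associates one mapper with each path, which matters for the question's setup where both jobs read the same files, so check this against your Hadoop version.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of per-path mapper dispatch (the idea behind MultipleInputs).
class PerPathMappers {
    // One mapper per input path; re-registering a path replaces its mapper.
    private final Map<String, Function<String, String>> mapperByPath = new LinkedHashMap<>();

    void addInputPath(String path, Function<String, String> mapper) {
        mapperByPath.put(path, mapper);
    }

    // Sends each record through the mapper registered for the path it was read from.
    List<String> run(Map<String, List<String>> recordsByPath) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : recordsByPath.entrySet()) {
            Function<String, String> mapper = mapperByPath.get(e.getKey());
            if (mapper == null) continue; // unregistered path: skipped in this model
            for (String record : e.getValue()) out.add(mapper.apply(record));
        }
        return out;
    }
}
```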

Using GenericWritable, you can tell the different value classes apart in the reducer. Here is my post about how to use it.
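A wrapper is needed because all mappers in one job must emit the same map output value class. `GenericWritable` is an abstract wrapper: a subclass lists the concrete types it may carry in `getTypes()`, the mapper wraps a value with `set(...)`, and the reducer unwraps it with `get()` and checks its class. The model below mirrors those three methods in plain Java without Hadoop's serialization; `ToyGenericValue`, `CountOrName`, and `SeparatingReducer` are hypothetical.

```java
import java.util.List;

// Plain-Java model of the GenericWritable pattern (no Writable serialization).
abstract class ToyGenericValue {
    private Object instance;

    // Subclasses declare the fixed set of types this wrapper may hold.
    protected abstract Class<?>[] getTypes();

    void set(Object value) {
        for (Class<?> type : getTypes()) {
            if (type.isInstance(value)) { instance = value; return; }
        }
        throw new IllegalArgumentException("unsupported type: " + value.getClass());
    }

    Object get() { return instance; }
}

// Hypothetical wrapper for values produced by two different mappers.
class CountOrName extends ToyGenericValue {
    @Override
    protected Class<?>[] getTypes() { return new Class<?>[] {Integer.class, String.class}; }

    static CountOrName of(Object value) {
        CountOrName w = new CountOrName();
        w.set(value);
        return w;
    }
}

class SeparatingReducer {
    // Separates one key's values by their wrapped class, as the reducer would.
    static String reduce(List<CountOrName> values) {
        int total = 0;
        StringBuilder names = new StringBuilder();
        for (CountOrName v : values) {
            Object inner = v.get();
            if (inner instanceof Integer) {
                total += (Integer) inner;
            } else if (inner instanceof String) {
                if (names.length() > 0) names.append(',');
                names.append((String) inner);
            }
        }
        return "total=" + total + " names=" + names;
    }
}
```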

Using MultipleOutputs, you can write different output classes from the same reducer.
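This also bears on writing to two folders: `MultipleOutputs` has a `write(namedOutput, key, value, baseOutputPath)` overload, and a `baseOutputPath` containing a slash, such as `job1/part`, is resolved relative to the job's output directory, so those files land in a `job1` subfolder (check the Javadoc for your Hadoop version). Named outputs are registered in the driver with `MultipleOutputs.addNamedOutput`, and the `MultipleOutputs` instance should be closed, typically in the reducer's `cleanup` method. The plain-Java model below only mimics the routing; `ToyMultipleOutputs`, the folder names, and the `A:`/`B:` key tags are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of MultipleOutputs-style routing: files are keyed by
// "<outputDir>/<baseOutputPath>", so a base path like "job1/part" acts as a subfolder.
class ToyMultipleOutputs {
    private final String outputDir;
    private final Map<String, List<String>> files = new TreeMap<>();

    ToyMultipleOutputs(String outputDir) { this.outputDir = outputDir; }

    void write(String key, String value, String baseOutputPath) {
        files.computeIfAbsent(outputDir + "/" + baseOutputPath, p -> new ArrayList<>())
             .add(key + "\t" + value);
    }

    Map<String, List<String>> files() { return files; }

    // Reducer-side routing of tagged results: "A:" keys go to job1/, all others to job2/.
    static ToyMultipleOutputs route(Map<String, Integer> reduced, String outputDir) {
        ToyMultipleOutputs mos = new ToyMultipleOutputs(outputDir);
        for (Map.Entry<String, Integer> e : reduced.entrySet()) {
            String base = e.getKey().startsWith("A:") ? "job1/part" : "job2/part";
            mos.write(e.getKey().substring(2), String.valueOf(e.getValue()), base);
        }
        return mos;
    }
}
```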
