Hadoop MapReduce: Is it possible to define two mappers and reducers in one Hadoop job class?

Question

I have two separate Java classes that run two different MapReduce jobs. I can run them independently, and both jobs operate on the same input files. So my question is whether it is possible to define two mappers and two reducers in one Java class, such as

mapper1.class
mapper2.class
reducer1.class
reducer2.class

and then do something like

job.setMapperClass(mapper1.class);
job.setMapperClass(mapper2.class);
job.setCombinerClass(reducer1.class);
job.setCombinerClass(reducer2.class);
job.setReducerClass(reducer1.class);
job.setReducerClass(reducer2.class);

Do these set methods override the previous values, or do they add new ones? I tried the code, but it executes only the most recently set classes, which makes me think the calls override each other. But there must be a way of doing this, right?

The reason I ask is that I would like to read the input files only once (one I/O pass) and then run both MapReduce jobs on them. I would also like to know how to write the output files into two different folders. At the moment, the two jobs are separate, and each requires its own input and output directory.

Recommended answer

You can have multiple mappers, but one job can have only one reducer class. The features you need are MultipleInputs, MultipleOutputs and GenericWritable.

Using MultipleInputs, you can set a mapper and its corresponding InputFormat for each input path. Here is my post about how to use it.

Using GenericWritable, you can wrap values of different classes in one type and tell them apart again in the reducer. Here is my post about how to use it.

Using MultipleOutputs, you can write different output classes, to different named outputs, from the same reducer.
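
To make the idea concrete, here is a minimal plain-Java sketch of the "read once, tag, dispatch" pattern that these features enable. It deliberately does not use the Hadoop API, so it runs without a cluster. Everything in it is an assumption made for illustration: the class name TwoJobsOnePass, the two example computations (a word count and a line-length histogram), the "A:"/"B:" key tags and the output folder names. In a real job, the tag would more likely be carried by a GenericWritable subclass, and the reducer would probably write each result set through MultipleOutputs instead of into an in-memory map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TwoJobsOnePass {

    // Collects an emitted (key, value) pair, grouping values by key
    // the way the shuffle phase would.
    static void emit(Map<String, List<Integer>> grouped, String key, int value) {
        grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // "Map" phase: every input line is read exactly once and feeds both
    // logical jobs. Keys are tagged so the reducer can tell them apart.
    static Map<String, List<Integer>> mapAndShuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            // Logical job A (hypothetical): word count.
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    emit(grouped, "A:" + word, 1);
                }
            }
            // Logical job B (hypothetical): histogram of line lengths.
            emit(grouped, "B:" + line.length(), 1);
        }
        return grouped;
    }

    // "Reduce" phase: a single reducer dispatches on the tag and sends each
    // result to the output folder that belongs to its logical job.
    static Map<String, Map<String, Integer>> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Map<String, Integer>> outputs = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            String tag = e.getKey().substring(0, 2);
            String key = e.getKey().substring(2);
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            // Folder names are placeholders, not real HDFS paths.
            String dir = tag.equals("A:") ? "out/wordcount" : "out/linelength";
            outputs.computeIfAbsent(dir, d -> new TreeMap<>()).put(key, sum);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b a", "b c");
        System.out.println(reduce(mapAndShuffle(lines)));
    }
}
```

The design choice here is to merge the two mappers into one and the two reducers into one, with the tag deciding which logic applies. That keeps the input to a single read while still producing two separate result sets.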
