Is it possible to have multiple inputs with multiple different mappers in Hadoop MapReduce?


Question

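Is it possible to have multiple inputs with multiple different mappers in Hadoop MapReduce? Each mapper class works on a different set of inputs, but they all emit key-value pairs that are consumed by the same reducer. Note that I'm not talking about chaining mappers here; I'm talking about running different mappers in parallel, not sequentially.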

Answer

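This is called a join. Because the records from the different inputs meet at the reducer, grouped by key, it is more precisely a reduce-side join.

You want to use the mappers and reducers in the mapred.* packages (older, but still supported). When this answer was written, the newer mapreduce.* packages allowed only one mapper per job; later Hadoop releases also ship a MultipleInputs class in org.apache.hadoop.mapreduce.lib.input, which takes a Job instead of a JobConf. With the mapred packages, you use the MultipleInputs class (org.apache.hadoop.mapred.lib.MultipleInputs) to define the join: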

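import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;

// Each input path gets its own input format and its own mapper class.
MultipleInputs.addInputPath(jobConf,
                     new Path(countsSource),
                     SequenceFileInputFormat.class,
                     CountMapper.class);
MultipleInputs.addInputPath(jobConf,
                     new Path(dictionarySource),
                     SomeOtherInputFormat.class,
                     TranslateMapper.class);

jobConf.setJarByClass(ReportJob.class);
// A single reducer class consumes the output of both mappers.
jobConf.setReducerClass(WriteTextReducer.class);

// Both mappers must emit the same key and value types.
jobConf.setMapOutputKeyClass(Text.class);
jobConf.setMapOutputValueClass(WordInfo.class);

// Types written by the reducer.
jobConf.setOutputKeyClass(Text.class);
jobConf.setOutputValueClass(Text.class);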
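
To make the data flow behind the configuration above more concrete, here is a minimal plain-Java sketch with no Hadoop dependency. It only simulates the idea: two different mapper functions read two different inputs, both emit pairs keyed by the same word, and one reducer sees all values for a key. The class name MultiMapperSketch, the methods countMapper and translateMapper, and the input line formats ("word<TAB>count" and "word=translation") are assumptions made for illustration; they are not the answerer's CountMapper or TranslateMapper, and the value tagging stands in for whatever WordInfo carries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MultiMapperSketch {

    // Hypothetical stand-in for a count mapper: parses "word<TAB>count" lines
    // and emits (word, "count=N").
    static List<String[]> countMapper(List<String> lines) {
        List<String[]> out = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split("\t", 2);
            out.add(new String[] {fields[0], "count=" + fields[1]});
        }
        return out;
    }

    // Hypothetical stand-in for a translation mapper: parses "word=translation"
    // lines and emits (word, "translation=T").
    static List<String[]> translateMapper(List<String> lines) {
        List<String[]> out = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split("=", 2);
            out.add(new String[] {fields[0], "translation=" + fields[1]});
        }
        return out;
    }

    // Simulated shuffle: group the values emitted by both mappers by key.
    static Map<String, List<String>> shuffle(List<String[]> a, List<String[]> b) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (List<String[]> output : Arrays.asList(a, b)) {
            for (String[] kv : output) {
                grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        return grouped;
    }

    // Single reducer: receives every value for a key, regardless of which
    // mapper produced it, and joins them into one output line.
    static String reduce(String key, List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted); // deterministic order for the sketch
        return key + "\t" + String.join(";", sorted);
    }

    static List<String> run(List<String> counts, List<String> dictionary) {
        Map<String, List<String>> grouped =
                shuffle(countMapper(counts), translateMapper(dictionary));
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            result.add(reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> counts = Arrays.asList("cat\t3", "dog\t5");
        List<String> dictionary = Arrays.asList("cat=chat", "dog=chien");
        for (String line : run(counts, dictionary)) {
            System.out.println(line);
        }
    }
}
```

In a real job the two mappers run in parallel as separate map tasks and the framework performs the grouping; the sketch runs them one after the other only to keep the example small. Tagging each value with its source ("count=" or "translation=") is one common way to let the reducer tell the inputs apart.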


