Is it possible to have multiple inputs with multiple different mappers in Hadoop MapReduce?
Question
Is it possible to have multiple inputs with multiple different mappers in Hadoop MapReduce? Each mapper class would work on a different set of inputs, but they would all emit key-value pairs consumed by the same reducer. Note that I'm not talking about chaining mappers; I mean running different mappers in parallel, not sequentially.
Recommended answer
This is called a join.
You can do this with the mappers and reducers in the mapred.* packages (older, but still supported). The claim that the newer packages (mapreduce.*) only allow one mapper input reflects older releases; Hadoop 2.x and later also provide a MultipleInputs class under org.apache.hadoop.mapreduce.lib.input. With the mapred packages, you use the MultipleInputs class to define the join, registering one input path, input format, and mapper class per source:
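import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;

// jobConf is an existing JobConf; countsSource and dictionarySource are the input locations.
// First input: its own path, input format, and mapper.
MultipleInputs.addInputPath(jobConf,
                            new Path(countsSource),
                            SequenceFileInputFormat.class,
                            CountMapper.class);
// Second input: a different path, input format, and mapper.
MultipleInputs.addInputPath(jobConf,
                            new Path(dictionarySource),
                            SomeOtherInputFormat.class,
                            TranslateMapper.class);

jobConf.setJarByClass(ReportJob.class);
// One reducer consumes the output of both mappers.
jobConf.setReducerClass(WriteTextReducer.class);
// Both mappers must emit the same key and value types.
jobConf.setMapOutputKeyClass(Text.class);
jobConf.setMapOutputValueClass(WordInfo.class);
jobConf.setOutputKeyClass(Text.class);
jobConf.setOutputValueClass(Text.class);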
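To make the data flow concrete, here is a minimal plain-Java sketch of the same pattern. It does not use any Hadoop classes; it only simulates two different mappers reading two different inputs and a single reducer receiving the merged values for each key. The class name MultiMapperSketch, the two record formats, and the lambda stand-ins for CountMapper and TranslateMapper are all assumptions made for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Plain-Java simulation (no Hadoop classes): two different mappers read two
// different inputs, and one reducer sees the merged values for each key.
class MultiMapperSketch {

    // A mapper turns one input record into zero or more (key, value) pairs.
    interface Mapper {
        void map(String record, BiConsumer<String, String> emit);
    }

    // Hypothetical stand-in for CountMapper: records look like "word<TAB>count".
    static final Mapper COUNT_MAPPER = (record, emit) -> {
        String[] parts = record.split("\t");
        emit.accept(parts[0], "count=" + parts[1]);
    };

    // Hypothetical stand-in for TranslateMapper: records look like "word=translation".
    static final Mapper TRANSLATE_MAPPER = (record, emit) -> {
        String[] parts = record.split("=");
        emit.accept(parts[0], "translation=" + parts[1]);
    };

    // Runs one mapper over its own input and groups the emitted pairs by key,
    // playing the role of the shuffle phase.
    static void runMapper(Mapper mapper, List<String> input,
                          Map<String, List<String>> shuffled) {
        for (String record : input) {
            mapper.map(record, (key, value) ->
                shuffled.computeIfAbsent(key, k -> new ArrayList<>()).add(value));
        }
    }

    // The single reducer: joins all values that share a key into one line.
    static String reduce(String key, List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return key + "\t" + String.join(",", sorted);
    }

    // Wires each input to its own mapper, then feeds every key group to the reducer.
    static List<String> runJob(List<String> counts, List<String> dictionary) {
        Map<String, List<String>> shuffled = new TreeMap<>();
        runMapper(COUNT_MAPPER, counts, shuffled);
        runMapper(TRANSLATE_MAPPER, dictionary, shuffled);
        List<String> output = new ArrayList<>();
        shuffled.forEach((key, values) -> output.add(reduce(key, values)));
        return output;
    }

    public static void main(String[] args) {
        List<String> counts = List.of("cat\t3", "dog\t5");
        List<String> dictionary = List.of("cat=chat", "dog=chien");
        runJob(counts, dictionary).forEach(System.out::println);
    }
}
```

In a real job the framework runs the map tasks in parallel and performs the shuffle itself; the sketch runs them one after another only to keep the example small. The point it illustrates is that both mappers must emit the same key and value types so that one reducer can consume them.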