Chaining Multi-Reducers in a Hadoop MapReduce job
Problem Description
Now I have a MapReduce job with one map phase followed by four reduce phases:
Input -> Map1 -> Reduce1 -> Reduce2 -> Reduce3 -> Reduce4 -> Output
I notice that there is a ChainMapper class in Hadoop which can chain several mappers into one big mapper and save the disk I/O cost between map phases. There is also a ChainReducer class, but it is not a real "chain reducer": it only supports jobs of the form

[MAP+ / REDUCE MAP*]

that is, one or more mappers, followed by a single reducer, optionally followed by more mappers.
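For reference, here is a minimal driver sketch of that pattern under the old mapred API shipped with Hadoop 1.0.4; the Map1, Reduce1, and Map2 classes are hypothetical placeholders for your own implementations:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.ChainMapper;
import org.apache.hadoop.mapred.lib.ChainReducer;

public class ChainJobDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ChainJobDriver.class);
        conf.setJobName("chain");
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // MAP+: one or more mappers run back to back in the map task,
        // passing records in memory instead of through HDFS.
        ChainMapper.addMapper(conf, Map1.class,            // hypothetical mapper
                LongWritable.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));

        // REDUCE: exactly one reducer is allowed in the chain.
        ChainReducer.setReducer(conf, Reduce1.class,       // hypothetical reducer
                Text.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));

        // MAP*: zero or more mappers may follow the reducer,
        // but a second reducer cannot be chained here.
        ChainReducer.addMapper(conf, Map2.class,           // hypothetical mapper
                Text.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));

        JobClient.runJob(conf);
    }
}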
I know I can set up four MR jobs for my task and use default mappers for the last three jobs. But that will cost a lot of disk I/O, since each reducer has to write its result to disk so that the following mapper can read it. Is there any other built-in Hadoop feature for chaining my reducers that lowers the I/O cost?
I am using Hadoop 1.0.4.
I don't think you can feed the output of one reducer directly into another reducer. I would go with the following (a driver sketch appears after the diagram):
Input -> Map1 -> Reduce1 ->
Identity mapper -> Reduce2 ->
Identity mapper -> Reduce3 ->
Identity mapper -> Reduce4 -> Output
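A minimal sketch of that chain of jobs under the Hadoop 1.x mapred API (Map1 and Reduce1..Reduce4 stand for your own classes; the tmp/step* paths are illustrative). SequenceFile output lets each job read its predecessor's output directly:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;

public class ReducerChainDriver {
    public static void main(String[] args) throws Exception {
        // Job 1: Map1 -> Reduce1, writing to an intermediate directory.
        JobConf job1 = new JobConf(ReducerChainDriver.class);
        job1.setJobName("step1");
        job1.setMapperClass(Map1.class);      // hypothetical user mapper
        job1.setReducerClass(Reduce1.class);  // hypothetical user reducer
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(Text.class);
        job1.setOutputFormat(SequenceFileOutputFormat.class);
        FileInputFormat.setInputPaths(job1, new Path(args[0]));
        FileOutputFormat.setOutputPath(job1, new Path("tmp/step1"));
        JobClient.runJob(job1);               // blocks until job 1 completes

        // Job 2: IdentityMapper -> Reduce2, reading job 1's output from
        // HDFS. This round trip through disk is exactly the I/O cost the
        // question asks about; jobs 3 and 4 repeat the same pattern with
        // Reduce3 and Reduce4.
        JobConf job2 = new JobConf(ReducerChainDriver.class);
        job2.setJobName("step2");
        job2.setMapperClass(IdentityMapper.class);
        job2.setReducerClass(Reduce2.class);  // hypothetical user reducer
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(Text.class);
        job2.setInputFormat(SequenceFileInputFormat.class);
        job2.setOutputFormat(SequenceFileOutputFormat.class);
        FileInputFormat.setInputPaths(job2, new Path("tmp/step1"));
        FileOutputFormat.setOutputPath(job2, new Path("tmp/step2"));
        JobClient.runJob(job2);
    }
}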
In the Hadoop 2.x series you can chain mappers before the reducer with ChainMapper and chain mappers after the reducer with ChainReducer, all within a single job; reducers themselves still cannot be chained directly.
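A minimal sketch of the same [MAP+ / REDUCE MAP*] pattern with the Hadoop 2.x mapreduce API (again, Map1, Reduce1, and Map2 are hypothetical placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
import org.apache.hadoop.mapreduce.lib.chain.ChainReducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Chain2xDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "chain-2x");
        job.setJarByClass(Chain2xDriver.class);

        // Mappers before the single reducer (the MAP+ part).
        ChainMapper.addMapper(job, Map1.class,             // hypothetical mapper
                LongWritable.class, Text.class, Text.class, Text.class,
                new Configuration(false));

        // The single reducer; only one is allowed per job.
        ChainReducer.setReducer(job, Reduce1.class,        // hypothetical reducer
                Text.class, Text.class, Text.class, Text.class,
                new Configuration(false));

        // Mappers after the reducer (the MAP* part).
        ChainReducer.addMapper(job, Map2.class,            // hypothetical mapper
                Text.class, Text.class, Text.class, Text.class,
                new Configuration(false));

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}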