On what basis does the MapReduce framework decide whether to launch a combiner?


Question

By definition, "The Combiner may be called 0, 1, or many times on each key between the mapper and reducer."

I want to know on what basis the MapReduce framework decides how many times the combiner will be launched.

Answer

Simply put, it depends on the number of spills to disk. Each time the MapOutputBuffer fills up, its contents are sorted and spilled, and the combining takes place at the same time.

You can tune the number of spills to disk with the parameters io.sort.mb, io.sort.spill.percent, and io.sort.record.percent; these are also explained in the documentation (books and online resources).
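To make the effect of these parameters concrete, here is a rough back-of-the-envelope sketch in plain Java. It is not Hadoop code: `SpillEstimator` and `estimateSpills` are hypothetical names, and the model assumes a spill is triggered once the buffered bytes reach `io.sort.mb * io.sort.spill.percent`. It ignores per-record metadata (what `io.sort.record.percent` controls) and the fact that the mapper keeps writing while a spill is in progress, so treat the result as an estimate only. Note also that newer Hadoop releases name these settings `mapreduce.task.io.sort.mb` and `mapreduce.map.sort.spill.percent`.

```java
// Simplified model of map-side spilling; NOT part of the Hadoop API.
final class SpillEstimator {
    private SpillEstimator() {}

    /**
     * Estimates how many times the map output buffer is spilled to disk.
     *
     * @param mapOutputBytes total serialized map output of one task, in bytes
     * @param sortMb         buffer size in MB (the role played by io.sort.mb)
     * @param spillPercent   fill fraction that triggers a spill (io.sort.spill.percent)
     */
    static long estimateSpills(long mapOutputBytes, int sortMb, double spillPercent) {
        if (mapOutputBytes < 0 || sortMb <= 0 || spillPercent <= 0.0 || spillPercent > 1.0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        if (mapOutputBytes == 0) {
            return 0; // assumption: an empty buffer produces no spill
        }
        long thresholdBytes = (long) (sortMb * 1024L * 1024L * spillPercent);
        // Ceiling division: a partially filled buffer is still flushed when the map ends.
        return (mapOutputBytes + thresholdBytes - 1) / thresholdBytes;
    }

    public static void main(String[] args) {
        long output = 250L * 1024 * 1024; // 250 MB of map output (made-up figure)
        System.out.println("sortMb=100: " + estimateSpills(output, 100, 0.80) + " spill(s)");
        System.out.println("sortMb=400: " + estimateSpills(output, 400, 0.80) + " spill(s)");
    }
}
```

Under this model, a larger buffer or a higher spill threshold means fewer spills, and therefore fewer combiner runs.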

Examples of specific numbers of combiner runs:

0 -> no combiner was defined

1 -> a combiner was defined and the MapOutputBuffer was spilled once

>1 -> a combiner was defined and the MapOutputBuffer was spilled more than once

Note that even if the MapOutputBuffer never fills up completely, it must be flushed at the end of the map stage, which triggers the combiner to run at least once (if one is defined).
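The three cases can be illustrated with a toy simulation. This is a sketch, not Hadoop's real MapOutputBuffer: `ToySpillBuffer` and all of its members are hypothetical, the spill threshold is counted in records rather than bytes, and one "combiner run" is counted per spill. The real framework invokes the combiner per partition within each spill, and, as far as I know, may also run it again when merging spill files once there are enough of them (`min.num.spills.for.combine`, default 3); neither is modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

// Toy model of a map-side buffer that spills when full; NOT Hadoop code.
final class ToySpillBuffer {
    private final int spillThreshold;               // records held before a spill (assumed unit)
    private final BinaryOperator<Integer> combiner; // null means "no combiner defined"
    private final List<String> keys = new ArrayList<>();
    private final List<Integer> values = new ArrayList<>();
    private int spillCount = 0;
    private int combinerRuns = 0;
    private int recordsWritten = 0;

    ToySpillBuffer(int spillThreshold, BinaryOperator<Integer> combiner) {
        if (spillThreshold <= 0) {
            throw new IllegalArgumentException("spillThreshold must be positive");
        }
        this.spillThreshold = spillThreshold;
        this.combiner = combiner;
    }

    /** Buffers one map output record and spills when the threshold is reached. */
    void collect(String key, int value) {
        keys.add(key);
        values.add(value);
        if (keys.size() >= spillThreshold) {
            spill();
        }
    }

    /** End of the map stage: whatever is left in the buffer must be flushed. */
    void close() {
        if (!keys.isEmpty()) {
            spill();
        }
    }

    private void spill() {
        spillCount++;
        if (combiner == null) {
            recordsWritten += keys.size(); // records go to disk unchanged
        } else {
            // Sort by key (TreeMap) and merge values of equal keys with the combiner.
            Map<String, Integer> combined = new TreeMap<>();
            for (int i = 0; i < keys.size(); i++) {
                combined.merge(keys.get(i), values.get(i), combiner);
            }
            combinerRuns++;
            recordsWritten += combined.size();
        }
        keys.clear();
        values.clear();
    }

    int spillCount() { return spillCount; }
    int combinerRuns() { return combinerRuns; }
    int recordsWritten() { return recordsWritten; }

    /** Feeds (word, 1) records through a fresh buffer and closes it. */
    static ToySpillBuffer run(int spillThreshold, BinaryOperator<Integer> combiner, String... words) {
        ToySpillBuffer buffer = new ToySpillBuffer(spillThreshold, combiner);
        for (String word : words) {
            buffer.collect(word, 1);
        }
        buffer.close();
        return buffer;
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "a", "c", "b", "a", "d", "a", "b", "c"};
        System.out.println("no combiner:  " + run(4, null, words).combinerRuns());
        System.out.println("large buffer: " + run(100, Integer::sum, words).combinerRuns());
        System.out.println("small buffer: " + run(4, Integer::sum, words).combinerRuns());
    }
}
```

With ten input records, the small buffer (threshold 4) spills three times, so the combiner runs three times; the large buffer is only flushed at `close()`, so it runs once; and without a combiner it never runs, regardless of how many spills occur.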
