On what basis does the MapReduce framework decide whether to launch a combiner?


Question

As per the definition, "The Combiner may be called 0, 1, or many times on each key between the mapper and reducer."

I want to know on what basis the MapReduce framework decides how many times the combiner will be launched.

Recommended answer

It comes down to the number of spills to disk. When the MapOutputBuffer fills up, its contents are sorted, and the combiner runs as part of that same spill.

You can tune the number of spills to disk with the parameters io.sort.mb, io.sort.spill.percent, and io.sort.record.percent; these are also explained in the documentation (books and online resources).
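To make the effect of these parameters concrete, here is a rough back-of-the-envelope sketch in plain Java. It assumes a spill starts once the buffer holds io.sort.mb × io.sort.spill.percent bytes, and it deliberately ignores the per-record metadata accounting controlled by io.sort.record.percent. The class name, the method name, and the 250 MB output figure are made up for illustration; this is not Hadoop's actual accounting.

```java
// Rough spill-count estimate; a sketch under stated assumptions, not Hadoop code.
class SpillEstimate {

    /**
     * @param mapOutputBytes total serialized map output of one task (hypothetical input)
     * @param ioSortMb       value of io.sort.mb, in megabytes
     * @param spillPercent   value of io.sort.spill.percent, with 0 < p <= 1
     * @return estimated number of spills, counting the final flush
     */
    static long estimateSpills(long mapOutputBytes, int ioSortMb, double spillPercent) {
        if (mapOutputBytes <= 0) {
            return 0; // simplification: a task with no output is treated as not spilling
        }
        // Assumed spill threshold: buffer size times spill fraction.
        long thresholdBytes = Math.round(ioSortMb * 1024L * 1024L * spillPercent);
        // Ceiling division: the last, partially filled buffer is flushed when the map ends.
        return (mapOutputBytes + thresholdBytes - 1) / thresholdBytes;
    }

    public static void main(String[] args) {
        long output = 250L * 1024 * 1024; // 250 MB of map output (made-up figure)
        System.out.println(estimateSpills(output, 100, 0.80)); // small buffer
        System.out.println(estimateSpills(output, 400, 0.80)); // large buffer
    }
}
```

Under these assumptions the 100 MB buffer yields 4 spills and the 400 MB buffer yields 1, which is why a larger io.sort.mb tends to mean fewer spills and therefore fewer combiner runs.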

Examples for specific numbers of combiner runs:

0 -> no combiner was defined

1 -> a combiner was defined and the MapOutputBuffer filled up once

>1 -> a combiner was defined and the MapOutputBuffer filled up more than once

Note that even if the MapOutputBuffer never fills up completely, it must be flushed at the end of the map stage, which triggers the combiner to run at least once (if one is defined).
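The three cases above can be reproduced with a toy model. The following plain-Java sketch is not a Hadoop class: ToyMapOutputBuffer, its methods, and the record-count capacity (standing in for a byte-sized buffer) are all hypothetical. It models only the spill-time behaviour described in this answer, using a word-count style combiner that sums equal keys within one spill.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a map-side output buffer; every name here is hypothetical.
class ToyMapOutputBuffer {
    private final int capacity;        // records held before a spill (stand-in for io.sort.mb)
    private final boolean hasCombiner; // whether the job defined a combiner
    private final List<String> buffer = new ArrayList<>();
    private int combinerRuns = 0;
    private int recordsWritten = 0;

    ToyMapOutputBuffer(int capacity, boolean hasCombiner) {
        this.capacity = capacity;
        this.hasCombiner = hasCombiner;
    }

    // Called for every key the mapper emits; spills when the buffer is full.
    void collect(String key) {
        buffer.add(key);
        if (buffer.size() >= capacity) {
            spill();
        }
    }

    // Called once at the end of the map stage: flush whatever is left.
    void close() {
        if (!buffer.isEmpty()) {
            spill();
        }
    }

    private void spill() {
        if (hasCombiner) {
            // Sort and combine: sum the counts of equal keys within this spill only.
            Map<String, Integer> combined = new TreeMap<>();
            for (String key : buffer) {
                combined.merge(key, 1, Integer::sum);
            }
            combinerRuns++;
            recordsWritten += combined.size();
        } else {
            recordsWritten += buffer.size(); // no combiner: every record is written as-is
        }
        buffer.clear();
    }

    int combinerRuns() { return combinerRuns; }
    int recordsWritten() { return recordsWritten; }

    // Convenience driver: feed all keys through a buffer, then close it.
    static ToyMapOutputBuffer run(String[] keys, int capacity, boolean hasCombiner) {
        ToyMapOutputBuffer b = new ToyMapOutputBuffer(capacity, hasCombiner);
        for (String k : keys) {
            b.collect(k);
        }
        b.close();
        return b;
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "a", "c", "a", "b", "a"};
        System.out.println(run(words, 3, false).combinerRuns());  // no combiner defined
        System.out.println(run(words, 100, true).combinerRuns()); // buffer never fills
        System.out.println(run(words, 3, true).combinerRuns());   // buffer fills repeatedly
    }
}
```

With the seven sample words, the model reports 0 combiner runs when no combiner is defined, 1 run when the buffer is only flushed at the end, and 3 runs when a capacity of 3 records forces two spills plus the final flush.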
