On what basis does the MapReduce framework decide whether to launch a combiner?


Problem description


As per the definition, "The Combiner may be called 0, 1, or many times on each key between the mapper and reducer."

I want to know on what basis the MapReduce framework decides how many times the combiner will be launched.

Solution

It depends simply on the number of spills to disk. Once the MapOutputBuffer fills up, its contents are sorted, and the combiner runs at the same time as part of that spill.

You can tune the number of spills to disk with the parameters io.sort.mb, io.sort.spill.percent, and io.sort.record.percent; these are also explained in the documentation (books and online resources).
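To get a feel for how these parameters change the number of spills, here is a rough back-of-the-envelope sketch in plain Java. It is not Hadoop code: the class and method names are hypothetical, it assumes a spill starts once buffered output reaches io.sort.mb × io.sort.spill.percent, and it ignores the per-record metadata space governed by io.sort.record.percent. The values 100 and 0.80 are the commonly documented defaults for io.sort.mb and io.sort.spill.percent.

```java
/** Rough estimate only; a simplified model, not Hadoop's actual accounting. */
final class SpillEstimate {

    /** Bytes of buffered map output at which a spill starts (simplified). */
    static long spillThresholdBytes(int ioSortMb, double spillPercent) {
        return Math.round(ioSortMb * 1024L * 1024L * spillPercent);
    }

    /**
     * Approximate number of spills for one map task, counting the final
     * flush at the end of the map stage. Returns 0 when there is no output.
     */
    static long estimateSpills(long mapOutputBytes, int ioSortMb, double spillPercent) {
        if (mapOutputBytes <= 0) {
            return 0;
        }
        long threshold = spillThresholdBytes(ioSortMb, spillPercent);
        return (mapOutputBytes + threshold - 1) / threshold; // ceiling division
    }

    public static void main(String[] args) {
        long output = 300L * 1024 * 1024; // hypothetical: 300 MB of map output

        // io.sort.mb = 100, io.sort.spill.percent = 0.80
        System.out.println(estimateSpills(output, 100, 0.80)); // prints 4

        // A larger buffer: io.sort.mb = 400
        System.out.println(estimateSpills(output, 400, 0.80)); // prints 1
    }
}
```

Under this model, raising io.sort.mb from 100 to 400 takes the task from four spills down to a single final flush, which in turn means fewer combiner runs.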

Examples of specific numbers of combiner runs:

0 -> no combiner was defined

1 -> a combiner was defined and the MapOutputBuffer filled up once

>1 -> a combiner was defined and the MapOutputBuffer filled up more than once

Note that even if the MapOutputBuffer never fills up completely, it must still be flushed at the end of the map stage, which triggers the combiner to run at least once (if defined).
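The cases above can be reproduced with a small toy model. This is only a sketch: ToyMapBuffer is a hypothetical class, not Hadoop's MapOutputBuffer. It measures capacity in records rather than bytes and uses a word-count style sum as the combiner.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a map-side buffer that spills when full (hypothetical, not Hadoop code). */
final class ToyMapBuffer {
    private final int capacity;            // records held before a spill
    private final boolean combinerDefined; // whether a combiner is configured
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> spills = new ArrayList<>();
    private int combinerRuns = 0;

    ToyMapBuffer(int capacity, boolean combinerDefined) {
        this.capacity = capacity;
        this.combinerDefined = combinerDefined;
    }

    /** Called for every record the mapper emits (each key carries the value 1). */
    void collect(String key) {
        buffer.add(key);
        if (buffer.size() >= capacity) {
            spill(); // buffer filled up
        }
    }

    /** End of the map stage: flush whatever is still buffered. */
    void close() {
        if (!buffer.isEmpty()) {
            spill();
        }
    }

    /** Sort the buffered records and, if a combiner is defined, combine them. */
    private void spill() {
        List<String> sortedKeys = new ArrayList<>(buffer);
        Collections.sort(sortedKeys);
        List<String> out = new ArrayList<>();
        if (combinerDefined) {
            combinerRuns++;
            Map<String, Integer> sums = new TreeMap<>();
            for (String k : sortedKeys) {
                sums.merge(k, 1, Integer::sum); // combiner: sum the counts per key
            }
            for (Map.Entry<String, Integer> e : sums.entrySet()) {
                out.add(e.getKey() + "=" + e.getValue());
            }
        } else {
            for (String k : sortedKeys) {
                out.add(k + "=1"); // no combiner: records are written as-is
            }
        }
        spills.add(out);
        buffer.clear();
    }

    int combinerRuns() { return combinerRuns; }
    int spillCount() { return spills.size(); }

    /** Feeds `records` keys through a fresh buffer and reports the combiner runs. */
    static int combinerRunsFor(int records, int capacity, boolean combinerDefined) {
        ToyMapBuffer b = new ToyMapBuffer(capacity, combinerDefined);
        for (int i = 0; i < records; i++) {
            b.collect("k" + (i % 3));
        }
        b.close();
        return b.combinerRuns();
    }

    public static void main(String[] args) {
        System.out.println(combinerRunsFor(5, 10, false)); // 0: no combiner defined
        System.out.println(combinerRunsFor(5, 10, true));  // 1: never full, final flush only
        System.out.println(combinerRunsFor(10, 10, true)); // 1: filled up exactly once
        System.out.println(combinerRunsFor(25, 10, true)); // 3: filled twice, plus final flush
    }
}
```

In this model a key such as k0 passes through the combiner once per spill that contains it, which is why the same key can be combined zero, one, or several times before it reaches the reducer.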
