On what basis does the MapReduce framework decide whether to launch a combiner or not?
Problem description
According to the definition, "The Combiner may be called 0, 1, or many times on each key between the mapper and reducer."
I want to know on what basis the MapReduce framework decides how many times the combiner will be launched.
It is simply the number of spills to disk. Sorting happens once the MapOutputBuffer has filled up (to the threshold set by io.sort.spill.percent), and the combining takes place at the same time, as part of that spill.
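The spill-driven behavior described above can be illustrated with a small Python simulation. This is a sketch under simplifying assumptions, not Hadoop's actual MapOutputBuffer: capacity is counted in records rather than bytes, partitions are ignored, spills run synchronously instead of in a background thread, and the combiner is modeled as a function that reduces one key's values to a single value. The names run_map_side, buffer_capacity and spill_percent are hypothetical. It shows the combiner being invoked once per key per spill, including the final flush at the end of the map.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, int]
Combiner = Callable[[str, List[int]], int]


def run_map_side(
    records: Iterable[Record],
    buffer_capacity: int,
    spill_percent: float,
    combiner: Optional[Combiner] = None,
) -> Tuple[List[List[Record]], Dict[str, int]]:
    """Simulate map-side buffering; return (spill files, combiner calls per key)."""
    # Spill once the buffer reaches spill_percent of its capacity (counted in records here).
    threshold = max(1, int(buffer_capacity * spill_percent))
    buffer: List[Record] = []
    spills: List[List[Record]] = []
    calls: Dict[str, int] = defaultdict(int)

    def sort_and_spill() -> None:
        # Sort the buffered records by key, as happens before every spill.
        buffer.sort(key=lambda kv: kv[0])
        if combiner is None:
            spill = list(buffer)  # no combiner: records are written as-is
        else:
            grouped: Dict[str, List[int]] = defaultdict(list)
            for key, value in buffer:
                grouped[key].append(value)
            spill = []
            for key in sorted(grouped):
                calls[key] += 1  # the combiner runs once per key in this spill
                spill.append((key, combiner(key, grouped[key])))
        spills.append(spill)
        buffer.clear()

    for record in records:
        buffer.append(record)
        if len(buffer) >= threshold:
            sort_and_spill()

    # End of the map: whatever is left must be flushed, so a defined
    # combiner runs at least once even if the buffer never filled up.
    if buffer:
        sort_and_spill()

    return spills, dict(calls)


if __name__ == "__main__":
    words = ["a", "b", "a", "c", "a"]
    spills, calls = run_map_side(
        ((w, 1) for w in words),
        buffer_capacity=4,
        spill_percent=0.5,
        combiner=lambda key, values: sum(values),
    )
    print(len(spills), calls)  # 3 {'a': 3, 'b': 1, 'c': 1}
```

Here key "a" appears in all three spills, so the combiner is called on it three times, while "b" and "c" are each combined only once. This is exactly the "0, 1, or many times on each key" behavior from the definition, and it is why a combiner's output must be safe to feed back into the reducer (or into another combiner run).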
You can tune the number of spills to disk with the parameters io.sort.mb, io.sort.spill.percent and io.sort.record.percent; these are also explained in the documentation (books and online resources).
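As a rough, back-of-the-envelope illustration of how these three parameters bound the spill count, the Python function below estimates the number of map-side spills for a given map output. It assumes the older (Hadoop 0.20-era) buffer layout, in which io.sort.record.percent of io.sort.mb is reserved for per-record accounting data (assumed here to be 16 bytes per record) and a spill starts when either the data region or the accounting region reaches io.sort.spill.percent. It ignores partitions, compression and output that keeps arriving while a spill is in progress. The function name estimate_spills and its arguments are hypothetical.

```python
import math

MB = 1024 * 1024
# Assumption: per-record accounting cost in the older MapOutputBuffer layout.
ACCOUNTING_BYTES_PER_RECORD = 16


def estimate_spills(
    map_output_bytes: int,
    map_output_records: int,
    io_sort_mb: int = 100,
    spill_percent: float = 0.80,
    record_percent: float = 0.05,
) -> int:
    """Rough estimate of map-side spills, i.e. combiner runs if a combiner is set."""
    if map_output_records <= 0:
        return 0  # nothing is ever buffered, so nothing is spilled
    total_bytes = io_sort_mb * MB
    # io.sort.record.percent of the buffer holds per-record accounting data...
    accounting_bytes = int(total_bytes * record_percent)
    record_capacity = accounting_bytes // ACCOUNTING_BYTES_PER_RECORD
    # ...and the rest holds the serialized keys and values.
    data_bytes = total_bytes - accounting_bytes
    # A spill starts when EITHER region reaches io.sort.spill.percent.
    data_limit = data_bytes * spill_percent
    record_limit = record_capacity * spill_percent
    spills_by_data = math.ceil(map_output_bytes / data_limit)
    spills_by_records = math.ceil(map_output_records / record_limit)
    # Whatever remains at the end of the map is flushed, hence at least one spill.
    return max(1, spills_by_data, spills_by_records)


if __name__ == "__main__":
    gib = 1024 * MB
    print(estimate_spills(gib, 10_000_000))  # 39
    print(estimate_spills(gib, 1_000_000))   # 14
```

With the defaults shown (100 MB, 0.80, 0.05), 1 GiB of map output split into 10 million small records is estimated at 39 spills, because the accounting region fills up long before the data region does; the same gigabyte in 1 million records gives 14. In Hadoop 2 and later, io.sort.mb and io.sort.spill.percent are named mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent and the buffer layout changed, so treat this as a model of the older layout only.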
Examples of specific numbers of combiner runs:
0 -> no combiner was defined
1 -> a combiner was defined and the MapOutputBuffer filled up once
>1 -> a combiner was defined and the MapOutputBuffer filled up more than once
Note that even if the MapOutputBuffer never fills up completely, this buffer must be flushed at the end of the map stage, which triggers the combiner to run at least once (if one is defined).