Order of Mapper, Combiner, Partitioner, and shuffle/sort


Question


The following passage appears on page 206 of Hadoop: The Definitive Guide.

Before it writes to disk, the thread first divides the data into partitions corresponding to the reducers that they will ultimately be sent to. Within each partition, the background thread performs an in-memory sort by key, and if there is a combiner function, it is run on the output of the sort. Running the combiner function makes for a more compact map output, so there is less data to write to local disk and to transfer to the reducer.
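The spill step described in this passage can be sketched as a small simulation. This is a rough illustration in plain Python, not Hadoop code: the function names `partition`, `combine`, and `spill` are hypothetical, and the CRC-based partitioner is only a stand-in for whatever partitioner a real job uses.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def partition(key, num_reducers):
    # Hypothetical stand-in for a hash partitioner: map a key to a reducer index.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def combine(key, values):
    # Example combiner for a word-count style job: sum the partial counts.
    return key, sum(values)


def spill(buffer, num_reducers, combiner=None):
    """Simulate one map-side spill: partition, sort each partition by key,
    then run the combiner (if any) on the sorted output."""
    partitions = defaultdict(list)
    for key, value in buffer:
        partitions[partition(key, num_reducers)].append((key, value))

    spilled = {}
    for pid, records in partitions.items():
        records.sort(key=itemgetter(0))  # in-memory sort by key
        if combiner is not None:
            records = [
                combiner(k, [v for _, v in group])
                for k, group in groupby(records, key=itemgetter(0))
            ]
        spilled[pid] = records  # this is what would be written to local disk
    return spilled
```

Calling `spill([("b", 1), ("a", 1), ("b", 1)], 1, combine)` yields a single partition holding `[("a", 1), ("b", 2)]`: the records were partitioned first, sorted second, and combined last, all before anything reaches the reducer.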

Going by this passage, is the order Mapper, Partitioner, shuffle/sort, Combiner?

Solution

I've written an article about this: http://0x0fff.com/hadoop-mapreduce-comprehensive-description/

In general you are right, but there are many more corner cases. The combiner might be skipped for some records, it might run several times for others, and it can even run on the reduce side before the reducer starts. So the order you describe holds in general, but the full picture is considerably more complex.
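One practical consequence is that a job's result should not depend on how many times the combiner runs. The sketch below is a plain-Python illustration under simplifying assumptions, not Hadoop code: `run_job` is a hypothetical helper that applies a combiner to each map-side chunk zero, one, or several times and then reduces the result.

```python
def run_job(chunks, combiner, reducer, passes):
    """Apply `combiner` to each chunk `passes` times (0 means it is skipped),
    then feed everything to `reducer`."""
    combined = []
    for chunk in chunks:
        values = list(chunk)
        for _ in range(passes):
            values = [combiner(values)]
        combined.extend(values)
    return reducer(combined)


def mean(values):
    return sum(values) / len(values)


chunks = [[1, 2, 3], [10]]

# Sum is safe as a combiner: the result is the same for 0, 1, or 3 passes.
sum_results = [run_job(chunks, sum, sum, p) for p in (0, 1, 3)]   # [16, 16, 16]

# Mean is not: combining changes the answer, so it cannot be used as-is.
mean_results = [run_job(chunks, mean, mean, p) for p in (0, 1)]   # [4.0, 6.0]
```

Sum gives 16 regardless of the number of combiner passes, while a naive mean gives 4.0 without the combiner and 6.0 with it. A combiner therefore has to be an operation whose repeated or skipped application leaves the reducer's output unchanged.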
