What's the difference between shuffle phase and combiner phase?

Question

I'm pretty confused about the MapReduce framework, and reading about it in different sources has only added to the confusion. Here is my understanding of a MapReduce job:

1. Map() --> emit <key,value>
2. Partitioner (OPTIONAL) --> divides the intermediate output from the mapper and assigns it to different reducers
3. Shuffle phase, used to build <key,listofvalues>
4. Combiner, a component used like a mini-reducer which performs some operations on the data and then passes that data to the reducer. The combiner works locally, not on HDFS, saving space and time.
5. Reducer, gets the data from the combiner, performs further operations (probably the same as the combiner), then releases the output.
6. We will have n output parts, where n is the number of reducers

Is this basically right? I mean, I found some sources stating that the combiner is the shuffle phase and that it basically groups the records by key...

Recommended answer

A combiner is NOT at all similar to the shuffle phase. What you describe as shuffling is wrong, which is the root of your confusion.

Shuffling is just copying the map output (the key-value pairs) from the mappers to the reducers; it has nothing to do with key generation. It is the first phase of a Reducer, the other two being sorting and then reducing.
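
To make the three reducer phases concrete, here is a minimal plain-Python sketch. It is a simulation, not the Hadoop API: the function names, the `map_outputs` layout (one dict per mapper, keyed by reducer index) and the sample data are all assumptions made for illustration, and a real cluster also sorts and merges data in ways this sketch leaves out.

```python
from itertools import groupby
from operator import itemgetter


def shuffle(map_outputs, partition):
    """Copy phase: fetch, from every mapper, the pairs destined for one reducer."""
    fetched = []
    for per_mapper in map_outputs:
        fetched.extend(per_mapper.get(partition, []))
    return fetched


def sort_and_group(pairs):
    """Sort phase: order the pairs by key so values for the same key are adjacent."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


def reduce_sum(grouped):
    """Reduce phase: here, simply sum the values of each key."""
    return {key: sum(values) for key, values in grouped}


# Hypothetical output of two mappers, already split by reducer index (0 or 1).
map_outputs = [
    {0: [("apple", 1), ("cherry", 1)], 1: [("banana", 1)]},
    {0: [("apple", 1)], 1: [("banana", 1), ("banana", 1)]},
]

fetched = shuffle(map_outputs, 0)   # copying only, no grouping yet
grouped = sort_and_group(fetched)   # this is where <key, list of values> appears
result = reduce_sum(grouped)
```

Note that the `<key, list of values>` shape only appears after the sort step; the shuffle itself just moves data.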

Combining is like executing a reducer locally, on the output of each mapper. A combiner basically acts like a reducer (it also extends the Reducer class), which means that, like a reducer, it groups the local values that the mapper has emitted for the same key.
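
As a rough illustration of that local aggregation, the following plain-Python sketch runs a word-count style combiner over the output of a single mapper. It is not Hadoop code; `mapper`, `combine` and the sample line are hypothetical names and data chosen for the example.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in the input line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Sum the values per key, using only this one mapper's local output."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


mapper_output = mapper("to be or not to be")  # 6 pairs would cross the network
combined = combine(mapper_output)             # only 4 pairs after combining
```

The reducer still has to sum the partial counts coming from every mapper; the combiner only shrinks what each mapper sends. This works here because addition gives the same total no matter how the values are pre-grouped.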

Partitioning is, indeed, assigning the map output keys to specific reduce tasks, but it is not optional. Overriding the default HashPartitioner with an implementation of your own is optional.
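
The idea can be sketched in plain Python as follows. Hadoop's default HashPartitioner computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the sketch mirrors that formula but substitutes `zlib.crc32` for Java's `hashCode()`, so the actual partition numbers will differ from Hadoop's. `first_letter_partition` is a hypothetical custom partitioner invented for this example.

```python
import zlib

MAX_INT = 0x7FFFFFFF  # plays the role of Java's Integer.MAX_VALUE


def hash_partition(key, num_reduce_tasks):
    """Default-style partitioning: non-negative hash modulo the reducer count."""
    key_hash = zlib.crc32(key.encode("utf-8"))  # stand-in for key.hashCode()
    return (key_hash & MAX_INT) % num_reduce_tasks


def first_letter_partition(key, num_reduce_tasks):
    """Hypothetical override: route keys by their first letter instead."""
    return (ord(key[0].lower()) - ord("a")) % num_reduce_tasks


keys = ["apple", "banana", "cherry", "date"]
default_assignment = {key: hash_partition(key, 3) for key in keys}
custom_assignment = {key: first_letter_partition(key, 3) for key in keys}
```

Either way, every key is assigned to exactly one reduce task, and the same key always lands on the same task; only the rule that picks the task changes.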

I tried to keep this answer minimal, but you can find more information in the book Hadoop: The Definitive Guide by Tom White, as Azim suggests, and some related material in this post.
