Shuffle and sort for MapReduce

Problem description

I read through Hadoop: The Definitive Guide and some other links on the web, including the one here.

My question is:

Where exactly do shuffling and sorting happen?

As per my understanding, they happen on both mappers and reducers. But some links mention that shuffling happens on mappers and sorting on reducers.

Can someone confirm whether my understanding is correct? If not, can they point me to additional documentation I can go through?

Solution

Shuffle:

MapReduce makes the guarantee that the input to every reducer is sorted by key. The process by which the system performs the sort and transfers map outputs to the reducers as inputs is known as the shuffle.

Sort:

Sorting happens at various stages of a MapReduce program, so it can occur in both the Map and Reduce phases.

Please have a look at this diagram.

Here is more detail on what the diagram shows in the Map and Reduce phases.

The Map Side:

When the map function starts producing output, it is not simply written to disk. Before the map output is written to disk, a background thread first divides the data into partitions corresponding to the reducers that they will ultimately be sent to. Within each partition, the background thread performs an in-memory sort by key.
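The map-side step above can be illustrated with a small simulation. This is only a sketch in plain Python, not Hadoop code: `partition_for` and `spill` are hypothetical names, and CRC32 is used as a stand-in for a hash partitioner so that the result is deterministic across runs.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_reducers):
    """Pick a reducer index for a key (CRC32 is an assumed stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def spill(buffered_pairs, num_reducers):
    """Partition buffered map output, then sort each partition by key."""
    partitions = defaultdict(list)
    for key, value in buffered_pairs:
        partitions[partition_for(key, num_reducers)].append((key, value))
    # Sort on the key only; sorted() is stable, so equal keys keep arrival order.
    return {
        index: sorted(pairs, key=lambda kv: kv[0])
        for index, pairs in partitions.items()
    }


map_output = [("pear", 1), ("apple", 1), ("fig", 1), ("apple", 1)]
spilled = spill(map_output, num_reducers=2)
```

Each entry in `spilled` is what one reducer would later fetch from this map task: all pairs for a given key sit in the same partition, and each partition is already sorted by key.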

The Reduce Side:

When all the map outputs have been copied, the reduce task moves into the sort phase (which should properly be called the merge phase, as the sorting was carried out on the map side), which merges the map outputs, maintaining their sort ordering. This merge is done in rounds.
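The reduce-side merge can be sketched in the same spirit. The code below is a simplified illustration, not Hadoop's actual merge algorithm: `merge_in_rounds`, `reduce_input`, and the `merge_factor` parameter are hypothetical names, and each round simply merges groups of up to `merge_factor` already-sorted runs until one run remains.

```python
import heapq
from itertools import groupby


def merge_in_rounds(sorted_runs, merge_factor=10):
    """Merge already-sorted runs in rounds of at most merge_factor runs each."""
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    runs = [list(run) for run in sorted_runs]
    while len(runs) > 1:
        # One round: every group of merge_factor runs becomes a single run.
        runs = [
            list(heapq.merge(*runs[i:i + merge_factor], key=lambda kv: kv[0]))
            for i in range(0, len(runs), merge_factor)
        ]
    return runs[0] if runs else []


def reduce_input(merged_pairs):
    """Group the merged, key-sorted pairs into (key, values) for the reducer."""
    for key, group in groupby(merged_pairs, key=lambda kv: kv[0]):
        yield key, [value for _, value in group]


# Three sorted map outputs fetched by one reducer.
runs = [
    [("apple", 1), ("fig", 1)],
    [("apple", 1), ("pear", 1)],
    [("fig", 1)],
]
merged = merge_in_rounds(runs, merge_factor=2)
grouped = list(reduce_input(merged))
# grouped == [("apple", [1, 1]), ("fig", [1, 1]), ("pear", [1])]
```

No full sort is needed here: because every input run is already sorted by key, a merge is enough to hand the reducer its keys in order.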

Source: Hadoop: The Definitive Guide.
