Shuffle and sort for MapReduce

Question

I read through the Definitive Guide and some other links on the web, including the one here.

My question is:

Where exactly do shuffling and sorting happen?

As per my understanding, they happen on both mappers and reducers. But some links mention that shuffling happens on mappers and sorting on reducers.

Can someone confirm whether my understanding is correct? If not, could someone point me to additional documentation I can go through?

Recommended answer

Shuffle:

MapReduce makes the guarantee that the input to every reducer is sorted by key. The process by which the system performs the sort and transfers map outputs to the reducers as inputs is known as the shuffle.
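
To make that guarantee concrete, here is a minimal single-process Python sketch of what the shuffle delivers to each reducer. It is not Hadoop code: the function names `partition` and `shuffle` are hypothetical, and the CRC32-based partitioner is an assumption standing in for Hadoop's default hash partitioning. Every pair with a given key lands on exactly one reducer, and each reducer's input arrives sorted and grouped by key.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def partition(key: str, num_reducers: int) -> int:
    # Hypothetical partitioner: deterministic hash of the key modulo the reducer count.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Simulate the shuffle: route each (key, value) pair to one reducer and
    give every reducer its input sorted and grouped by key."""
    per_reducer = defaultdict(list)
    for records in map_outputs:  # one list of (key, value) pairs per map task
        for key, value in records:
            per_reducer[partition(key, num_reducers)].append((key, value))

    reducer_inputs = {}
    for r in range(num_reducers):
        # Stable sort by key only, so values keep their arrival order.
        pairs = sorted(per_reducer[r], key=itemgetter(0))
        reducer_inputs[r] = [
            (key, [value for _, value in group])
            for key, group in groupby(pairs, key=itemgetter(0))
        ]
    return reducer_inputs
```

For example, `shuffle([[("b", 1), ("a", 1)], [("a", 2), ("c", 3)]], 2)` sends both `"a"` pairs, which come from different map tasks, to the same reducer, and that reducer sees them as `("a", [1, 2])`.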

Sort:

Sorting happens at several stages of a MapReduce program, so it takes place in both the Map and Reduce phases.

Please see this image:

Here is more detail on the Map and Reduce phases shown in the image above.

Map side:

When the map function starts producing output, it is not simply written to disk; it is buffered in memory and presorted for efficiency. Before the map output is written to disk, a background thread first divides the data into partitions corresponding to the reducers that it will ultimately be sent to. Within each partition, the background thread performs an in-memory sort by key.
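
The map-side steps can be sketched in Python as follows. This is a single-process illustration under stated assumptions, not Hadoop's implementation: `spill` and `partition_for` are hypothetical names, the CRC32 hash is an assumed stand-in for Hadoop's default hash partitioner (which likewise takes the key's hash modulo the number of reduce tasks), and the in-memory buffer is just a Python list.

```python
import zlib
from operator import itemgetter


def partition_for(key: str, num_reducers: int) -> int:
    # Assumed partitioner: deterministic hash of the key modulo the reducer count.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def spill(buffer, num_reducers):
    """Turn a buffer of (key, value) map outputs into one run per reducer,
    each run sorted by key: partition first, then sort within each partition."""
    # Step 1: divide the buffered records into one partition per reducer.
    partitions = [[] for _ in range(num_reducers)]
    for key, value in buffer:
        partitions[partition_for(key, num_reducers)].append((key, value))
    # Step 2: in-memory sort by key within each partition.
    return [sorted(part, key=itemgetter(0)) for part in partitions]
```

Because the partition index is computed before sorting, each sorted run is destined for exactly one reducer, and that reducer only has to merge the runs it receives rather than sort them from scratch.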

Reduce side:

When all the map outputs have been copied, the reduce task moves into the sort phase (which should properly be called the merge phase, as the sorting was carried out on the map side), which merges the map outputs, maintaining their sort ordering. This will be done in rounds.
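
The reduce-side merge can be sketched with Python's `heapq.merge`, which lazily merges inputs that are already sorted. This is a simplified illustration, not Hadoop's merge code: `merge_in_rounds` is a hypothetical name, and `merge_factor` is an assumed parameter playing the role of Hadoop's merge factor (the `mapreduce.task.io.sort.factor` property, default 10). Each round merges `merge_factor` runs into one intermediate run until no more than `merge_factor` remain; the final merge is streamed rather than materialized, the way a reducer would consume it.

```python
import heapq
from operator import itemgetter


def merge_in_rounds(runs, merge_factor=10):
    """Merge key-sorted runs (e.g. one per map output) into a single key-sorted stream.

    Returns (stream, rounds), where rounds counts the intermediate merges performed.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    runs = [list(run) for run in runs]
    rounds = 0
    while len(runs) > merge_factor:
        # Merge the first `merge_factor` runs into one intermediate run.
        batch, runs = runs[:merge_factor], runs[merge_factor:]
        runs.append(list(heapq.merge(*batch, key=itemgetter(0))))
        rounds += 1
    # Final merge: produced lazily, like the input fed to the reduce function.
    return heapq.merge(*runs, key=itemgetter(0)), rounds
```

With 50 runs and `merge_factor=10`, the loop performs five rounds (the run count goes 50, 41, 32, 23, 14, then 5), leaving five intermediate runs for the final streamed merge. No run is ever re-sorted; every step only merges inputs that are already in key order.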

Source: Hadoop: The Definitive Guide.