Will there be shuffle and sort in a map-only task?

Question

Does the shuffle and sort phase happen before the end of the map task, or after the map task has generated its output, so that there is no looking back to the map task anymore? The "map-only task" case is where I get confused. If there is no shuffle and sort in a map-only task, can someone explain how the data is written to the final output files?

Recommended Answer

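When you have a map-only task (a job whose number of reduce tasks is set to zero), there is no shuffling at all, and no sorting either: each mapper writes its final output directly to HDFS, into its own file in the job's output directory.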
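The map-only data flow can be sketched as a small Python simulation (plain Python, not the Hadoop API). `run_map_only_job` and `word_mapper` are hypothetical names chosen for illustration, and each output "file" is just a list in a dict; only the `part-m-NNNNN` naming follows Hadoop's convention for map-only output.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Tuple[str, int]


def word_mapper(line: str) -> Iterable[Record]:
    # Hypothetical mapper: emit (word, 1) for every word in the line.
    for word in line.split():
        yield (word, 1)


def run_map_only_job(
    splits: List[List[str]],
    mapper: Callable[[str], Iterable[Record]],
) -> Dict[str, List[Record]]:
    """Simulate a map-only job: every map task writes straight to its own file."""
    output_files: Dict[str, List[Record]] = {}
    for task_id, split in enumerate(splits):
        records: List[Record] = []
        for line in split:
            # No partitioning, no sort, no shuffle: records stay in the
            # order the mapper emitted them.
            records.extend(mapper(line))
        # One output file per map task, named like Hadoop's part-m-NNNNN.
        output_files[f"part-m-{task_id:05d}"] = records
    return output_files


if __name__ == "__main__":
    files = run_map_only_job([["b a", "a"], ["c b"]], word_mapper)
    for name, records in files.items():
        print(name, records)
```

Running it prints two files, `part-m-00000` with `[('b', 1), ('a', 1), ('a', 1)]` and `part-m-00001` with `[('c', 1), ('b', 1)]`: the keys are neither sorted nor grouped, and nothing is combined across mappers.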

On the other hand, when you have a whole MapReduce program, with both mappers and reducers, then yes, shuffling can start before the reduce phase starts.
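For contrast, the full map, shuffle, sort, reduce path can be sketched the same way. This is again a plain-Python simulation under stated assumptions, not Hadoop code: `word_mapper`, `sum_reducer`, `partition` and `run_map_reduce_job` are hypothetical names, and `partition` uses `zlib.crc32` as a deterministic stand-in for Hadoop's default hash partitioning.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter
from typing import Callable, Dict, Iterable, List, Tuple

Record = Tuple[str, int]


def word_mapper(line: str) -> Iterable[Record]:
    # Hypothetical mapper: emit (word, 1) for every word in the line.
    for word in line.split():
        yield (word, 1)


def sum_reducer(key: str, values: Iterable[int]) -> Record:
    # Hypothetical reducer: add up all counts for one key.
    return (key, sum(values))


def partition(key: str, num_reducers: int) -> int:
    # Deterministic hash partitioner (crc32, because Python's hash() is salted).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map_reduce_job(
    splits: List[List[str]],
    mapper: Callable[[str], Iterable[Record]],
    reducer: Callable[[str, Iterable[int]], Record],
    num_reducers: int,
) -> Dict[str, List[Record]]:
    # Map side: each map task buckets its output by target reducer and
    # sorts every bucket by key.
    map_outputs: List[Dict[int, List[Record]]] = []
    for split in splits:
        buckets: Dict[int, List[Record]] = defaultdict(list)
        for line in split:
            for key, value in mapper(line):
                buckets[partition(key, num_reducers)].append((key, value))
        map_outputs.append({p: sorted(recs) for p, recs in buckets.items()})

    output_files: Dict[str, List[Record]] = {}
    for r in range(num_reducers):
        # Shuffle: reducer r copies its partition from every map output.
        fetched: List[Record] = []
        for out in map_outputs:
            fetched.extend(out.get(r, []))
        # Sort/merge: bring equal keys together, in key order.
        fetched.sort(key=itemgetter(0))
        # Reduce: one reducer call per distinct key.
        output_files[f"part-r-{r:05d}"] = [
            reducer(key, (v for _, v in group))
            for key, group in groupby(fetched, key=itemgetter(0))
        ]
    return output_files
```

With the same input as a map-only run, `run_map_reduce_job([["b a", "a"], ["c b"]], word_mapper, sum_reducer, 2)` puts every key in exactly one `part-r-NNNNN` file, with its counts combined across both mappers and the keys sorted inside each file. A real reducer can begin copying finished map outputs while other maps are still running; this sequential sketch does not model that overlap.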

Quoting this very good answer on Stack Overflow:

First of all, shuffling is the process of transferring data from the mappers to the reducers, so I think it is obvious that it is necessary for the reducers, since otherwise they wouldn't be able to have any input (or input from every mapper). Shuffling can start even before the map phase has finished, to save some time. That's why you can see a reduce status greater than 0% (but less than 33%) when the map status is not yet 100%.
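The 0% to 33% range in the quote fits a reduce task whose progress is split into three equally weighted phases: copy (shuffle), sort (merge), and reduce. The function below is a hypothetical back-of-the-envelope model of that split, not Hadoop's actual progress code; it assumes the copy phase advances in proportion to the number of map outputs fetched and that sorting only starts once every map output has been copied.

```python
def reduce_progress(copied: int, total_maps: int,
                    sort_done: float = 0.0, reduce_done: float = 0.0) -> float:
    """Approximate reduce-task progress in percent (hypothetical model).

    Assumes copy (shuffle), sort (merge) and reduce each count for one third.
    """
    if total_maps <= 0 or not 0 <= copied <= total_maps:
        raise ValueError("need total_maps > 0 and 0 <= copied <= total_maps")
    copy_done = copied / total_maps
    # In this model, sort and reduce cannot begin until the copy phase ends.
    if copy_done < 1.0 and (sort_done > 0.0 or reduce_done > 0.0):
        raise ValueError("sort/reduce cannot start before the copy phase ends")
    return 100.0 * (copy_done + sort_done + reduce_done) / 3.0
```

For example, with 5 of 10 map outputs copied the model gives about 16.7%: the reduce status is above 0% while the map phase is still below 100%, and it cannot pass 33% until every map output has been fetched.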

I hope this answer clears up your confusion.
