Correct order of the various phases of an MR job?
Problem description
I am trying to understand the various phases that an MR job goes through. I read the online documentation on this.
Based on this, my understanding of the sequence is as follows:
map() -> Partitioner -> Sorting (at mapper machine) -> Shuffle -> Sorting (at reducer machine) -> groupBy(Key) (at reducer machine) -> reduce()
Is this the correct sequence in which an MR job executes?
Recommended answer
Various phases of a MapReduce job:
Map phase:
- Reads the assigned input split from HDFS
- Parses the input into records as key-value pairs
- Applies the map function to each record
- Notifies the master node of its completion
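As a rough illustration of the map phase, here is a minimal Python sketch that simulates one map task for a word count. It is not Hadoop code: `map_fn` and `run_map_task` are hypothetical names, the input split is passed in as a plain string instead of being read from HDFS, and records are assumed to be (byte offset, line) pairs.

```python
def map_fn(offset, line):
    """Hypothetical user map function: emit (word, 1) for every word in a record."""
    for word in line.split():
        yield word.lower(), 1


def run_map_task(split_text):
    """Parse one input split into (offset, line) records and map each record."""
    output = []
    offset = 0
    for raw_line in split_text.splitlines(keepends=True):
        # The record key is the byte offset of the line, the value is the line text.
        output.extend(map_fn(offset, raw_line.rstrip("\n")))
        offset += len(raw_line.encode("utf-8"))
    return output


pairs = run_map_task("Hello world\nhello MapReduce\n")
print(pairs)
```

The result is a flat list of intermediate key-value pairs, which is what the later phases operate on.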
Partition phase:
- Each mapper must determine which reducer will receive each of its outputs
- For any given key, the destination partition is always the same
- Number of partitions = number of reducers
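The partitioning rule can be sketched in Python as a hash of the key taken modulo the number of reducers. This follows the mask-and-modulo idea used by hash-based partitioners; the use of CRC32 as the hash and the names `partition` and `partition_map_output` are assumptions made for this illustration only.

```python
import zlib


def partition(key, num_reducers):
    """Return the partition index (0 .. num_reducers - 1) for a key."""
    # Assumption: CRC32 of the UTF-8 bytes stands in for the key's hash code,
    # so the result is stable across runs (Python's built-in hash() is not).
    h = zlib.crc32(key.encode("utf-8"))
    # Mask to a non-negative value, then take it modulo the reducer count.
    return (h & 0x7FFFFFFF) % num_reducers


def partition_map_output(pairs, num_reducers):
    """Split one mapper's output into one bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


buckets = partition_map_output([("a", 1), ("b", 1), ("a", 1)], 3)
print(buckets)
```

Because the partition depends only on the key and the reducer count, every record with the same key ends up in the same bucket, no matter which mapper produced it.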
Shuffle phase:
- Fetches input data from all map tasks for the portion corresponding to the reduce task's bucket
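A small Python sketch of the shuffle, under the assumption that each map task's output is already split into one bucket per reducer. In a real cluster this fetch happens over the network; here the map outputs are just nested lists, and `shuffle` is a hypothetical helper name.

```python
def shuffle(map_outputs, reducer_id):
    """Collect, from every map task, the bucket that belongs to one reducer."""
    fetched = []
    for buckets in map_outputs:  # one entry per map task
        fetched.append(buckets[reducer_id])  # take only this reducer's partition
    return fetched


# Two map tasks, each with its output already split into 2 partitions.
map_outputs = [
    [[("a", 1), ("c", 1)], [("b", 1)]],
    [[("a", 1)], [("b", 1), ("d", 1)]],
]
segments = shuffle(map_outputs, reducer_id=1)
print(segments)
```

Reducer 1 ends up with one segment per map task, each containing only the records from partition 1.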
Sort phase:
- Merge-sorts all map outputs into a single run
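The merge step can be sketched with Python's `heapq.merge`, assuming each fetched segment is already sorted by key on the mapper side (as the sequence in the question suggests). `merge_runs` is a hypothetical name for this illustration.

```python
import heapq


def merge_runs(segments):
    """Merge several key-sorted segments into one key-sorted run."""
    # Assumption: every segment is already sorted by key, so a k-way merge
    # is enough and no full re-sort is needed.
    return list(heapq.merge(*segments, key=lambda kv: kv[0]))


segments = [
    [("a", 1), ("c", 1)],
    [("a", 1), ("b", 1)],
    [("b", 1), ("d", 1)],
]
merged = merge_runs(segments)
print([key for key, _ in merged])
```

After the merge, all records that share a key sit next to each other, which is what makes the grouping in the reduce phase a single linear pass.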
Reduce phase:
- Applies the user-defined reduce function to the merged run
- The arguments are a key and the corresponding list of values
- Writes output to a file in HDFS
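To make the reduce step concrete, the following Python sketch groups a merged, key-sorted run by key and calls a reduce function once per key with the list of its values. `reduce_fn` (a word-count sum) and `run_reduce_task` are hypothetical names, and the results are returned as a list rather than written to HDFS.

```python
from itertools import groupby
from operator import itemgetter


def reduce_fn(key, values):
    """Hypothetical user reduce function: sum the values for one key."""
    return key, sum(values)


def run_reduce_task(merged_run):
    """Group a key-sorted run by key and apply reduce_fn to each group."""
    results = []
    # groupby only groups adjacent items, so the input must be sorted by key.
    for key, group in groupby(merged_run, key=itemgetter(0)):
        values = [value for _, value in group]
        results.append(reduce_fn(key, values))
    return results


merged_run = [("a", 1), ("a", 1), ("b", 1), ("c", 1)]
output = run_reduce_task(merged_run)
print(output)
```

The grouping relies on the sort phase: without a key-sorted input, records with the same key could be split across several reduce calls.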