What does the shuffling phase actually do?


Question

What does the shuffling phase actually do?

Possibility - A

Shuffling is the process of bringing the mapper output to the reducers, so it just brings specific keys from the mappers to particular reducers, based on the code written in the partitioner.

E.g. the output of mapper 1 is {a,1} {b,1}

the output of mapper 2 is {a,1} {b,1}

and in my partitioner I have written that all keys starting with 'a' go to reducer 1 and all keys starting with 'b' go to reducer 2, so the output would be:

reducer 1: {a,1} {a,1}

reducer 2: {b,1} {b,1}
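The routing described above can be sketched in plain Python. This is only a simulation of the idea, not Hadoop code: the `partition` and `shuffle` functions are hypothetical names made up for illustration, and the first-letter rule is the one assumed in the question.

```python
from collections import defaultdict

def partition(key):
    """Hypothetical partitioner: keys starting with 'a' go to reducer 1, 'b' to reducer 2."""
    return {"a": 1, "b": 2}[key[0]]

def shuffle(mapper_outputs):
    """Route every (key, value) pair to the reducer chosen by partition()."""
    reducer_inputs = defaultdict(list)
    for pairs in mapper_outputs:
        for key, value in pairs:
            reducer_inputs[partition(key)].append((key, value))
    return dict(reducer_inputs)

mapper_1 = [("a", 1), ("b", 1)]
mapper_2 = [("a", 1), ("b", 1)]

routed = shuffle([mapper_1, mapper_2])
print(routed)  # {1: [('a', 1), ('a', 1)], 2: [('b', 1), ('b', 1)]}
```

Note that nothing is grouped here: each reducer simply receives a flat list of pairs.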


Possibility - B

Or, along with the above process, does it also group the keys?

So the output would be:

reducer 1: {a,[1,1]}

reducer 2: {b,[1,1]}


In my opinion it should be A, because grouping of keys must take place after sorting: sorting is done only so that the reducer can easily tell where one key ends and the next begins. If so, when does the grouping of keys actually happen? Please elaborate.
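To illustrate why sorting makes grouping cheap, here is a minimal Python sketch (again a simulation, not Hadoop's implementation). Once the pairs are sorted by key, equal keys sit next to each other, so a single pass can detect where one key ends and the next begins. The function name `group_sorted` and the keys `ant` and `apple` are made up for the example.

```python
from itertools import groupby
from operator import itemgetter

def group_sorted(pairs):
    """Sort pairs by key, then collect the values of each run of equal keys."""
    ordered = sorted(pairs, key=itemgetter(0))
    # groupby starts a new group whenever the key changes, which only
    # yields one group per key because the input is sorted first.
    return [(key, [value for _, value in run])
            for key, run in groupby(ordered, key=itemgetter(0))]

# Hypothetical input of one reducer, in arrival order (unsorted).
reducer_input = [("apple", 1), ("ant", 1), ("apple", 1), ("ant", 1)]

grouped = group_sorted(reducer_input)
print(grouped)  # [('ant', [1, 1]), ('apple', [1, 1])]
```

Without the sort, `groupby` would emit a separate group each time the key changed, which is the behavior the sorting step is meant to avoid.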

Recommended answer

Mappers and reducers are not separate machines, just separate code. Both the mapping code and the reducing code run on the same set of machines in the cluster.

So, after all machines in the cluster have run the mapper code, the results are:

  1. grouped locally on each node (consider this "local grouping"); and
  2. shuffled/redistributed across all nodes in the cluster.

Consider step 2 a "global grouping", because it is done in such a way that all values belonging to one key go to the single node assigned to that key.

Now the nodes run the reducer code on the (key, value) pairs residing in their memory.
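The steps in this answer can be put together in a small Python simulation. It is a sketch under stated assumptions, not how Hadoop is implemented: the function names are hypothetical, the node assignment rule (`ord(key[0]) % num_nodes`) is a toy stand-in for a partitioner, and the reducer is assumed to sum values in word-count style.

```python
from collections import defaultdict

def local_group(pairs):
    """Step 1: group values by key on the node that produced them."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def redistribute(per_node_groups, num_nodes):
    """Step 2: send all values of a key to one node (toy assignment rule)."""
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for groups in per_node_groups:
        for key, values in groups.items():
            target = ord(key[0]) % num_nodes  # assumed deterministic rule
            nodes[target][key].extend(values)
    return [dict(node) for node in nodes]

def reduce_node(groups):
    """Reducer code: sum the values of each key (an assumption for this sketch)."""
    return {key: sum(values) for key, values in groups.items()}

# Mapper output on two nodes, as in the question.
node_outputs = [[("a", 1), ("b", 1)], [("a", 1), ("b", 1)]]

local = [local_group(pairs) for pairs in node_outputs]
global_groups = redistribute(local, num_nodes=2)
results = [reduce_node(groups) for groups in global_groups]

print(global_groups)  # [{'b': [1, 1]}, {'a': [1, 1]}]
print(results)        # [{'b': 2}, {'a': 2}]
```

After the redistribution step each key lives on exactly one node, which is what lets every node run its reducer independently.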
