What is the use of a grouping comparator in Hadoop MapReduce?


Problem description





I would like to know why a grouping comparator is used in the secondary sort of MapReduce.

According to the secondary sorting example in the Definitive Guide:

We want the sort order for keys to be by year (ascending) and then by temperature (descending):

    1900 35°C
    1900 34°C
    1900 34°C
    ...
    1901 36°C
    1901 35°C

By setting a partitioner to partition by the year part of the key, we can guarantee that records for the same year go to the same reducer. This still isn’t enough to achieve our goal, however. A partitioner ensures only that one reducer receives all the records for a year; it doesn’t change the fact that the reducer groups by key within the partition.

Since we have already written our own partitioner, which takes care of sending the map output keys to a particular reducer, why do we still need to group them?

Thanks in advance

Solution

In support of the chosen answer I add:

Following on from this explanation

**Input**:

    symbol time price
    a      1    10
    a      2    20
    b      3    30

**Map output**: create composite keys and values like so:

    symbol-time  time-price
    a-1          1-10
    a-2          2-20
    b-3          3-30

The Partitioner: partitions on the symbol part of the composite key only, so it routes the a-1 and a-2 keys to the same reducer even though the keys are different. It routes b-3 according to its own symbol, which typically means a separate reducer.
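To make the routing concrete, here is a minimal plain-Java sketch of that idea, with no Hadoop dependency. The class name `SymbolPartitionSketch`, the helper `symbolOf`, and the `"symbol-time"` string encoding of the composite key are assumptions made for illustration. In a real job the same logic would live in a subclass of Hadoop's `Partitioner`, in its `getPartition(key, value, numPartitions)` method, registered with `Job.setPartitionerClass`.

```java
public class SymbolPartitionSketch {

    // Hypothetical helper: extracts the symbol (the natural key) from a
    // composite key encoded as "symbol-time", e.g. "a-1" -> "a".
    static String symbolOf(String compositeKey) {
        return compositeKey.substring(0, compositeKey.indexOf('-'));
    }

    // Same shape as hash partitioning, but only the symbol is hashed,
    // so the time part of the key never influences the target reducer.
    static int partition(String compositeKey, int numReducers) {
        return (symbolOf(compositeKey).hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 2; // assumed reducer count for the demo
        for (String key : new String[] {"a-1", "a-2", "b-3"}) {
            System.out.println(key + " -> reducer " + partition(key, numReducers));
        }
    }
}
```

Because only the symbol is hashed, a-1 and a-2 always land on the same reducer. Whether b-3 lands on a different one depends on the hash and on the number of reducers.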

GroupComparator: once the composite keys and values arrive at the reducer, the reducer would by default get each composite key in its own group:

>(**a-1**,{1-10})
>
>(**a-2**,{2-20})

The above happens because every composite key is unique after composition.

The group comparator instead ensures the reducer gets:

(a-1,{**1-10,2-20**})

The key of the grouped values will be the one which comes first in the group. This can be controlled by the key (sort) comparator.

**The grouped values are delivered in a single reduce method call.**
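The difference can be simulated in plain Java without a cluster. The sketch below is not Hadoop code: `GroupingSketch`, `reduceCalls`, `compareSort` and `compareGroup` are hypothetical names, keys are assumed to be `"symbol-time"` strings, and all three records are assumed to arrive at one reducer to keep the demo short. In a real job the two comparators would be registered with `Job.setSortComparatorClass` and `Job.setGroupingComparatorClass`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class GroupingSketch {

    static String symbolOf(String key) {
        return key.substring(0, key.indexOf('-'));
    }

    static int timeOf(String key) {
        return Integer.parseInt(key.substring(key.indexOf('-') + 1));
    }

    // Sort comparator: orders composite keys by symbol, then by time.
    static int compareSort(String a, String b) {
        int bySymbol = symbolOf(a).compareTo(symbolOf(b));
        return bySymbol != 0 ? bySymbol : Integer.compare(timeOf(a), timeOf(b));
    }

    // Grouping comparator: looks at the symbol only, so a-1 and a-2 compare equal.
    static int compareGroup(String a, String b) {
        return symbolOf(a).compareTo(symbolOf(b));
    }

    // Simulates one reducer: sorts the {key, value} records, then starts a new
    // "reduce call" whenever the grouping comparator reports that a key differs
    // from the first key of the current group.
    static List<String> reduceCalls(List<String[]> records, Comparator<String> grouping) {
        List<String[]> sorted = new ArrayList<>(records);
        sorted.sort((x, y) -> compareSort(x[0], y[0]));

        List<String> calls = new ArrayList<>();
        String firstKey = null;
        List<String> values = new ArrayList<>();
        for (String[] record : sorted) {
            if (firstKey != null && grouping.compare(firstKey, record[0]) != 0) {
                calls.add("(" + firstKey + ",{" + String.join(",", values) + "})");
                values.clear();
                firstKey = null;
            }
            if (firstKey == null) {
                firstKey = record[0]; // the group is labelled with its first key
            }
            values.add(record[1]);
        }
        if (firstKey != null) {
            calls.add("(" + firstKey + ",{" + String.join(",", values) + "})");
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
            new String[] {"a-1", "1-10"},
            new String[] {"a-2", "2-20"},
            new String[] {"b-3", "3-30"});

        // Grouping on the full composite key: one call per record.
        System.out.println(reduceCalls(records, GroupingSketch::compareSort));
        // prints [(a-1,{1-10}), (a-2,{2-20}), (b-3,{3-30})]

        // Grouping on the symbol only: a-1 and a-2 share a single call.
        System.out.println(reduceCalls(records, GroupingSketch::compareGroup));
        // prints [(a-1,{1-10,2-20}), (b-3,{3-30})]
    }
}
```

The partitioner only decides which reducer a record goes to. The grouping comparator decides where one reduce call ends and the next begins inside that reducer, which is why both are needed for a secondary sort.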
