What is the use of a grouping comparator in Hadoop MapReduce?

Question

I would like to know why a grouping comparator is used in the secondary sort of MapReduce.

According to the secondary sort example in Hadoop: The Definitive Guide:

We want the sort order for keys to be by year (ascending) and then by temperature (descending):

1900 35°C
1900 34°C
1900 34°C
...
1901 36°C
1901 35°C
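
A minimal sketch of that ordering in plain Java, with no Hadoop dependency. The composite key is modelled as a hypothetical `int[]{year, temperature}`, and `SortOrderSketch` and `compareKeys` are illustrative names, not part of any Hadoop API. In a real job the same comparison would live in the key's sort comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortOrderSketch {
    // Hypothetical composite key layout: key[0] = year, key[1] = temperature.
    static int compareKeys(int[] a, int[] b) {
        int byYear = Integer.compare(a[0], b[0]);   // year ascending
        if (byYear != 0) {
            return byYear;
        }
        return Integer.compare(b[1], a[1]);         // temperature descending (arguments swapped)
    }

    public static void main(String[] args) {
        List<int[]> keys = new ArrayList<>(Arrays.asList(
            new int[]{1901, 35}, new int[]{1900, 34}, new int[]{1900, 35},
            new int[]{1901, 36}, new int[]{1900, 34}));
        keys.sort(SortOrderSketch::compareKeys);
        for (int[] key : keys) {
            System.out.println(key[0] + " " + key[1]);
        }
    }
}
```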

By setting a partitioner to partition by the year part of the key, we can guarantee that records for the same year go to the same reducer. This still isn’t enough to achieve our goal, however. A partitioner ensures only that one reducer receives all the records for a year; it doesn’t change the fact that the reducer groups by key within the partition.

Since we have already written our own partitioner, which makes sure the map output keys for a given year go to a particular reducer, why do we also need to group them?

Thanks in advance.

Recommended answer

In support of the chosen answer, I add this explanation:

**Input**:

    symbol time price
    a      1    10
    a      2    20
    b      3    30

**Map output**: create composite keys and values like so:

> symbol-time time-price
>
>**a-1**         1-10
>
>**a-2**         2-20
>
>**b-3**         3-30

Partitioner: routes the a-1 and a-2 keys to the same reducer even though the keys differ, because it partitions on the symbol only. b-3 is routed independently and may land on a different reducer.
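
A rough sketch of that routing in plain Java, without the Hadoop classes. It assumes the composite key is the string `symbol-time` from the example; `NaturalKeyPartitionSketch`, `naturalKey` and `partition` are hypothetical names. The arithmetic mirrors what Hadoop's default `HashPartitioner` does with a key's hash code, but applied to the symbol only.

```java
class NaturalKeyPartitionSketch {
    // Extracts the symbol ("a") from a composite key such as "a-1".
    static String naturalKey(String compositeKey) {
        return compositeKey.substring(0, compositeKey.indexOf('-'));
    }

    // Chooses a reducer from the symbol alone, so a-1 and a-2 always agree.
    static int partition(String compositeKey, int numReducers) {
        return (naturalKey(compositeKey).hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 4; // assumed reducer count for the demo
        for (String key : new String[] {"a-1", "a-2", "b-3"}) {
            System.out.println(key + " -> reducer " + partition(key, numReducers));
        }
    }
}
```

Whether b-3 really ends up on a different reducer depends on the hash and the number of reducers; with a single reducer everything lands in the same place.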

GroupComparator: once the composite key-value pairs arrive at the reducer, instead of the reducer getting

>(**a-1**,{1-10})
>
>(**a-2**,{2-20})

which is what would happen because each composite key is unique, the group comparator ensures that the reducer gets:

(a-1,{**1-10,2-20**})

The key of the grouped values will be the one that comes first in the group. This can be controlled by the key (sort) comparator.

**The grouped values above are delivered in a single reduce method call.**
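
To make the difference concrete, here is a small simulation in plain Java of how already-sorted records get split into reduce calls. It is a simplified model, not Hadoop's actual implementation: `GroupingSketch`, `reduceCalls`, `BY_FULL_KEY` and `BY_SYMBOL` are hypothetical names, and all three records are assumed to sit on one reducer. In a real job the grouping comparator is registered with `job.setGroupingComparatorClass(...)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class GroupingSketch {
    // Default behaviour: keys belong together only if the whole composite key matches.
    static final Comparator<String> BY_FULL_KEY = Comparator.naturalOrder();
    // Grouping comparator: keys belong together if the symbol part matches.
    static final Comparator<String> BY_SYMBOL =
        Comparator.comparing(k -> k.substring(0, k.indexOf('-')));

    // Each record is {compositeKey, value}; records are assumed to be sorted already.
    // Returns one string per simulated reduce() call.
    static List<String> reduceCalls(List<String[]> sortedRecords, Comparator<String> grouping) {
        List<String> calls = new ArrayList<>();
        String groupKey = null;
        List<String> values = new ArrayList<>();
        for (String[] record : sortedRecords) {
            if (groupKey != null && grouping.compare(groupKey, record[0]) != 0) {
                // Key falls outside the current group: close it and start a new one.
                calls.add("(" + groupKey + ",{" + String.join(",", values) + "})");
                values = new ArrayList<>();
                groupKey = null;
            }
            if (groupKey == null) {
                groupKey = record[0]; // the first key in sort order names the group
            }
            values.add(record[1]);
        }
        if (groupKey != null) {
            calls.add("(" + groupKey + ",{" + String.join(",", values) + "})");
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
            new String[] {"a-1", "1-10"},
            new String[] {"a-2", "2-20"},
            new String[] {"b-3", "3-30"});
        System.out.println("without grouping comparator: " + reduceCalls(records, BY_FULL_KEY));
        System.out.println("with grouping comparator:    " + reduceCalls(records, BY_SYMBOL));
    }
}
```

The partitioner only decides which reducer a record reaches; it is the grouping step that decides how many reduce calls happen there and which values each call sees.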
