Hadoop MapReduce secondary sorting


Question

Can anyone explain how secondary sorting works in Hadoop?
Why must one use a GroupingComparator, and how does it work in Hadoop?

I was going through the link given below and have doubts about how the grouping comparator works.
Can anyone explain how the grouping comparator works?

http://www.bigdataspeak.com/2013/02/hadoop-how-to-do-secondary-sort-on_25.html

Solution

Grouping Comparator

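Once the data reaches a reducer, all data is grouped by key. Since we have a composite key, we need to make sure records are grouped solely by the natural key. Without that, every distinct (yearMonth, temperature) combination would count as a separate key and get its own reduce call. Grouping by the natural key is accomplished by writing a custom grouping comparator: a Comparator object that considers only the yearMonth field of the TemperaturePair class when grouping records together. It is registered on the job with setGroupingComparatorClass.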

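import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Groups composite keys by their natural key (yearMonth) only.
public class YearMonthGroupingComparator extends WritableComparator {

    public YearMonthGroupingComparator() {
        // true: let WritableComparator create key instances for deserialization
        super(TemperaturePair.class, true);
    }

    @Override
    public int compare(WritableComparable tp1, WritableComparable tp2) {
        TemperaturePair temperaturePair = (TemperaturePair) tp1;
        TemperaturePair temperaturePair2 = (TemperaturePair) tp2;
        // The temperature part of the key is deliberately ignored here.
        return temperaturePair.getYearMonth().compareTo(temperaturePair2.getYearMonth());
    }
}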

Here are the results of running our secondary sort job:

new-host-2:sbin bbejeck$ hdfs dfs -cat secondary-sort/part-r-00000

190101 -206
190102 -333
190103 -272
190104 -61
190105 -33
190106 44
190107 72
190108 44
190109 17
190110 -33
190111 -217
190112 -300
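To see why the grouping comparator matters, the reduce-side behavior can be imitated in plain Java without Hadoop. This is only a rough sketch under assumptions: the Pair class is a hypothetical stand-in for TemperaturePair, the sample readings are invented (the two minima were chosen to match the first two lines of the listing above), and the loop merely approximates how the framework starts a new reduce call whenever the grouping comparator reports a different key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {

    // Hypothetical stand-in for the composite key: natural key plus the value to sort on.
    static final class Pair {
        final String yearMonth;
        final int temperature;

        Pair(String yearMonth, int temperature) {
            this.yearMonth = yearMonth;
            this.temperature = temperature;
        }

        String getYearMonth() { return yearMonth; }
        int getTemperature() { return temperature; }
    }

    // Sort comparator: natural key first, then temperature ascending.
    static final Comparator<Pair> SORT =
            Comparator.comparing(Pair::getYearMonth).thenComparingInt(Pair::getTemperature);

    // Grouping comparator: natural key only, in the spirit of YearMonthGroupingComparator.
    static final Comparator<Pair> GROUP = Comparator.comparing(Pair::getYearMonth);

    // Sorts the keys, then opens a new group each time GROUP sees a different natural key.
    // Each group plays the role of the values passed to a single reduce() call.
    static Map<String, List<Integer>> reduceGroups(List<Pair> mapOutput) {
        List<Pair> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        Pair previous = null;
        List<Integer> current = null;
        for (Pair p : sorted) {
            if (previous == null || GROUP.compare(previous, p) != 0) {
                current = new ArrayList<>();
                groups.put(p.getYearMonth(), current);
            }
            current.add(p.getTemperature());
            previous = p;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Invented sample readings, deliberately out of order.
        List<Pair> mapOutput = List.of(
                new Pair("190102", -50), new Pair("190101", 11),
                new Pair("190101", -206), new Pair("190102", -333));
        // The first value in each group is the lowest temperature for that month.
        reduceGroups(mapOutput).forEach((month, temps) ->
                System.out.println(month + " " + temps.get(0)));
    }
}
```

If GROUP were replaced by SORT in the loop, each of the four sample records would form its own group, which is the situation the grouping comparator is there to avoid.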

While sorting data by value may not be a common need, it's a nice tool to have in your back pocket when needed. Also, we have been able to take a deeper look at the inner workings of Hadoop by working with custom partitioners and grouping comparators.
See also: What is the use of grouping comparator in hadoop map reduce
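The custom partitioner mentioned above is not shown in this answer. As a minimal sketch of the idea, assuming it derives the partition from the natural key alone, the arithmetic can be tried in plain Java. The Pair class, the partitionFor helper, and the reducer count are hypothetical; the hash-and-modulo step mirrors what Hadoop's default HashPartitioner does with a key's hashCode.

```java
public class NaturalKeyPartitionSketch {

    // Hypothetical stand-in for the composite key.
    static final class Pair {
        final String yearMonth;
        final int temperature;

        Pair(String yearMonth, int temperature) {
            this.yearMonth = yearMonth;
            this.temperature = temperature;
        }
    }

    // Chooses a partition from the natural key only; the temperature is ignored,
    // so all records for one yearMonth land on the same reducer.
    static int partitionFor(Pair key, int numReducers) {
        // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
        return (key.yearMonth.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int reducers = 4; // hypothetical reducer count
        int first = partitionFor(new Pair("190101", -206), reducers);
        int second = partitionFor(new Pair("190101", 11), reducers);
        // Same yearMonth, different temperature: both go to the same partition.
        System.out.println(first == second); // prints "true"
    }
}
```

Partitioning on the whole composite key instead could scatter one month's records across reducers, and then no grouping comparator could bring them back together.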
