hadoop map reduce secondary sorting

Views: 90 Published: 2018/5/31 18:23:37 Tags: hadoop mapreduce bigdata hadoop-partitioning


Problem description

Can anyone explain how secondary sorting works in Hadoop?

Why must one use a GroupingComparator, and how does it work in Hadoop?

I was going through the link given below and am unsure how the grouping comparator works.

Can anyone explain how the grouping comparator works?

http://www.bigdataspeak.com/2013/02/hadoop-how-to-do-secondary-sort-on_25.html
Solution
Grouping Comparator

Once the data reaches a reducer, all data is grouped by key. Since we have a composite key, we need to make sure records are grouped solely by the natural key. Without this step, each distinct combination of yearMonth and temperature would form its own group and trigger a separate reduce call. Grouping by the natural key is accomplished by writing a custom grouping comparator: a Comparator object that considers only the yearMonth field of the TemperaturePair class when deciding which records belong together.
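import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Groups composite keys by their natural key (yearMonth) only.
public class YearMonthGroupingComparator extends WritableComparator {

    public YearMonthGroupingComparator() {
        // true: let WritableComparator create key instances for deserialization
        super(TemperaturePair.class, true);
    }

    @Override
    public int compare(WritableComparable tp1, WritableComparable tp2) {
        TemperaturePair temperaturePair = (TemperaturePair) tp1;
        TemperaturePair temperaturePair2 = (TemperaturePair) tp2;
        // The temperature part of the key is deliberately ignored here.
        return temperaturePair.getYearMonth().compareTo(temperaturePair2.getYearMonth());
    }
}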
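To make the effect of the grouping comparator easier to see, here is a minimal plain-Java sketch that imitates what happens on the reduce side. It does not use the Hadoop API. The `Pair` class is a hypothetical stand-in for TemperaturePair, the `SORT` and `GROUP` comparators and the sample readings are made up for illustration, and the loop only approximates how the framework walks the sorted keys. In a real job, the comparator above would be registered with `Job.setGroupingComparatorClass`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SecondarySortSketch {

    /** Hypothetical stand-in for the composite key: natural key plus the value to sort by. */
    static final class Pair {
        final String yearMonth;
        final int temperature;

        Pair(String yearMonth, int temperature) {
            this.yearMonth = yearMonth;
            this.temperature = temperature;
        }
    }

    // Sort order: natural key first, then temperature (the "secondary" sort).
    static final Comparator<Pair> SORT =
            Comparator.comparing((Pair p) -> p.yearMonth).thenComparingInt(p -> p.temperature);

    // Grouping order: looks at the natural key only, like YearMonthGroupingComparator.
    static final Comparator<Pair> GROUP = Comparator.comparing((Pair p) -> p.yearMonth);

    /** Sorts the keys, then starts a new group whenever GROUP reports a different key. */
    static List<List<Pair>> groupForReduce(List<Pair> mapOutput) {
        List<Pair> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT);
        List<List<Pair>> groups = new ArrayList<>();
        for (Pair p : sorted) {
            List<Pair> last = groups.isEmpty() ? null : groups.get(groups.size() - 1);
            if (last == null || GROUP.compare(last.get(last.size() - 1), p) != 0) {
                last = new ArrayList<>();
                groups.add(last);
            }
            last.add(p);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up sample readings, not taken from the real data set.
        List<Pair> mapOutput = List.of(
                new Pair("190101", -61), new Pair("190102", -333),
                new Pair("190101", -206), new Pair("190102", 17));
        for (List<Pair> group : groupForReduce(mapOutput)) {
            // Each inner list plays the role of one reduce() call;
            // its first element is the lowest temperature for that month.
            System.out.println(group.get(0).yearMonth + "\t" + group.get(0).temperature);
        }
    }
}
```

With these sample readings the sketch prints one line per month, 190101 with -206 and 190102 with -333. If `GROUP` compared the full key instead, every reading would end up in a group of its own, which is why the grouping comparator has to ignore the temperature.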
Here are the results of running our secondary sort job:
new-host-2:sbin bbejeck$ hdfs dfs -cat secondary-sort/part-r-00000
190101  -206
190102  -333
190103  -272
190104  -61
190105  -33
190106  44
190107  72
190108  44
190109  17
190110  -33
190111  -217
190112  -300

While sorting data by value may not be a common need, it’s a nice tool to have in your back pocket when needed. Also, we have been able to take a deeper look at the inner workings of Hadoop by working with custom partitioners and grouping comparators.
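The custom partitioner mentioned above is not shown in this answer, so the following is only a sketch of the idea in plain Java, under the assumption that it partitions on the natural key alone. The class name, the `partitionFor` method and the reducer count are hypothetical. In Hadoop this logic would sit in the `getPartition(key, value, numPartitions)` method of a `Partitioner` subclass. The point is that the temperature must not influence the choice of reducer; otherwise records for the same month could be sent to different reducers and could never be grouped together.

```java
public class NaturalKeyPartitionSketch {

    /**
     * Picks a reducer from the natural key only. The temperature parameter is
     * accepted but intentionally unused, to mirror a composite key whose second
     * field must not affect partitioning. Masking with Integer.MAX_VALUE keeps
     * the result non-negative even when hashCode() is negative.
     */
    static int partitionFor(String yearMonth, int temperature, int numReducers) {
        return (yearMonth.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int reducers = 4; // assumed reducer count, for illustration only
        // Same month, different temperatures: both keys go to the same reducer.
        int first = partitionFor("190101", -206, reducers);
        int second = partitionFor("190101", -61, reducers);
        System.out.println(first == second);
    }
}
```

The three pieces then fit together as follows: the partitioner sends every record for a month to one reducer, the sort order arranges that month's records by temperature, and the grouping comparator hands them to a single reduce call.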
See also: What is the use of grouping comparator in hadoop map reduce
