Hadoop MapReduce secondary sorting
Question
Can anyone explain how secondary sorting works in Hadoop?
Why must one use a GroupingComparator,
and how does it work in Hadoop?
I was going through the link given below and have doubts about how the grouping comparator works.
Can anyone explain how the grouping comparator works?
http://www.bigdataspeak.com/2013/02/hadoop-how-to-do-secondary-sort-on_25.html
Solution
Grouping Comparator
Once the data reaches a reducer, all data is grouped by key. Since we have a composite key, we need to make sure records are grouped solely by the natural key; otherwise every distinct (yearMonth, temperature) pair would form its own group and reduce() would be called once per record. This is accomplished by writing a custom grouping comparator: a Comparator that considers only the yearMonth field of the TemperaturePair class when grouping records together.
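import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class YearMonthGroupingComparator extends WritableComparator {

    public YearMonthGroupingComparator() {
        // true: have WritableComparator create key instances for deserialization
        super(TemperaturePair.class, true);
    }

    // Compare on the natural key (yearMonth) only; the temperature is ignored.
    @Override
    public int compare(WritableComparable tp1, WritableComparable tp2) {
        TemperaturePair temperaturePair = (TemperaturePair) tp1;
        TemperaturePair temperaturePair2 = (TemperaturePair) tp2;
        return temperaturePair.getYearMonth().compareTo(temperaturePair2.getYearMonth());
    }
}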
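To see what the grouping comparator changes, it can help to imitate the reduce side in plain Java without Hadoop. The sketch below is only an illustration under assumptions: the Key class, the SORT and GROUP comparators, and the groupSorted method are hypothetical stand-ins (they are not Hadoop APIs and not the TemperaturePair class from the linked post), and most of the sample temperatures are made up. It sorts composite keys by yearMonth and then temperature, and starts a new group only when the natural key changes, which is roughly what happens to reduce() calls once a grouping comparator is set.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {

    // Hypothetical composite key: natural key (yearMonth) plus the value to sort by.
    static final class Key {
        final String yearMonth;
        final int temperature;

        Key(String yearMonth, int temperature) {
            this.yearMonth = yearMonth;
            this.temperature = temperature;
        }
    }

    // Plays the role of the sort comparator: natural key first, then temperature ascending.
    static final Comparator<Key> SORT =
            Comparator.comparing((Key k) -> k.yearMonth)
                      .thenComparingInt(k -> k.temperature);

    // Plays the role of the grouping comparator: natural key only.
    static final Comparator<Key> GROUP = Comparator.comparing((Key k) -> k.yearMonth);

    // Sorts the keys, then opens a new group (one simulated reduce() call)
    // whenever GROUP reports that the natural key differs from the previous key.
    static Map<String, List<Integer>> groupSorted(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        Key previous = null;
        for (Key k : sorted) {
            if (previous == null || GROUP.compare(previous, k) != 0) {
                groups.put(k.yearMonth, new ArrayList<>());
            }
            groups.get(k.yearMonth).add(k.temperature);
            previous = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Sample records; apart from -206 and -333 the temperatures are invented.
        List<Key> keys = List.of(
                new Key("190102", -100), new Key("190101", 50),
                new Key("190101", -206), new Key("190102", -333),
                new Key("190101", 0));
        // Each group arrives with its values already sorted, so the first one is the lowest.
        groupSorted(keys).forEach((month, temps) ->
                System.out.println(month + " " + temps.get(0)));
    }
}
```

If GROUP compared both fields the way SORT does, every record would open its own group, which is the situation the grouping comparator is meant to avoid. With the sample data above, each month forms one group whose first value is the lowest temperature.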
Here are the results of running our secondary sort job:
new-host-2:sbin bbejeck$ hdfs dfs -cat secondary-sort/part-r-00000
190101 -206
190102 -333
190103 -272
190104 -61
190105 -33
190106 44
190107 72
190108 44
190109 17
190110 -33
190111 -217
190112 -300
While sorting data by value may not be a common need, it’s a nice tool to have in your back pocket when needed. Also, we have been able to take a deeper look at the inner workings of Hadoop by working with custom partitioners and grouping comparators.
See also: What is the use of grouping comparator in hadoop map reduce