What are the differences between Sort Comparator and Group Comparator in Hadoop?
Question
What are the differences between Sort Comparator and Group Comparator in Hadoop?
Answer
To understand GroupComparator, see my answer to this question -
SortComparator: used to define how map output keys are sorted.
Excerpt from the book "Hadoop: The Definitive Guide":
The sort order for keys is found as follows:
If the property mapred.output.key.comparator.class is set, either explicitly or by calling setSortComparatorClass() on Job, then an instance of that class is used. (In the old API the equivalent method is setOutputKeyComparatorClass() on JobConf.)
Otherwise, keys must be a subclass of WritableComparable, and the registered comparator for the key class is used.
If there is no registered comparator, then a RawComparator is used that deserializes the byte streams being compared into objects and delegates to the WritableComparable's compareTo() method.
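The lookup order above can be modeled in a few lines of plain Java. This is a hypothetical sketch, not Hadoop's code: `SortComparatorResolution`, its `REGISTRY` map and `resolve()` are invented names standing in for the job property, the comparators registered per key class (in Hadoop, through `WritableComparator.define()`), and the fallback to the key's own `compareTo()`. Keys here are ordinary in-memory `Comparable` objects, so the byte-stream deserialization step is left out.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of how the sort comparator for map output keys is chosen.
// It mirrors the precedence described above but is NOT Hadoop's implementation.
class SortComparatorResolution {

    // Stand-in for comparators registered per key class
    // (in Hadoop, via WritableComparator.define()).
    static final Map<Class<?>, Comparator<?>> REGISTRY = new HashMap<>();

    // 'configured' stands in for the comparator class set on the job
    // (mapred.output.key.comparator.class / setSortComparatorClass()); may be null.
    @SuppressWarnings("unchecked")
    static <K extends Comparable<K>> Comparator<K> resolve(Comparator<K> configured, Class<K> keyClass) {
        // 1. An explicitly configured comparator always wins.
        if (configured != null) {
            return configured;
        }
        // 2. Otherwise use the comparator registered for the key class, if any.
        Comparator<?> registered = REGISTRY.get(keyClass);
        if (registered != null) {
            return (Comparator<K>) registered;
        }
        // 3. Fall back to the key's own compareTo() (Hadoop would first
        //    deserialize the raw bytes into key objects).
        return Comparator.naturalOrder();
    }

    public static void main(String[] args) {
        // Nothing configured or registered: natural (ascending) order.
        System.out.println(resolve(null, String.class).compare("a", "b") < 0);   // true

        // A registered comparator overrides natural order.
        REGISTRY.put(String.class, Comparator.<String>reverseOrder());
        System.out.println(resolve(null, String.class).compare("a", "b") > 0);   // true

        // A configured comparator overrides the registered one.
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        System.out.println(resolve(byLength, String.class).compare("zz", "a") > 0); // true
    }
}
```

The point of the model is precedence: a comparator set on the job overrides a registered one, which in turn overrides the key's natural ordering.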
SortComparator vs. GroupComparator in one line: SortComparator decides how map output keys are sorted, while GroupComparator decides which map output keys within the Reducer go to the same reduce method call.
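The one-liner is easiest to see in a secondary sort, where the two comparators deliberately disagree. The following plain-Java simulation uses no Hadoop classes; `SecondarySortSimulation`, the `YearTemp` key and the sample readings are hypothetical. It sorts composite (year, temperature) keys by year ascending and temperature descending, then starts a new reduce call only when the group comparator, which looks at the year alone, sees a change between adjacent keys. In a real job the two comparators would be `RawComparator` implementations passed to `Job.setSortComparatorClass()` and `Job.setGroupingComparatorClass()`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// In-memory simulation of the sort and group steps between map and reduce.
// It only illustrates the two comparators' roles; it is not Hadoop code.
class SecondarySortSimulation {

    // Hypothetical composite map output key: (year, temperature).
    static final class YearTemp {
        final int year;
        final int temp;

        YearTemp(int year, int temp) {
            this.year = year;
            this.temp = temp;
        }
    }

    // Sort comparator: year ascending, then temperature descending.
    static final Comparator<YearTemp> SORT =
            Comparator.comparingInt((YearTemp k) -> k.year)
                      .thenComparing(Comparator.comparingInt((YearTemp k) -> k.temp).reversed());

    // Group comparator: only the year decides which keys share a reduce call.
    static final Comparator<YearTemp> GROUP = Comparator.comparingInt((YearTemp k) -> k.year);

    // Sorts the keys with SORT, then starts a new group whenever GROUP says a key
    // differs from the previous one. Each inner list holds the values that one
    // reduce() invocation would iterate over.
    static List<List<Integer>> simulateShuffle(List<YearTemp> mapOutput) {
        List<YearTemp> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT);

        List<List<Integer>> reduceCalls = new ArrayList<>();
        YearTemp previous = null;
        for (YearTemp key : sorted) {
            if (previous == null || GROUP.compare(previous, key) != 0) {
                reduceCalls.add(new ArrayList<>()); // start a new reduce() call
            }
            reduceCalls.get(reduceCalls.size() - 1).add(key.temp);
            previous = key;
        }
        return reduceCalls;
    }

    public static void main(String[] args) {
        List<YearTemp> mapOutput = List.of(
                new YearTemp(1901, 10), new YearTemp(1900, 35),
                new YearTemp(1901, 42), new YearTemp(1900, 21));
        // One reduce call per year, values in descending temperature order.
        System.out.println(simulateShuffle(mapOutput)); // [[35, 21], [42, 10]]
    }
}
```

Because each group arrives already ordered by the sort comparator, the first value a reduce call sees for a year is that year's maximum temperature, without the reducer sorting anything itself.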