What are the differences between Sort Comparator and Group Comparator in Hadoop?
What are the differences between Sort Comparator and Group Comparator in Hadoop?
To understand GroupComparator, see my answer to this question: What is the use of grouping comparator in hadoop map reduce
SortComparator: used to define how map output keys are sorted.
An excerpt from the book Hadoop: The Definitive Guide:
Sort order for keys is found as follows:
1. If the property mapred.output.key.comparator.class is set, either explicitly or by calling setSortComparatorClass() on Job, then an instance of that class is used. (In the old API the equivalent method is setOutputKeyComparatorClass() on JobConf.)
2. Otherwise, keys must be a subclass of WritableComparable, and the registered comparator for the key class is used.
3. If there is no registered comparator, then a RawComparator is used that deserializes the byte streams being compared into objects and delegates to the WritableComparable's compareTo() method.
SortComparator vs GroupComparator in one line:
SortComparator decides how map output keys are sorted, while GroupComparator decides which map output keys within the Reducer go to the same reduce method call.
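To make the distinction concrete, here is a minimal plain-Java sketch that simulates the two roles without using any Hadoop classes. The composite key (symbol, timestamp), the class SortVsGroupDemo and the helper reduceCalls() are all hypothetical names made up for illustration. In a real job the two comparators would be registered on Job (setSortComparatorClass() for sorting and setGroupingComparatorClass() for grouping) rather than passed around like this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation only: no Hadoop classes are involved.
class SortVsGroupDemo {

    // Hypothetical composite map output key: (symbol, timestamp).
    static final class Key {
        final String symbol;
        final int timestamp;

        Key(String symbol, int timestamp) {
            this.symbol = symbol;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return symbol + "@" + timestamp;
        }
    }

    // Role of the sort comparator: total order by symbol, then timestamp.
    static final Comparator<Key> SORT =
            Comparator.comparing((Key k) -> k.symbol).thenComparingInt(k -> k.timestamp);

    // Role of the grouping comparator: keys with the same symbol compare as equal.
    static final Comparator<Key> GROUP = Comparator.comparing((Key k) -> k.symbol);

    // Sorts the keys, then starts a new "reduce call" whenever the grouping
    // comparator says the next key differs from the previous one.
    static List<List<Key>> reduceCalls(List<Key> keys, Comparator<Key> sort, Comparator<Key> group) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(sort);
        List<List<Key>> calls = new ArrayList<>();
        for (Key k : sorted) {
            List<Key> current = calls.isEmpty() ? null : calls.get(calls.size() - 1);
            if (current == null || group.compare(current.get(current.size() - 1), k) != 0) {
                current = new ArrayList<>();
                calls.add(current);
            }
            current.add(k);
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Key> keys = Arrays.asList(
                new Key("b", 2), new Key("a", 3), new Key("b", 1), new Key("a", 1));

        // Grouping by symbol only: two reduce calls, each seeing keys in sorted order.
        System.out.println(reduceCalls(keys, SORT, GROUP)); // [[a@1, a@3], [b@1, b@2]]

        // Grouping with the full sort order: every distinct key gets its own call.
        System.out.println(reduceCalls(keys, SORT, SORT).size()); // 4
    }
}
```

The sort comparator fixes the order in which keys arrive, and the grouping comparator only decides where one reduce call ends and the next begins. Secondary-sort setups typically rely on this: sort on the full composite key, group on part of it.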