What are the differences between Sort Comparator and Group Comparator in Hadoop?

Question

What are the differences between Sort Comparator and Group Comparator in Hadoop?

Recommended answer

To understand GroupComparator, see my answer to this question:

What is the use of grouping comparator in hadoop map reduce

SortComparator: used to define how map output keys are sorted.

Excerpt from the book Hadoop: The Definitive Guide:

The sort order for keys is determined as follows:

  1. If the property mapred.output.key.comparator.class is set, either explicitly or by calling setSortComparatorClass() on Job, then an instance of that class is used. (In the old API, the equivalent method is setOutputKeyComparatorClass() on JobConf.)

  2. Otherwise, keys must be a subclass of WritableComparable, and the registered comparator for the key class is used.

  3. If there is no registered comparator, then a RawComparator is used that deserializes the byte streams being compared into objects and delegates to the WritableComparable's compareTo() method.
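
The fallback order above can be modelled in plain Java. This is only a rough sketch and not Hadoop code: the class SortOrderResolution, the REGISTRY map and the resolve() method are hypothetical names made up for illustration, and java.util.Comparator stands in for Hadoop's RawComparator.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

public class SortOrderResolution {
    // Hypothetical stand-in for the per-key-class comparator registry.
    public static final Map<Class<?>, Comparator<?>> REGISTRY = new HashMap<>();

    // Returns the comparator that would order the keys, following the three steps.
    @SuppressWarnings("unchecked")
    public static <K extends Comparable<K>> Comparator<K> resolve(
            Comparator<K> configured, Class<K> keyClass) {
        if (configured != null) {
            return configured;               // step 1: explicitly configured comparator
        }
        Comparator<K> registered = (Comparator<K>) REGISTRY.get(keyClass);
        if (registered != null) {
            return registered;               // step 2: comparator registered for the key class
        }
        return Comparator.naturalOrder();    // step 3: fall back to the key's own compareTo()
    }

    public static void main(String[] args) {
        Comparator<String> byLength = Comparator.comparingInt((String s) -> s.length());

        // Step 1 applies: the configured comparator orders "b" before "aa" (shorter first).
        System.out.println(resolve(byLength, String.class).compare("b", "aa") < 0);

        // Step 3 applies: nothing configured or registered, so compareTo() puts "aa" first.
        System.out.println(resolve(null, String.class).compare("b", "aa") > 0);

        // Step 2 applies once a comparator is registered for the key class.
        REGISTRY.put(String.class, Comparator.<String>reverseOrder());
        System.out.println(resolve(null, String.class).compare("a", "bb") > 0);
    }
}
```

In this model the explicitly configured comparator always wins, which mirrors why calling setSortComparatorClass() overrides whatever ordering the key class defines for itself.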

SortComparator vs. GroupComparator in one line: SortComparator decides how map output keys are sorted, while GroupComparator decides which map output keys within the Reducer go to the same reduce method call.
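
To make the distinction concrete, here is a small plain-Java simulation of what happens on the reduce side. It is a sketch under stated assumptions, not Hadoop code: the composite key (year, temp), the class SortVsGroupDemo and the method reduceCalls() are hypothetical, and java.util.Comparator stands in for the comparators a real job would register. The sort comparator orders the full key, while the group comparator looks only at the year, so adjacent keys with the same year end up in one simulated reduce call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortVsGroupDemo {
    // Hypothetical composite map output key: (year, temperature).
    public static final class Key {
        public final int year;
        public final int temp;
        public Key(int year, int temp) { this.year = year; this.temp = temp; }
        @Override public String toString() { return "(" + year + ", " + temp + ")"; }
    }

    // Plays the sort comparator's role: year ascending, then temperature descending.
    public static final Comparator<Key> SORT = (a, b) -> {
        int byYear = Integer.compare(a.year, b.year);
        return byYear != 0 ? byYear : Integer.compare(b.temp, a.temp);
    };

    // Plays the group comparator's role: only the year decides group membership.
    public static final Comparator<Key> GROUP = (a, b) -> Integer.compare(a.year, b.year);

    // Sorts the keys, then starts a new simulated reduce call whenever the
    // group comparator says the next key differs from the previous one.
    public static List<List<Key>> reduceCalls(List<Key> mapOutputKeys) {
        List<Key> sorted = new ArrayList<>(mapOutputKeys);
        sorted.sort(SORT);
        List<List<Key>> calls = new ArrayList<>();
        Key previous = null;
        for (Key key : sorted) {
            if (previous == null || GROUP.compare(previous, key) != 0) {
                calls.add(new ArrayList<>());
            }
            calls.get(calls.size() - 1).add(key);
            previous = key;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Key> keys = Arrays.asList(
            new Key(1901, 10), new Key(1900, 34), new Key(1901, 35), new Key(1900, 29));
        for (List<Key> call : reduceCalls(keys)) {
            System.out.println("reduce call: " + call);
        }
    }
}
```

With these sample keys the simulation yields two reduce calls, one per year, and within each call the highest temperature comes first. In an actual job, the two roles would be wired up with setSortComparatorClass() and setGroupingComparatorClass() on Job.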
