mapreduce, sort values
Problem description
I have the following output from my mapper:
Mapper: KEY, VALUE(Timestamp, someOtherAttributes)
My reducer receives:
Reducer: KEY, Iterable<VALUE(Timestamp, someOtherAttributes)>
I want the Iterable<VALUE(Timestamp, someOtherAttributes)> to be ordered by the Timestamp attribute. Is there any way to implement this?
I would like to avoid manual sorting inside the Reducer code. Because Hadoop reuses the same value object while iterating (see http://cornercases.wordpress.com/2011/08/18/hadoop-object-reuse-pitfall-all-my-reducer-values-are-the-same/), I would have to "deep-copy" all objects from the Iterable, which can cause huge memory overhead. :(((
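To illustrate why the deep copy is needed: the reducer's Iterable hands back one value object, refilled on every iteration, so storing references leaves every stored element pointing at the last value. The sketch below is a plain-Java simulation of that behavior, not Hadoop code; `Value`, `reusingIterable`, `collectByReference` and `collectByCopy` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ObjectReuseDemo {
    // Hypothetical mutable record standing in for a Hadoop Writable value.
    static final class Value {
        long timestamp;
        Value(long timestamp) { this.timestamp = timestamp; }
        Value copy() { return new Value(timestamp); }
    }

    // Mimics the reducer's Iterable: every call to next() refills and
    // returns the SAME Value instance instead of allocating a new one.
    static Iterable<Value> reusingIterable(long[] timestamps) {
        final Value shared = new Value(0);
        return () -> new Iterator<Value>() {
            private int i = 0;
            @Override public boolean hasNext() { return i < timestamps.length; }
            @Override public Value next() {
                shared.timestamp = timestamps[i++];
                return shared;
            }
        };
    }

    // Keeps references: every stored element is the one shared object.
    public static List<Long> collectByReference(long[] timestamps) {
        List<Value> kept = new ArrayList<>();
        for (Value v : reusingIterable(timestamps)) kept.add(v);
        List<Long> out = new ArrayList<>();
        for (Value v : kept) out.add(v.timestamp);
        return out;
    }

    // Copies each value before storing it: correct, but one allocation per value.
    public static List<Long> collectByCopy(long[] timestamps) {
        List<Value> kept = new ArrayList<>();
        for (Value v : reusingIterable(timestamps)) kept.add(v.copy());
        List<Long> out = new ArrayList<>();
        for (Value v : kept) out.add(v.timestamp);
        return out;
    }

    public static void main(String[] args) {
        long[] ts = {30L, 10L, 20L};
        System.out.println(collectByReference(ts)); // [20, 20, 20]
        System.out.println(collectByCopy(ts));      // [30, 10, 20]
    }
}
```

Sorting in the reducer therefore means holding a copy of every value for the key in memory, which is the overhead the question wants to avoid.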
Recommended answer
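It's relatively easy: you need to write a comparator class for your VALUE class. Since Hadoop's shuffle sorts only keys, not values, in practice this means moving the Timestamp into a composite key and writing comparators for that key.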
Take a closer look here: http://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/, especially the "A solution for secondary sorting" section.
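The secondary-sort pattern has three parts: a composite key (KEY, Timestamp) sorted on both fields, a partitioner that hashes only KEY, and a grouping comparator that compares only KEY, so one reduce call sees all values for a KEY already in Timestamp order. In a real job these are registered with `Job.setPartitionerClass`, `Job.setSortComparatorClass` and `Job.setGroupingComparatorClass`. The sketch below does not use Hadoop; it is a plain-Java simulation (Java 16+, for records) of what the framework does with those three pieces. `CompositeKey`, `emit`, `run` and `reduce` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SecondarySortDemo {
    // Composite key: the natural KEY plus the Timestamp copied out of the VALUE.
    public record CompositeKey(String naturalKey, long timestamp) {}
    public record Value(long timestamp, String otherAttributes) {}
    public record Pair(CompositeKey key, Value value) {}

    // What the mapper emits: the Timestamp goes into the key as well as the value.
    public static Pair emit(String key, long ts, String other) {
        return new Pair(new CompositeKey(key, ts), new Value(ts, other));
    }

    // Simulates partition -> sort -> group -> reduce; returns timestamps per KEY
    // in the order the reducer saw them.
    public static Map<String, List<Long>> run(List<Pair> mapOutput, int numPartitions) {
        // Sort comparator: natural key first, then timestamp.
        Comparator<CompositeKey> sortComparator =
            Comparator.<CompositeKey, String>comparing(CompositeKey::naturalKey)
                      .thenComparingLong(CompositeKey::timestamp);
        // Grouping comparator: natural key only, so one reduce call per KEY.
        Comparator<CompositeKey> groupingComparator =
            Comparator.comparing(CompositeKey::naturalKey);

        // Partitioner: hash the natural key only, like a HashPartitioner on KEY.
        List<List<Pair>> partitions = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) partitions.add(new ArrayList<>());
        for (Pair pr : mapOutput) {
            int target = (pr.key().naturalKey().hashCode() & Integer.MAX_VALUE) % numPartitions;
            partitions.get(target).add(pr);
        }

        Map<String, List<Long>> reducerInput = new TreeMap<>();
        for (List<Pair> part : partitions) {
            part.sort(Comparator.comparing(Pair::key, sortComparator));
            int start = 0;
            while (start < part.size()) {
                // Extend the group while the grouping comparator sees the same KEY.
                int end = start;
                while (end < part.size()
                        && groupingComparator.compare(part.get(start).key(), part.get(end).key()) == 0) {
                    end++;
                }
                List<Value> group = new ArrayList<>();
                for (int i = start; i < end; i++) group.add(part.get(i).value());
                reduce(part.get(start).key().naturalKey(), group, reducerInput);
                start = end;
            }
        }
        return reducerInput;
    }

    // The "reducer": values already arrive sorted, no manual sorting or copying.
    static void reduce(String key, Iterable<Value> values, Map<String, List<Long>> out) {
        List<Long> seen = new ArrayList<>();
        for (Value v : values) seen.add(v.timestamp());
        out.put(key, seen);
    }

    public static void main(String[] args) {
        List<Pair> mapOutput = List.of(
            emit("user1", 300L, "c"), emit("user2", 200L, "y"),
            emit("user1", 100L, "a"), emit("user2", 50L, "x"),
            emit("user1", 200L, "b"));
        System.out.println(run(mapOutput, 2)); // {user1=[100, 200, 300], user2=[50, 200]}
    }
}
```

Because the reducer only streams through values that already arrive in Timestamp order, it never needs to buffer or deep-copy them.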