How to sort data in MapReduce (Hadoop)?
I am working on a program that has 4 MapReduce steps. The output of my first step is:
id value
1 20
2 3
3 9
4 36
I have about 1,000,000 IDs, and in the second step I must sort the records by value. The output of this step should be:
id value
4 36
1 20
3 9
2 3
How can I sort my data in MapReduce? Do I need to use TeraSort? If so, how do I use TeraSort in the second step of my program? Thanks.
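If you want to sort by value, make the value the key in the map function, because MapReduce sorts map output by key before it reaches the reducer. For example, given the input: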
id value
1 20
2 3
3 9
4 36
5 3
emit each record from the map function as (value, id), i.e. with the value in the key position.
The output will be:
key value
3 5
3 2
9 3
20 1
36 4
The map output key/value pair is <value, id>.
The reducer then receives <value, ids>, with the keys arriving in sorted order.
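The swap described above can be tried out without a cluster. The following is a minimal sketch in plain Java, not Hadoop code: the class `SwapKeySketch` and its methods are hypothetical names, and a `TreeMap` stands in for the shuffle phase, on the assumption that the real job uses a numeric key type so that keys compare as numbers rather than as text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the key/value swap; no Hadoop classes are used.
class SwapKeySketch {

    // Simulated map + shuffle: each record {id, value} is emitted as (value, id).
    // The TreeMap plays the role of the shuffle, which sorts by key (ascending).
    static TreeMap<Integer, List<Integer>> mapAndShuffle(int[][] records) {
        TreeMap<Integer, List<Integer>> byValue = new TreeMap<>();
        for (int[] record : records) {
            int id = record[0];
            int value = record[1];
            byValue.computeIfAbsent(value, k -> new ArrayList<>()).add(id);
        }
        return byValue;
    }

    // Flatten the grouped pairs into "key value" lines, in the order a reducer would see them.
    static List<String> toLines(TreeMap<Integer, List<Integer>> byValue) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> entry : byValue.entrySet()) {
            for (int id : entry.getValue()) {
                lines.add(entry.getKey() + " " + id);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Same five records as in the example input above.
        int[][] records = {{1, 20}, {2, 3}, {3, 9}, {4, 36}, {5, 3}};
        toLines(mapAndShuffle(records)).forEach(System.out::println);
    }
}
```

In this simulation the two ids under key 3 come out in input order. In a real job the order of values within one key is not guaranteed unless a secondary sort is set up.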
If you want the id in the first column of the final output, swap the two back when writing from the reducer:
context.write(value, key);
Note that the ids themselves will not be sorted.
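The output asked for in the question is in descending order of value, while keys are sorted in ascending order by default. Below is a minimal plain-Java sketch of the descending variant with the id written back into the first column. It is a simulation, not Hadoop code: `DescendingByValueSketch` is a hypothetical name, and the reversed `TreeMap` comparator only stands in for what, in a real job, would presumably be a descending comparator registered through `job.setSortComparatorClass(...)`. A globally sorted result also assumes either a single reducer or a total-order partitioner, since each reducer only sorts its own share of the keys.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a descending sort by value; no Hadoop classes are used.
class DescendingByValueSketch {

    static List<String> sortDescending(int[][] records) {
        // The reversed comparator stands in for a custom sort comparator on the job.
        TreeMap<Integer, List<Integer>> byValue =
                new TreeMap<>(Comparator.<Integer>reverseOrder());

        // Simulated map step: emit (value, id) for each {id, value} record.
        for (int[] record : records) {
            byValue.computeIfAbsent(record[1], k -> new ArrayList<>()).add(record[0]);
        }

        // Simulated reduce step: write the id first, then the value,
        // mirroring context.write(value, key) in the answer.
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> entry : byValue.entrySet()) {
            for (int id : entry.getValue()) {
                lines.add(id + " " + entry.getKey());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // The four records from the question.
        int[][] records = {{1, 20}, {2, 3}, {3, 9}, {4, 36}};
        sortDescending(records).forEach(System.out::println);
    }
}
```

With the four records from the question, this reproduces the ordering the asker wants for the second step: 4 36, 1 20, 3 9, 2 3.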