Is it possible to disable sorting in Hadoop?
Question
My job doesn't require sorting, just aggregation of information per key. So I wonder whether it is possible to disable sorting entirely in order to improve performance.
Note: I can't set the reducer count to zero, because I need to aggregate data across many mappers. I'm just not interested in sorted results within a single reducer.
Answer
One of the main purposes of sorting the map output is this: when the tuples reach the reducer, it has to build a (key, list of values) group for each key in order to invoke the reduce task. With sorted map output it can build each list in a single sequential scan (when it sees a different key, it simply starts a new list). If the map output is not sorted, it has to scan the whole list to collect the values that share the same key.
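To make the argument concrete, here is a minimal sketch in plain Python rather than Hadoop code. The names `group_sorted`, `group_unsorted`, and `map_output` are hypothetical and exist only for illustration; this is not how Hadoop implements its shuffle. It shows that a single sequential scan produces correct groups only when the input is sorted by key, and that grouping unsorted input needs some other mechanism (here, an in-memory dictionary, which assumes all groups fit in memory).

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]


def group_sorted(pairs: Iterable[Pair]) -> Iterator[Tuple[str, List[int]]]:
    """Group pairs in one sequential pass; correct only if pairs are sorted by key."""
    current_key = None
    values: List[int] = []
    for key, value in pairs:
        # A change of key closes the current list and starts a new one.
        if values and key != current_key:
            yield current_key, values
            values = []
        current_key = key
        values.append(value)
    if values:
        yield current_key, values


def group_unsorted(pairs: Iterable[Pair]) -> Dict[str, List[int]]:
    """Group pairs in any order by keeping every group in memory at once."""
    groups: Dict[str, List[int]] = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return groups


# Hypothetical map output, deliberately not sorted by key.
map_output: List[Pair] = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]

# Sorted input: one scan yields one complete list per key.
sorted_groups = list(group_sorted(sorted(map_output)))
# -> [("a", [1, 4]), ("b", [2, 3])]

# Unsorted input: the same scan splits each key into fragments.
broken_groups = list(group_sorted(map_output))
# -> [("b", [2]), ("a", [1]), ("b", [3]), ("a", [4])]

# Dictionary-based grouping works without sorting, at the cost of memory.
hashed_groups = group_unsorted(map_output)
# -> {"b": [2, 3], "a": [1, 4]}
```

The sequential scan only ever holds one key's values at a time, which is the benefit the answer describes. The dictionary variant avoids the sort but has to keep every key's list resident until the input is exhausted.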