Why is MRJob sorting my keys?
Question
I'm running a fairly big MRJob job (1,755,638 keys) and the keys are being written to the reducers in sorted order. This happens even if I specify that Hadoop should use the hash partitioner, with:
from mrjob.job import MRJob

class SubClass(MRJob):
    PARTITIONER = "org.apache.hadoop.mapred.lib.HashPartitioner"
    ...
I don't understand why the keys are sorted, when I am not asking for them to be sorted.
Recommended answer
Keys are not sorted by default, but the HashPartitioner will give the appearance of sorting keys if the dataset is small. When I increased the size of the dataset from 50M to 10G the keys stopped being sorted.
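One way to picture why a small job can look sorted: in Hadoop the partitioner only decides which reducer receives a key, while the shuffle orders keys within each reduce partition. With a single partition the whole output therefore reads as sorted; with several, each partition is ordered on its own but the combined output generally is not. The sketch below is a plain-Python simulation, not MRJob or Hadoop code. The `partition` function (CRC32 modulo the reducer count), the helper names, the reducer counts, and the sample keys are all assumptions made for illustration.

```python
import zlib


def partition(key, num_reducers):
    # Hypothetical stand-in for a hash partitioner:
    # a stable hash of the key, modulo the number of reducers.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def simulate_reduce_input(keys, num_reducers):
    # Route each key to a partition, then sort inside each partition,
    # mimicking the per-partition ordering done by the shuffle.
    buckets = [[] for _ in range(num_reducers)]
    for key in keys:
        buckets[partition(key, num_reducers)].append(key)
    return [sorted(bucket) for bucket in buckets]


def concatenated(buckets):
    # Read the partitions back to back, as if concatenating part files.
    return [key for bucket in buckets for key in bucket]


keys = ["key%04d" % i for i in range(200)]

one_reducer = concatenated(simulate_reduce_input(keys, 1))
four_reducers = concatenated(simulate_reduce_input(keys, 4))

print(one_reducer == sorted(keys))    # True: one partition, so globally ordered
print(four_reducers == sorted(keys))  # depends on how the hash spreads the keys
```

Under these assumptions, a small input that ends up in one reduce partition looks fully sorted, which matches the behaviour described in the question; a larger input spread over several partitions is ordered only within each one.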