Why is MRJob sorting my keys?

Question

I'm running a fairly big MRJob job (1,755,638 keys), and the keys are being written to the reducers in sorted order. This happens even when I specify that Hadoop should use the hash partitioner:

from mrjob.job import MRJob

class SubClass(MRJob):

    PARTITIONER = "org.apache.hadoop.mapred.lib.HashPartitioner"

...

I don't understand why the keys are sorted when I am not asking for them to be sorted.

Recommended answer

Keys are not sorted by default, but the HashPartitioner will give the appearance of sorted keys if the dataset is small. When I increased the size of the dataset from 50M to 10G, the keys stopped being sorted.
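One way to see why a small job can look sorted is a toy simulation in plain Python. This is only a sketch under stated assumptions, not MRJob or Hadoop code. It assumes that each reducer receives its own keys in sorted order (as Hadoop's shuffle phase does within a partition) and that a small job may end up with a single reducer. The names `partition`, `simulate`, and `concatenated_output` are hypothetical, and `zlib.crc32` merely stands in for the hash used by a real hash partitioner.

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    """Assign a key to a reducer: non-negative hash modulo the reducer count.

    zlib.crc32 is a deterministic stand-in; it is NOT the hash Hadoop uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def simulate(keys, num_reducers):
    """Return one list of keys per reducer.

    Assumption: each reducer sees its own keys in sorted order.
    """
    buckets = [[] for _ in range(num_reducers)]
    for key in keys:
        buckets[partition(key, num_reducers)].append(key)
    return [sorted(bucket) for bucket in buckets]


def concatenated_output(keys, num_reducers):
    """Concatenate the reducers' key sequences, as if reading part files in order."""
    return [key for bucket in simulate(keys, num_reducers) for key in bucket]


if __name__ == "__main__":
    keys = ["pear", "apple", "fig", "kiwi", "banana", "cherry", "date", "grape"]
    print(concatenated_output(keys, 1))  # one reducer: a single sorted run
    print(concatenated_output(keys, 4))  # several reducers: sorted runs, one per reducer
```

With one reducer the concatenated output is a single sorted run, so the whole job appears sorted. With several reducers each run is sorted on its own, but the combined output generally is not, which is consistent with the ordering disappearing once the dataset grew.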
