Hadoop and Python: Disable Sorting

Question

I've realized that when I run Hadoop with Python code, either the mapper or the reducer (I'm not sure which) sorts my output before reducer.py prints it. It currently appears to be sorted alphanumerically. Is there a way to disable this completely? I would like the program's output to follow the order in which mapper.py prints it. I've found answers for Java, but none for Python. Would I need to modify mapper.py, or perhaps the command-line arguments?
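The data flow described in the question can be sketched as a small local simulation. It assumes the Hadoop Streaming default of one record per line with the key being the text before the first tab, and a single reducer. `run_mapper`, `shuffle_sort` and `run_reducer` are hypothetical stand-ins for mapper.py, the framework's sort step, and reducer.py (a word-count style sum is assumed). The point it illustrates: the reducer receives lines already sorted by key, so it can only print them in that order.

```python
from itertools import groupby


def run_mapper(records):
    """Hypothetical stand-in for mapper.py: emit 'key\tvalue' lines in input order."""
    for word in records:
        yield f"{word}\t1"


def split_key(line):
    """Streaming default (assumed): the key is everything before the first tab."""
    key, _, value = line.partition("\t")
    return key, value


def shuffle_sort(lines):
    """Stand-in for the framework's sort of map output by key.

    Python's code-point string order stands in for Hadoop's byte-wise
    key comparison (the two agree for UTF-8 text), so '10' sorts before '9'.
    """
    return sorted(lines, key=lambda line: split_key(line)[0])


def run_reducer(lines):
    """Hypothetical stand-in for reducer.py: sum the values for each run of equal keys."""
    pairs = (split_key(line) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


mapper_order = ["zebra", "9", "apple", "10", "zebra"]
output = list(run_reducer(shuffle_sort(run_mapper(mapper_order))))
print(output)  # ['10\t1', '9\t1', 'apple\t1', 'zebra\t2']
```

The mapper emits `zebra` first, yet it comes out last: the ordering is imposed between the two scripts, not by either of them. This is also why the common local test pipeline `cat input.txt | python mapper.py | sort | python reducer.py` places `sort` between the scripts.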

Solution

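No. As stated here:

If your number of reduce tasks is not 0, the Hadoop framework will sort your results. There is no way around it.

Neither mapper.py nor reducer.py performs this sort: the framework sorts the map output by key during the shuffle, before any reducer sees it, so no change to either script can prevent it.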
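The exception in the quote is a map-only job. In Hadoop Streaming this is set on the command line rather than in mapper.py, with `-D mapreduce.job.reduces=0`; the streaming options `-numReduceTasks 0` and `-reducer NONE` have the same effect. Each map task then writes its own output file, so mapper order is preserved only within each file, not across the whole job. The sketch below models that choice under the same tab-separated-key assumption; `simulate_job` and its `num_reduce_tasks` parameter are hypothetical, and the reducer is treated as an identity pass-through so that only the effect of the sort is visible.

```python
def key_of(line):
    """Streaming default (assumed): the key is the text before the first tab."""
    return line.partition("\t")[0]


def simulate_job(map_output, num_reduce_tasks):
    """Hypothetical model of the order in which map output reaches the job output.

    num_reduce_tasks == 0: map-only job, lines are written as the mapper printed them.
    Otherwise: the framework sorts by key before the (identity) reducer runs.
    Partitioning across several reducers is ignored; one reducer is assumed.
    """
    if num_reduce_tasks < 0:
        raise ValueError("num_reduce_tasks must be >= 0")
    if num_reduce_tasks == 0:
        return list(map_output)
    return sorted(map_output, key=key_of)


map_output = ["zebra\t1", "9\t1", "apple\t1", "10\t1"]
print(simulate_job(map_output, num_reduce_tasks=0))  # ['zebra\t1', '9\t1', 'apple\t1', '10\t1']
print(simulate_job(map_output, num_reduce_tasks=1))  # ['10\t1', '9\t1', 'apple\t1', 'zebra\t1']
```

The trade-off is that a map-only job has no reduce step at all, so any aggregation that reducer.py performs would have to move into the mapper or into a later job.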