Hadoop and Python: Disable Sorting
Question
I've realized that when running Hadoop with Python code, either the mapper or the reducer (I'm not sure which) is sorting my output before it's printed out by reducer.py. Currently it seems to be sorted alphanumerically. I am wondering if there is a way to disable this completely. I would like the program's output to follow the order in which lines are printed from mapper.py. I've found answers for Java but none for Python. Would I need to modify mapper.py, or perhaps the command-line arguments?
Answer
No, as stated here:
If your number of reduce tasks is not 0, the hadoop framework will sort your results. There is no way around it.
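The sketch below illustrates the rule in the quote. It is a local simulation, not a call into Hadoop: the function `run_streaming_sim` and the example `mapper` are hypothetical names. It assumes Hadoop Streaming's default behavior, where the key is the text before the first tab, and it assumes a single reducer, so partitioning is ignored. With one or more reduce tasks, records are sorted by key before the reducer sees them. With zero reduce tasks, which on a real cluster is a map-only job requested with `-numReduceTasks 0` (or `-D mapreduce.job.reduces=0`), the mapper output is written as emitted and reducer.py does not run.

```python
from typing import Callable, Iterable, List


def run_streaming_sim(
    input_lines: Iterable[str],
    mapper: Callable[[str], Iterable[str]],
    num_reduce_tasks: int,
) -> List[str]:
    """Simulate the order in which mapper records reach the output or reducer.

    Hypothetical helper for illustration only; it does not call Hadoop.
    """
    # Map phase: collect mapper output in the order it is emitted.
    emitted = [rec for line in input_lines for rec in mapper(line)]

    if num_reduce_tasks == 0:
        # Map-only job: no shuffle/sort step, emission order is kept.
        return emitted

    # With reducers: records are sorted by key (text before the first tab;
    # the whole line if there is no tab). Python compares str by code point,
    # which matches a byte-wise comparison of UTF-8 keys.
    # Assumes a single reducer; Hadoop does not guarantee the order of
    # values sharing a key, even though Python's sort happens to be stable.
    return sorted(emitted, key=lambda rec: rec.split("\t", 1)[0])


def mapper(line: str) -> List[str]:
    # Hypothetical mapper.py logic: emit "word\t1" for each word, in order.
    return [f"{word}\t1" for word in line.split()]


if __name__ == "__main__":
    lines = ["zebra apple", "mango"]
    print(run_streaming_sim(lines, mapper, num_reduce_tasks=0))
    # ['zebra\t1', 'apple\t1', 'mango\t1']
    print(run_streaming_sim(lines, mapper, num_reduce_tasks=1))
    # ['apple\t1', 'mango\t1', 'zebra\t1']
```

The trade-off is that a map-only job gives up the reduce step entirely, and with several map tasks each one writes its own output file, so there is still no single global order across mappers.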