Map-Reduce/Hadoop sort by integer value (using MRJob)

Views: 144 Posted: 2020/5/5 15:41:06 Tags: python sorting hadoop mapreduce mrjob


Problem description

This is an MRJob implementation of simple Map-Reduce sorting. In beta.py:

from mrjob.job import MRJob

class Beta(MRJob):
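    def mapper(self, _, line):
        """Emit (second field, first field) for each space-separated line."""
        fields = line.split(' ')
        # The second field becomes the key, so the shuffle sorts on it.
        yield fields[1], fields[0]

    def reducer(self, key, val):
        # Keep only the first value seen for each key.
        yield key, next(iter(val))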


if __name__ == '__main__':
    Beta.run()

I run it using the text:

One can run this using:

cat <filename> | python beta.py

The issue is that the output is sorted as if the keys were strings (which is probably the case here). The output is:

"1"     "1"
"10"    "6"
"11"    "7"
"2"     "4"
"4"     "2"
"5"     "5"
"7"     "4"
"8"     "3"

The output I want is:

"1"     "1"
"2"     "4"
"4"     "2"
"5"     "5"
"7"     "4"
"8"     "3"
"10"    "6"
"11"    "7"

I am not sure whether this can be fixed by changing the protocols in MRJob, since protocols are job-specific rather than step-specific.
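The difference between the two orderings shown above can be reproduced outside Hadoop. The following is a minimal sketch in plain Python, assuming the sort compares keys as strings; the `keys` list simply holds the keys from the output above.

```python
# Keys from the output above, as strings (how they appear to the sort).
keys = ["1", "2", "4", "5", "7", "8", "10", "11"]

# Lexicographic (string) comparison: "10" sorts before "2" because "1" < "2".
as_strings = sorted(keys)

# Numeric comparison: convert each key to int before comparing.
as_integers = sorted(keys, key=int)

print(as_strings)
print(as_integers)
```

The first list matches the order of the actual output, the second the order of the desired output.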

EDIT (Solution): I have found an answer. The idea is to prepend '0' bytes (zero padding) to every number so that each number has the same number of bytes as the largest number. At least that is what I remember from my classes. I cannot post this as an answer right now because the site won't let me, but it is the only solution I have. If anyone has something more transparent and easier, please share.
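The padding idea from the edit above can be sketched in plain Python without MRJob. This is only an illustration under stated assumptions: `pad_key` and `WIDTH` are hypothetical names, the keys are non-negative integers, the width is at least the number of digits in the largest key, and `sorted` stands in for the string comparison done during the shuffle.

```python
WIDTH = 2  # assumed: number of digits in the largest key (here "11")


def pad_key(key, width=WIDTH):
    """Left-pad a non-negative integer string with '0' to a fixed width."""
    return key.zfill(width)


# (key, value) pairs as the mapper in beta.py would emit them.
pairs = [("1", "1"), ("10", "6"), ("11", "7"), ("2", "4"),
         ("4", "2"), ("5", "5"), ("7", "4"), ("8", "3")]

# Pad the keys, then sort them as strings, as the shuffle step would.
padded = sorted((pad_key(k), v) for k, v in pairs)

# Strip the padding again for display.
result = [(str(int(k)), v) for k, v in padded]

for k, v in result:
    print(k, v)
```

In the MRJob job this would probably correspond to yielding the padded key from the mapper instead of the raw second field, and converting the key back with `int(key)` in the reducer before yielding it.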
