Sorting large text data
Problem description
I have a large file (100 million lines of tab-separated values - about 1.5GB in size). What is the fastest known way to sort this based on one of the fields?
I have tried Hive. I would like to see if this can be done faster using Python.
Answer
Have you considered using the *nix sort program? In raw terms, it will probably be faster than most Python scripts.
Use -t $'\t' to specify that the input is tab-separated, -k n to specify the sort field (where n is the field number), and -o outputfile if you want to write the result to a new file.

Example:
sort -t $'\t' -k 4 -o sorted.txt input.txt
This will sort input.txt on its 4th field and write the result to sorted.txt.