How can I sort a 10GB file?

Problem description

I'm trying to sort a big table stored in a file. The format of the file is (ID, intValue)

The data is sorted by ID, but what I need is to sort the data using the intValue, in descending order.

For example:

ID  | IntValue
1   | 3
2   | 24
3   | 44
4   | 2

should become this table:

ID  | IntValue
3   | 44
2   | 24
1   | 3
4   | 2

How can I use the Linux sort command to do the operation? Or do you recommend another way?

Recommended answer

How can I use the Linux sort command to do the operation? Or do you recommend another way?

As others have already pointed out, see man sort for the -k and -t command line options, which let you sort by a specific field of each line.
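As a minimal sketch of those two options, the sample table from the question can be sorted as follows. The file names sample.txt and sample-sorted.txt and the temporary directory are assumptions made for illustration, and the header line is left out on purpose.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# Hypothetical sample file mirroring the table from the question (no header line).
printf '%s\n' '1   | 3' '2   | 24' '3   | 44' '4   | 2' > "$workdir/sample.txt"

# -t'|'  : use '|' as the field separator
# -k2,2  : the sort key is field 2 only
# -n -r  : compare numerically, in descending order
sort -t'|' -k2,2 -nr "$workdir/sample.txt" > "$workdir/sample-sorted.txt"

cat "$workdir/sample-sorted.txt"
```

The numeric comparison skips the blanks that follow the '|' separator, so the padded columns do not need to be trimmed first.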

sort also has a facility to help with huge files that potentially don't fit into RAM: the -m command line option, which merges already sorted files into one. (See merge sort for the concept.) The overall process is fairly straightforward:

  1. Split the big file into small chunks, for example with the split tool and its -l option. E.g.:

split -l 1000000 huge-file small-chunk

  2. Sort the smaller files. E.g.:

for X in small-chunk*; do sort -t'|' -k2,2 -nr < "$X" > "sorted-$X"; done

  3. Merge the sorted smaller files. E.g.:

sort -t'|' -k2,2 -nr -m sorted-small-chunk* > sorted-huge-file

  4. Clean up: rm small-chunk* sorted-small-chunk*
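The steps can be tried end to end on a toy input before committing to the real 10GB file. The following is only a sketch under stated assumptions: the ten generated rows, the chunk size of 3 lines, and the temporary directory are made up for illustration, and the input has no header line.

```shell
# Work in a scratch directory so the globs below only match our own files.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical input: 10 rows of "ID | IntValue", ordered by ID.
i=1
while [ "$i" -le 10 ]; do
  printf '%d | %d\n' "$i" $(( (i * 7) % 11 ))
  i=$((i + 1))
done > huge-file

# 1. Split into chunks of 3 lines each (use a much larger count for real data).
split -l 3 huge-file small-chunk

# 2. Sort every chunk by field 2, numerically, in descending order.
for X in small-chunk*; do
  sort -t'|' -k2,2 -nr < "$X" > "sorted-$X"
done

# 3. Merge the sorted chunks; -m requires the same key options as step 2.
sort -t'|' -k2,2 -nr -m sorted-small-chunk* > sorted-huge-file

# 4. Clean up the intermediate files.
rm small-chunk* sorted-small-chunk*

# Optional check: -c exits non-zero if the file is not in the requested order.
sort -t'|' -k2,2 -nr -c sorted-huge-file && echo "sorted-huge-file is in order"
```

The merge step only interleaves inputs that are already sorted, so it has to be given exactly the same -t and -k options that were used to sort the chunks.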

The only thing you have to take special care of is the column header: if it is left in the file, it gets sorted in with the data rows.
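One possible way to handle the header, sketched here under the assumption that it is exactly the first line of the file, is to set it aside before splitting and put it back in front of the merged result. The file name header, the chunk size of 2 lines, and the sample data are assumptions for illustration.

```shell
# Work in a scratch directory with a hypothetical copy of the question's table.
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 'ID  | IntValue' '1   | 3' '2   | 24' '3   | 44' '4   | 2' > huge-file

# Keep the first line aside; it must not take part in the sort.
head -n 1 huge-file > header

# Feed everything from line 2 onward to split ('-' means read standard input).
tail -n +2 huge-file | split -l 2 - small-chunk

# Sort each chunk by field 2, numerically, in descending order.
for X in small-chunk*; do
  sort -t'|' -k2,2 -nr < "$X" > "sorted-$X"
done

# Write the header first, then the merged data rows.
{ cat header; sort -t'|' -k2,2 -nr -m sorted-small-chunk*; } > sorted-huge-file

rm header small-chunk* sorted-small-chunk*
cat sorted-huge-file
```

Piping tail into split avoids writing a second full-size copy of the file without its header.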
