Sorting big file (10G)
Problem description
I'm trying to sort a big table stored in a file. The format of the file is (ID, intValue).
The data is sorted by ID, but what I need is to sort it by intValue, in descending order.
For example:
ID | IntValue
1 | 3
2 | 24
3 | 44
4 | 2
should become this table:
ID | IntValue
3 | 44
2 | 24
1 | 3
4 | 2
How can I use the Linux sort command to do the operation? Or do you recommend another way?
Recommended answer
How can I use the Linux sort command to do the operation? Or do you recommend another way?
As others have already pointed out, see man sort for the -k and -t command line options, which control sorting by a specific field of each line.
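As a minimal sketch of those two options, the question's sample rows can be sorted by the second field. It assumes the fields are separated by `|` exactly as in the example table and that there is no header line; the variable names `rows` and `sorted` are only illustrative.

```shell
# Sample rows in the question's "ID | IntValue" layout (no header line).
rows='1 | 3
2 | 24
3 | 44
4 | 2'

# -t'|'  : fields are separated by the pipe character
# -k2,2  : the sort key is the second field only
# -n -r  : compare numerically, largest value first
sorted=$(printf '%s\n' "$rows" | sort -t'|' -k2,2 -n -r)
printf '%s\n' "$sorted"
```

Writing the key as `-k2,2` rather than `-k2` restricts it to the second field instead of everything from the second field to the end of the line; for a two-column table the two behave the same.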
Now, sort also has a facility for sorting huge files that may not fit into RAM: the -m command line option, which merges already sorted files into one. (See merge sort for the concept.) The overall process is fairly straightforward:
- Split the big file into small chunks, for example with the split tool and its -l option. E.g.:
  split -l 1000000 huge-file small-chunk
- Sort the smaller files. E.g.:
  for X in small-chunk*; do sort -t'|' -k2 -nr < "$X" > "sorted-$X"; done
- Merge the sorted smaller files. E.g.:
  sort -t'|' -k2 -nr -m sorted-small-chunk* > sorted-huge-file
- Clean up: rm small-chunk* sorted-small-chunk*
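The steps above can be tried end to end on a tiny stand-in file before running them on the real one. This is only a sketch: the ten generated rows, the chunk size of 3 lines, and the temporary directory are assumptions made for the demonstration, while the file names follow the answer.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Build a small headerless stand-in for the huge file: IDs 1..10 with distinct values.
i=1
while [ "$i" -le 10 ]; do
  printf '%s | %s\n' "$i" $(( (i * 7) % 11 )) >> huge-file
  i=$((i + 1))
done

# 1. Split into chunks (3 lines each here; far larger for a real file).
split -l 3 huge-file small-chunk

# 2. Sort each chunk by the second field, numerically, descending.
for X in small-chunk*; do
  sort -t'|' -k2,2 -n -r < "$X" > "sorted-$X"
done

# 3. Merge the sorted chunks; the ordering options must match step 2.
sort -t'|' -k2,2 -n -r -m sorted-small-chunk* > sorted-huge-file

# 4. Clean up the intermediate files.
rm small-chunk* sorted-small-chunk*

# Cross-check against sorting the whole file in one go.
sort -t'|' -k2,2 -n -r huge-file | cmp - sorted-huge-file && echo "merge result matches"
```

With GNU sort the manual split may not be strictly necessary, since it spills to temporary files on its own; its -T option selects the temporary directory and -S the memory buffer size. Whether that is preferable depends on the machine and is worth testing on a sample first.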
The only thing you have to take special care of is the column header, if the file has one: it must not be split, sorted, and merged along with the data rows.
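One possible way to handle the header, assuming it is exactly the first line of the file, is to copy it through untouched and sort only the remaining lines. The file names below are hypothetical and the sample data is the table from the question.

```shell
workdir=$(mktemp -d)
table="$workdir/table-with-header"   # hypothetical file name
printf '%s\n' 'ID | IntValue' '1 | 3' '2 | 24' '3 | 44' '4 | 2' > "$table"

{
  head -n 1 "$table"                              # keep the header as the first line
  tail -n +2 "$table" | sort -t'|' -k2,2 -n -r    # sort everything after the header
} > "$workdir/sorted-table"

cat "$workdir/sorted-table"
```

For the split-and-merge workflow the same idea would apply before splitting, for instance by feeding split from `tail -n +2 huge-file` (split reads standard input when the file argument is `-`) and writing the header to the output file before appending the merged result.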