How can I sort a large CSV file without loading it into memory?

Question

I have a 20GB+ CSV file like this:

**CallId,MessageNo,Information,Number** 
1000,1,a,2
99,2,bs,3
1000,3,g,4
66,2,a,3
20,16,3,b
1000,7,c,4
99,1,lz,4 
...

I must sort this file by CallId and then MessageNo, both ascending. (One way is to load it into a database, sort, and export.)

How can I sort this file in C# without loading all of its lines into memory? (For example, line by line using StreamReader.)

Do you know a library that solves this? I look forward to your advice, thanks.

Recommended answer

You should use the OS sort command. Typically it's just

sort myfile

followed by some mystical switches. These commands typically work well with large files, and there are often options to specify temporary storage on other physical hard drives. See this previous question, and the Windows sort command "man" page. Since Windows sort is not enough for your particular sorting problem, you may want to use GNU coreutils, which brings the power of the Linux sort command to Windows.
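For intuition, or in case shelling out to an external tool is not an option, here is a minimal sketch of the general technique such tools typically use for files larger than memory: an external merge sort. It is written in Python rather than C# purely for brevity. The function name `external_sort`, the `chunk_rows` parameter, and the assumption that the first two columns always hold integers are illustrative choices, not something taken from the question or from GNU sort's implementation.

```python
import csv
import heapq
import itertools
import os
import tempfile


def sort_key(row):
    # Assumption: column 0 (CallId) and column 1 (MessageNo) are integers.
    return (int(row[0]), int(row[1]))


def external_sort(src, dst, chunk_rows=100_000, has_header=True):
    """Sort a CSV by (CallId, MessageNo), keeping about chunk_rows rows in memory."""
    chunk_paths = []
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None) if has_header else None
        while True:
            # Phase 1: read a bounded chunk, sort it in memory, spill it to disk.
            chunk = list(itertools.islice(reader, chunk_rows))
            if not chunk:
                break
            chunk.sort(key=sort_key)
            fd, path = tempfile.mkstemp(suffix=".csv")
            with os.fdopen(fd, "w", newline="") as tmp:
                csv.writer(tmp).writerows(chunk)
            chunk_paths.append(path)

    # Phase 2: k-way merge. heapq.merge holds only one row per chunk at a time.
    files = [open(p, newline="") for p in chunk_paths]
    try:
        with open(dst, "w", newline="") as out:
            writer = csv.writer(out)
            if header is not None:
                writer.writerow(header)
            readers = [csv.reader(fh) for fh in files]
            writer.writerows(heapq.merge(*readers, key=sort_key))
    finally:
        # Close and delete the temporary chunk files.
        for fh in files:
            fh.close()
        for p in chunk_paths:
            os.remove(p)
```

Calling `external_sort("yourfile.csv", "sorted.csv")` would write the sorted rows to `sorted.csv`. Every chunk file stays open during the merge, so `chunk_rows` should be large enough to keep the number of chunks below the operating system's open-file limit. The sketch also assumes the input has no blank lines.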

Here is what you need to do.

  1. Download the GNU Coreutils Binaries ZIP and extract sort.exe from the bin folder to some folder on your machine, for example the folder containing the file you want to sort.
  2. Download the GNU Coreutils Dependencies ZIP and extract both .dll files to the same folder as sort.exe.

Now, assuming that your file looks like this:

1000,1,a,2
99,2,bs,3
1000,3,g,4
66,2,a,3
20,16,3,b
1000,7,c,4
99,1,lz,4 

you can write at the command prompt:

sort.exe yourfile.csv -t, -g

It will output:

20,16,3,b
66,2,a,3
99,1,lz,4
99,2,bs,3
1000,1,a,2
1000,3,g,4
1000,7,c,4

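See more command options here. Note that with no -k option the whole line is the sort key, so -g compares only the leading CallId number and ties fall back to a byte-wise comparison of the line, which would place a MessageNo of 16 before 7. To order both columns numerically, specify the keys explicitly with -k1,1n -k2,2n in place of -g. If this is what you want, don't forget to provide an output file with the -o switch, like so: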

sort.exe yourfile.csv -t, -g -o sorted.csv
