Read 14 GB file in C#


Problem description

Hello everyone,

I am trying to read a 14 GB file. If any line of that file contains the word "NULL", I want to write that particular line to a separate text file. Below is the code I have tried.

My file looks like this:

ID|F_NAME|MIDDLE_NAME|L_NAME
1|PRADEEP|NULL|KUMAR

Here I want to find NULL. The actual problem is the file size: it's around 14 GB. I tried using ReadAllText() and StreamReader.ReadLine(), and both threw an OutOfMemoryException. Is there a way I can accomplish this?

Immediate help appreciated!

Thanks

What I have tried:

using (FileStream fs = File.Open(Sources_path + "\\" + Filename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader sr = new StreamReader(bs))
{
    // ReadToEnd() builds a single string holding the entire file
    sr.ReadToEnd();
}

Recommended answers

You have no choice but to read the file one line at a time. You can NOT use ReadAllLines, or anything like it, because it will try to read the ENTIRE FILE into memory as an array of strings. .NET strings store text as UTF-16 (two bytes per character), so a 14 GB ASCII file roughly doubles in size once loaded. Unless you happen to have about 30 GB of RAM in the machine, you're not going to be able to read the file that way.

Also, an array is limited to roughly 2.1 billion entries (its index is a 32-bit int). If you've got more lines than that in the file, you still can't read it in its entirety all at once.
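To make the array-limit point concrete, here is a minimal sketch (the class and method names are hypothetical) that counts a file's lines without storing any of them. A long counter can go past the roughly 2.1 billion entries an array can hold, and only the current line is ever in memory.

```csharp
using System.IO;

public static class LineCounter
{
    // Counts lines by streaming: each line is read, counted, and discarded.
    // A long counter keeps working past int.MaxValue lines.
    public static long CountLines(string path)
    {
        long count = 0;
        using (var reader = new StreamReader(path))
        {
            while (reader.ReadLine() != null)
            {
                count++;
            }
        }
        return count;
    }
}
```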

You MUST read the file, one line at a time, and process each line as you read it. You then don't need to have tons of memory in the machine. You just need enough memory to read a single line of the file.
using (StreamReader sr = new StreamReader("filepath"))
{
    string line;
    // ReadLine() returns null once the end of the file is reached
    while ((line = sr.ReadLine()) != null)
    {
        // ... process your line data ...
    }
}
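Applied to the task in the question, the line-by-line loop might look like the sketch below. It assumes the match rule is a plain case-sensitive substring test for "NULL", as the question describes, and that the caller supplies both paths. The class and method names are hypothetical.

```csharp
using System;
using System.IO;

public static class NullLineFilter
{
    // Streams inputPath one line at a time and copies every line containing
    // the substring "NULL" to outputPath. Memory use depends on the longest
    // line, not on the size of the file. Returns the number of lines copied.
    public static long CopyLinesContainingNull(string inputPath, string outputPath)
    {
        long matches = 0;
        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(outputPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Ordinal: case-sensitive, culture-independent comparison.
                if (line.IndexOf("NULL", StringComparison.Ordinal) >= 0)
                {
                    writer.WriteLine(line);
                    matches++;
                }
            }
        }
        return matches;
    }
}
```

For the sample data in the question, only the row 1|PRADEEP|NULL|KUMAR would be copied. The header line does not contain NULL.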


First solution: add an insane amount of memory. Remember, the file is likely to grow, and you also need space for the resulting file.

Second solution: Read the file line by line.
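Reading line by line does not have to mean a hand-written loop. File.ReadLines (unlike File.ReadAllLines) returns a lazy IEnumerable<string>, and File.WriteAllLines accepts an IEnumerable<string> and writes it as it enumerates. The sketch below chains the two, so only one line is in flight at a time. It uses the same substring rule as the question, the names are hypothetical, and it assumes the input and output paths are different files.

```csharp
using System;
using System.IO;
using System.Linq;

public static class NullLineFilterLinq
{
    // File.ReadLines yields lines on demand; Where filters them lazily;
    // File.WriteAllLines pulls from that sequence one line at a time.
    public static void CopyLinesContainingNull(string inputPath, string outputPath)
    {
        var matching = File.ReadLines(inputPath)
                           .Where(line => line.IndexOf("NULL", StringComparison.Ordinal) >= 0);
        File.WriteAllLines(outputPath, matching);
    }
}
```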
Quote:

StreamReader.ReadLine() throwing me an OutOfMemoryException.

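Impossible, unless you also try to store the file in memory (or unless the file contains an enormous line with no line breaks, since ReadLine() must return each whole line as a single string). Think about it: do you need to store the whole file in memory?
One line contains enough information to tell what to do with it.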
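As an illustration that one line is enough to decide, here is a per-line check that is stricter than a substring test. It assumes the pipe-delimited layout shown in the question, with no quoted fields containing '|', and treats a line as a match only when some field is exactly NULL, ignoring surrounding spaces. The names are hypothetical. Unlike a substring search, it will not flag a value such as NULLAND.

```csharp
using System;
using System.Linq;

public static class NullFieldCheck
{
    // Decides from a single line whether to copy it: true only when some
    // '|'-separated field, trimmed of surrounding spaces, is exactly "NULL".
    public static bool HasNullField(string line)
    {
        return line.Split('|').Any(field => field.Trim() == "NULL");
    }
}
```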

Third solution: this file likely comes from a database. Querying the database directly for NULLs would be more efficient.


You want to read it line by line. Also, I think you may be overthinking it a bit. Forget about buffering. By the time you see the data, it has already been buffered by the HDD's on-disk controller, the OS driver, and the runtime, so your buffering worries are over. Use StreamReader.ReadLine() as others have suggested. You will find that for very large quantities of data like this, streaming from the file system is much, much faster than trying to hold everything in RAM. RAM is faster in principle, but in practice, if you make the OS go into the paging file, you won't finish in your lifetime. This is the version where the tortoise beats the hare.
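Following that advice, the snippet from the question needs only two changes. First, drop the BufferedStream layer, which adds little because FileStream and StreamReader already buffer internally. Second, replace ReadToEnd() with a ReadLine() loop. The sketch below assumes the rest stays the same. The method name and the processLine callback are hypothetical, and the two path parameters stand in for the question's Sources_path and Filename.

```csharp
using System;
using System.IO;

public static class SourceFileScanner
{
    // Opens the file as in the question (read-only, other processes may keep
    // it open for writing) and hands the FileStream straight to StreamReader.
    public static void ScanLines(string sourcesPath, string fileName, Action<string> processLine)
    {
        string fullPath = Path.Combine(sourcesPath, fileName);
        using (var fs = new FileStream(fullPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var sr = new StreamReader(fs))
        {
            string line;
            // ReadLine() instead of ReadToEnd(): only one line is held at a time.
            while ((line = sr.ReadLine()) != null)
            {
                processLine(line);
            }
        }
    }
}
```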

