.NET System.OutOfMemoryException on String.Split() of 120 MB CSV file

Question

I am using C# to read a ~120 MB plain-text CSV file. Initially I parsed it by reading it line by line, but I recently found that reading the entire file contents into memory first was several times faster. The parsing is already quite slow because the CSV has commas embedded inside quotes, which means I have to use a regex split. This is the only pattern I have found that works reliably:

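// Requires: using System.Text.RegularExpressions;
// The pattern is concatenated from two verbatim strings so that no line break ends up inside the regex.
string[] fields = Regex.Split(line,
    @",(?!(?<=(?:^|,)\s*\x22(?:[^\x22]|\x22\x22|\\\x22)*,)" +
    @"(?:[^\x22]|\x22\x22|\\\x22)*\x22\s*(?:,|$))");
// from http://regexlib.com/REDetails.aspx?regexp_id=621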

In order to do the parsing after reading the entire contents into memory, I do a string split on the newline character to get an array containing each line. However, when I do this on the 120 MB file, I get a System.OutOfMemoryException. Why does it run out of memory so quickly when my computer has 4 GB of RAM? Is there a better way to quickly parse a complicated CSV?

Recommended answer

You can get an OutOfMemoryException for an allocation of basically any size. When you allocate a piece of memory, you are really asking for a contiguous piece of memory of the requested size. If that request cannot be honored, you will see an OutOfMemoryException, even when the total amount of free memory is larger than the amount you asked for.
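
One way to avoid asking for a single very large contiguous block is to never hold the whole file, or an array of all its lines, in memory at once. The following is a minimal sketch of that idea, not the asker's code: `ReadLinesLazily` is a hypothetical helper, and a `StringReader` stands in for a `StreamReader` opened on the real file. It assumes no quoted field contains a line break.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StreamingCsv
{
    // Hypothetical helper: yields one line at a time, so only the
    // current line has to fit in memory, not the whole file.
    public static IEnumerable<string> ReadLinesLazily(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    public static void Main()
    {
        // StringReader is a stand-in for: new StreamReader("data.csv")
        using (var reader = new StringReader("a,b\n\"c,d\",e\n"))
        {
            int count = 0;
            foreach (string line in ReadLinesLazily(reader))
            {
                count++; // parse the line here instead of just counting
            }
            Console.WriteLine(count); // prints 2
        }
    }
}
```

Each line becomes eligible for garbage collection as soon as the loop moves on, so peak memory use depends on the longest line rather than on the file size.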

You should also be aware that unless your application runs as a 64-bit process on 64-bit Windows, its 4 GB of virtual address space is split into 2 GB of kernel space and 2 GB of user space, so your .NET application cannot access more than 2 GB by default, no matter how much RAM is installed.
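
To see which case applies, the process can report its own bitness. This is a small sketch using `Environment.Is64BitOperatingSystem`, `Environment.Is64BitProcess` (both available from .NET Framework 4.0) and `IntPtr.Size`; the `BitnessCheck` class and `Describe` method are names made up for this illustration.

```csharp
using System;

public static class BitnessCheck
{
    // Hypothetical helper: summarizes whether the OS and the current
    // process are 64-bit. A 32-bit process has 4-byte pointers.
    public static string Describe()
    {
        return string.Format(
            "64-bit OS: {0}, 64-bit process: {1}, pointer size: {2} bytes",
            Environment.Is64BitOperatingSystem,
            Environment.Is64BitProcess,
            IntPtr.Size);
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

If this reports a 32-bit process on a 64-bit OS, the build's platform target is a likely place to look.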

When doing string operations in .NET, you risk creating a lot of temporary strings because .NET strings are immutable: every operation that appears to modify a string actually allocates a new one. As a result, you may see memory usage rise quite dramatically.
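
As an illustration of keeping temporary strings down, here is a hedged sketch of a hand-written field splitter that reuses a single `StringBuilder` and allocates one string per field. `CsvLine.SplitFields` is a hypothetical helper, not part of .NET or of the original post. It assumes quotes are escaped by doubling (`""`), does not handle the backslash-escaped quotes that the regex above also accepts, and, unlike `Regex.Split`, strips the surrounding quotes from each field.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Hypothetical helper: splits one CSV line on commas that are
    // outside double quotes. Doubled quotes ("") inside a quoted
    // field are unescaped to a single quote character.
    public static List<string> SplitFields(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder(); // reused for every field
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c != '"')
                {
                    current.Append(c);
                }
                else if (i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote
                    i++;                 // skip the second quote
                }
                else
                {
                    inQuotes = false;    // closing quote
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Length = 0;      // clear the builder, keep its buffer
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());  // last field
        return fields;
    }

    public static void Main()
    {
        foreach (string field in SplitFields("1,\"Smith, John\",\"say \"\"hi\"\"\""))
        {
            Console.WriteLine(field);
        }
        // prints:
        // 1
        // Smith, John
        // say "hi"
    }
}
```

Whether this is faster than the regex for a given file would need to be measured.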
