Parsing a big CSV file in C# (.NET 4)
Problem description
I know this question has been asked before, but I can't get it working with the answers I've read. I have a CSV file of about 1.2 GB. If I run the process as 32-bit, I get an OutOfMemoryException. It works if I run it as a 64-bit process, but it still takes 3.4 GB of memory. I know that I'm storing a lot of data in my CustomData class, but still, 3.4 GB of RAM? Am I doing something wrong when reading the file?
dict is a dictionary that maps each column index to the name of the property the value should be saved in. Am I reading the file the right way?
    StreamReader reader = new StreamReader(File.OpenRead(path));
    while (!reader.EndOfStream) {
        String line = reader.ReadLine();
        String[] values = line.Split(';');
        CustomData data = new CustomData();
        string value;
        for (int i = 0; i < values.Length; i++) {
            dict.TryGetValue(i, out value);
            Type targetType = data.GetType();
            PropertyInfo prop = targetType.GetProperty(value);
            if (values[i] == null)
            {
                prop.SetValue(data, "NULL", null);
            }
            else
            {
                prop.SetValue(data, values[i], null);
            }
        }
        dataList.Add(data);
    }
There doesn't seem to be anything wrong with your usage of the stream reader: you read a line into memory, then forget it.
However, in C# a string is encoded in memory as UTF-16, so on average a character consumes 2 bytes of memory.
If your CSV also contains a lot of empty fields that you convert to "NULL", you add up to 7 bytes for each empty field.
So on the whole, since you basically store all the data from your file in memory, it's not really surprising that you need almost 3 times the size of the file in memory.
The actual solution is to parse your data in chunks of N lines, process them, and free them from memory.
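As a rough illustration of the chunking idea, here is a minimal sketch. The names ChunkedReader and ReadChunks are made up for this example, and it only batches raw lines; parsing each line into CustomData is left to the caller.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original question or of any library.
public static class ChunkedReader
{
    // Lazily yields the lines of 'reader' in batches of at most 'chunkSize' lines.
    // Only one batch is referenced by this method at any time.
    public static IEnumerable<List<string>> ReadChunks(TextReader reader, int chunkSize)
    {
        var chunk = new List<string>(chunkSize);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            chunk.Add(line);
            if (chunk.Count == chunkSize)
            {
                yield return chunk;
                // Start a fresh list so the previous batch can be garbage
                // collected once the caller no longer references it.
                chunk = new List<string>(chunkSize);
            }
        }
        if (chunk.Count > 0)
        {
            yield return chunk; // last, possibly shorter, batch
        }
    }
}
```

A caller would wrap a StreamReader in a using block, loop over ReadChunks(reader, N) with foreach, and process each batch (for example, write it to a database) without adding it to a long-lived list such as dataList. Memory only stays bounded if nothing keeps references to earlier batches.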
Note: Consider using a CSV parser. There is more to CSV than just commas or semicolons: what if one of your fields contains a semicolon, a newline, a quote...?
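One parser that ships with the .NET Framework is TextFieldParser in the Microsoft.VisualBasic.FileIO namespace (it needs a reference to Microsoft.VisualBasic.dll). The sketch below is only one possible choice, not necessarily the best one for a file of this size; the wrapper names CsvFields and ReadRecords are made up for the example.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO; // requires a reference to Microsoft.VisualBasic.dll

// Hypothetical wrapper, named for this example only.
public static class CsvFields
{
    // Lazily yields one string[] per record of semicolon-delimited input.
    // Disposing the parser also closes the underlying reader.
    public static IEnumerable<string[]> ReadRecords(TextReader reader)
    {
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(";");
            // Treat "..." as one field even if it contains the delimiter.
            parser.HasFieldsEnclosedInQuotes = true;
            while (!parser.EndOfData)
            {
                yield return parser.ReadFields();
            }
        }
    }
}
```

With this, a quoted field such as "b;c" comes back as a single value instead of being cut in two, which a plain line.Split(';') cannot do.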
Edit
Actually, each string takes up to 20+(N/2)*4 bytes in memory, where N is the number of characters; see C# in Depth.
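To get a feel for what the formula implies, here is a small sketch that simply evaluates it. StringMemory and EstimateBytes are made-up names, and reading N/2 as integer division (rounded down) is an assumption; the real figure depends on the runtime and on whether the process is 32-bit or 64-bit.

```csharp
// Hypothetical helper that evaluates the formula quoted above; it does not measure anything.
public static class StringMemory
{
    // 20 + (N/2)*4 bytes for a string of N characters.
    // Assumption: N/2 is integer division, i.e. rounded down.
    public static long EstimateBytes(int charCount)
    {
        return 20L + (charCount / 2) * 4L;
    }
}
```

By this estimate a 10-character field comes to 20 + 5*4 = 40 bytes in memory, whereas in a file with a single-byte encoding (an assumption about the input) it occupies roughly 10 bytes plus a delimiter. For short fields the fixed 20-byte part dominates, which fits the observation that the loaded data is several times larger than the file.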