Parsing a big CSV file C# .NET 4


Problem description



I know this question has been asked before, but I can't seem to get it working with the answers I've read. I have a CSV file of ~1.2 GB. If I run the process as 32-bit I get an OutOfMemoryException; it works if I run it as a 64-bit process, but it still takes 3.4 GB of memory. I know I'm storing a lot of data in my CustomData class, but still, 3.4 GB of RAM? Am I doing something wrong when reading the file? dict is a dictionary that simply maps a column index to the property the value should be saved in. Am I reading the file the right way?

// dict maps a column index to the name of the CustomData property that column is stored in.
StreamReader reader = new StreamReader(File.OpenRead(path));
while (!reader.EndOfStream)
{
    string line = reader.ReadLine();
    string[] values = line.Split(';');
    CustomData data = new CustomData();
    string propertyName;
    for (int i = 0; i < values.Length; i++)
    {
        dict.TryGetValue(i, out propertyName);
        // Set the matching property via reflection.
        Type targetType = data.GetType();
        PropertyInfo prop = targetType.GetProperty(propertyName);
        if (string.IsNullOrEmpty(values[i]))
        {
            prop.SetValue(data, "NULL", null);   // empty fields are stored as the literal string "NULL"
        }
        else
        {
            prop.SetValue(data, values[i], null);
        }
    }
    dataList.Add(data);                          // every parsed row is kept in memory
}

Solution

There doesn't seem to be anything wrong with your usage of the stream reader: you read a line into memory, then forget it.

However, in C# strings are encoded in memory as UTF-16, so on average a character consumes 2 bytes of memory.

If your CSV also contains a lot of empty fields that you convert to "NULL", you add up to 7 bytes for each empty field.

So on the whole, since you basically store all the data from your file in memory, it's not really surprising that you require almost 3 times the size of the file in memory.

The actual solution is to parse your data in chunks of N lines, process them, and free them from memory.
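A minimal sketch of that chunked approach, assuming hypothetical ParseLine and ProcessChunk helpers and an arbitrary batch size (none of which are in the original code):

// Requires: using System.Collections.Generic; using System.IO;
// ParseLine and ProcessChunk are hypothetical helpers, not part of the original code.
const int chunkSize = 10000;                      // assumed batch size, tune as needed
var chunk = new List<CustomData>(chunkSize);

using (var reader = new StreamReader(File.OpenRead(path)))
{
    while (!reader.EndOfStream)
    {
        chunk.Add(ParseLine(reader.ReadLine()));  // build one CustomData from one line
        if (chunk.Count == chunkSize)
        {
            ProcessChunk(chunk);                  // e.g. aggregate or write to a database
            chunk.Clear();                        // drop the references so the GC can reclaim them
        }
    }
    if (chunk.Count > 0)
        ProcessChunk(chunk);                      // last, partial chunk
}

This keeps at most chunkSize rows alive at a time instead of the whole file.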

Note: Consider using a CSV parser; there is more to CSV than just commas or semicolons. What if one of your fields contains a semicolon, a newline, a quote...?
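For example, the TextFieldParser that ships with the framework (Microsoft.VisualBasic.FileIO, available in .NET 4) already handles quoted fields; a minimal sketch, reusing the path variable from the question:

// Requires a reference to the Microsoft.VisualBasic assembly.
// using Microsoft.VisualBasic.FileIO;
using (var parser = new TextFieldParser(path))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(";");
    parser.HasFieldsEnclosedInQuotes = true;      // "a;b" inside quotes stays one field

    while (!parser.EndOfData)
    {
        string[] values = parser.ReadFields();    // quotes and embedded delimiters handled
        // map values to CustomData as before
    }
}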

Edit

Actually, each string takes up to 20 + (N/2)*4 bytes in memory (N being the number of characters); see C# in Depth.
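As a rough worked example of where the ~3x factor can come from (the 20-character field length is an assumption for illustration, not a figure from the question):

int n = 20;                        // assumed field length in characters
int onDisk = n;                    // ~20 bytes in the file with a single-byte encoding
int inMemory = 20 + (n / 2) * 4;   // formula above: 20 + 40 = 60 bytes as a .NET string
// 60 / 20 = 3, roughly the factor observed between the 1.2 GB file and 3.4 GB of RAM.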
