Parsing a big CSV file in C# (.NET 4)

Views: 111 | Posted: 2016/10/4 14:45:22 | c#


Question

I know this question has been asked before, but I can't seem to get it working with the answers I've read. I've got a CSV file of about 1.2 GB. If I run the process as 32-bit, I get an OutOfMemoryException. It works if I run it as a 64-bit process, but it still takes 3.4 GB of memory. I do know that I'm storing a lot of data in my CustomData class, but still, 3.4 GB of RAM? Am I doing something wrong when reading the file?

dict is a dictionary that maps each column index to the name of the property the value should be saved in. Am I doing the reading the right way?

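// Assumes: using System; using System.IO; using System.Reflection;
// dict maps column index -> property name; dataList is a List<CustomData>.
using (StreamReader reader = new StreamReader(File.OpenRead(path)))
{
    // The target type never changes, so look it up once outside the loop.
    Type targetType = typeof(CustomData);
    while (!reader.EndOfStream)
    {
        string line = reader.ReadLine();
        string[] values = line.Split(';');
        CustomData data = new CustomData();
        for (int i = 0; i < values.Length; i++)
        {
            string propertyName;
            if (!dict.TryGetValue(i, out propertyName))
                continue; // no property is mapped to this column
            PropertyInfo prop = targetType.GetProperty(propertyName);
            // Split never returns null entries; an empty field is an empty string.
            string field = string.IsNullOrEmpty(values[i]) ? "NULL" : values[i];
            prop.SetValue(data, field, null);
        }
        dataList.Add(data);
    }
}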

Answer

There doesn't seem to be anything wrong with how you use the stream reader: you read a line into memory, then forget it.

However, in C# a string is encoded in memory as UTF-16, so on average a character consumes 2 bytes in memory.
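As a rough illustration of that doubling, the sketch below compares the byte count of the same text in UTF-8 (assuming the CSV file on disk is mostly ASCII stored as UTF-8, which the question does not state) with its UTF-16 byte count. The class and method names are made up for this example.

```csharp
using System.Text;

static class EncodingSize
{
    // Bytes the text would occupy in a UTF-8 file (assumed on-disk encoding).
    public static int BytesOnDisk(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    // Bytes of character data when the same text is held as UTF-16,
    // which is how .NET strings store their characters.
    public static int BytesAsUtf16(string text)
    {
        return Encoding.Unicode.GetByteCount(text);
    }
}
```

For an ASCII line such as "id;name;value", the first method returns 13 and the second 26. This counts character data only and ignores the per-object overhead of each string.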

If your CSV also contains a lot of empty fields that you convert to "NULL", you add up to 7 bytes for each empty field.

So on the whole, since you basically store all the data from your file in memory, it's not really surprising that you need almost 3 times the size of the file in memory.

The actual solution is to parse your data in chunks of N lines, process them, and free them from memory.
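One way the chunked approach might look is sketched below. It is only an illustration: ChunkedReader, ReadInBatches and batchSize are hypothetical names, and what "processing" a batch means is left to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ChunkedReader
{
    // Lazily yields batches of at most batchSize lines from any TextReader.
    // Only the batch currently being built or consumed is referenced here,
    // so earlier batches can be garbage collected once the caller drops them.
    public static IEnumerable<List<string>> ReadInBatches(TextReader reader, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");

        List<string> batch = new List<string>(batchSize);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            batch.Add(line);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<string>(batchSize); // start a fresh batch
            }
        }

        if (batch.Count > 0) yield return batch; // trailing partial batch
    }
}
```

A caller would open the file with a StreamReader inside a using block, loop over ChunkedReader.ReadInBatches(reader, 10000) with foreach, and write out or aggregate each batch before moving to the next. The memory saving only materialises if the caller does not keep every parsed object in a list, as dataList does in the question.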

Note: Consider using a CSV parser. There is more to CSV than just commas or semicolons: what if one of your fields contains a semicolon, a newline, a quote...?
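The answer does not name a parser. As one possible option, the sketch below uses TextFieldParser from the Microsoft.VisualBasic.FileIO namespace, which ships with the .NET Framework but needs a reference to the Microsoft.VisualBasic assembly. SemicolonCsv and ReadRecords are hypothetical names chosen for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

static class SemicolonCsv
{
    // Lazily yields one string[] per record. Quoted fields may contain
    // the delimiter itself, which a plain line.Split(';') would break apart.
    public static IEnumerable<string[]> ReadRecords(TextReader reader)
    {
        using (TextFieldParser parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(";");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                yield return parser.ReadFields();
            }
        }
    }
}
```

Given the input line a;"b;c";d, this returns three fields (a, b;c, d), whereas Split(';') would return four. Because records are yielded one at a time, it can be combined with batch-wise processing.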

Edit

Actually, each string takes up to 20+(N/2)*4 bytes in memory, where N is the number of characters; see C# in Depth.
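Taking that formula at face value, with N/2 read as integer division (an assumption about how it is meant), the estimate can be worked out for a few lengths. StringSize and EstimateBytes are hypothetical names, and the result is an approximation rather than a measurement.

```csharp
static class StringSize
{
    // Estimate from the formula quoted above: 20 + (N / 2) * 4 bytes,
    // where N is the number of characters and N / 2 is integer division.
    public static long EstimateBytes(int charCount)
    {
        return 20 + (charCount / 2) * 4L;
    }
}
```

By this estimate an empty string costs 20 bytes, a 10-character field costs 40 bytes, and each "NULL" placeholder (4 characters) costs 28 bytes. With many short fields, the fixed per-string overhead can matter more than the characters themselves.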
