Search 1GB CSV file

Question

I have a CSV file. Every line has the same format, e.g.:

I,h,q,q,3,A,5,Q,3,[,5,Q,8,c,3,N,3,E,4,F,4,g,4,I,V,9000,0000001-100,G9999999990001800000000000001,G9999999990000001100PDNELKKMMCNELRQNWJ010, , , , , , ,D,Z,

I have a Dictionary<string, List<char>>.

It is populated by opening the file, reading each line, taking elements from the line and adding them to the dictionary; then the file is closed.

The dictionary is used elsewhere in the program: the program accepts input data, finds the matching key in the dictionary, and compares the 24 stored elements against the input data.

using (StreamReader s = File.OpenText(file))
{
    string lineData;
    while ((lineData = s.ReadLine()) != null)
    {
        var elements = lineData.Split(',');
        // Do stuff with elements
        // Keep the first character of each of the first 24 elements
        var compareElements = elements.Take(24).Select(x => x[0]);
        // Key the entry on element 27
        FileData.Add(elements[27], new List<char>(compareElements));
    }
}

I have just been told that the CSV file will now be 800 MB and contain roughly 8 million records. I tried to load it in debug on my dual-core, 32-bit Windows laptop with 4 GB of RAM, and it threw an OutOfMemoryException.

I now think that not loading the file into memory is the best bet, but I need a way to search the file quickly to see whether the input data has a matching item equal to elements[27], and then take the first 24 elements of that CSV line and compare them to the input data.

a) Even if I stuck with this approach and used 16 GB of RAM and 64-bit Windows, would having that many items in a dictionary be OK?

b) Could you provide some code or links showing ways to search a CSV file quickly, if you don't think using a dictionary is a good plan?

UPDATE: Although I have accepted an answer, I wonder what people's thoughts are on using FileStream to do a lookup and then extract the data.
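For illustration, the FileStream idea could look roughly like the sketch below: one pass over the file records the byte offset at which each line starts, keyed by element 27, and a lookup then seeks straight to that offset and parses only that one line. The class and method names (CsvOffsetIndex, Build, Lookup) are made up for this example, and it assumes the file is single-byte encoded (ASCII) with no quoted commas.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of the original program.
public static class CsvOffsetIndex
{
    // Reads one line from the stream's current position.
    // Assumption: single-byte (ASCII) text. Returns null at end of file.
    private static string ReadLineAt(Stream stream)
    {
        var bytes = new List<byte>();
        int b;
        while ((b = stream.ReadByte()) != -1 && b != '\n')
        {
            if (b != '\r') bytes.Add((byte)b);
        }
        if (b == -1 && bytes.Count == 0) return null;
        return Encoding.ASCII.GetString(bytes.ToArray());
    }

    // One pass over the file: maps elements[keyColumn] to the byte offset
    // of the start of its line. Only keys and offsets stay in memory.
    public static Dictionary<string, long> Build(string path, int keyColumn = 27)
    {
        var index = new Dictionary<string, long>();
        using (FileStream stream = File.OpenRead(path))
        {
            long offset = stream.Position;
            string line;
            while ((line = ReadLineAt(stream)) != null)
            {
                var elements = line.Split(',');
                if (elements.Length > keyColumn)
                    index[elements[keyColumn]] = offset; // a repeated key keeps its last offset
                offset = stream.Position;
            }
        }
        return index;
    }

    // Seeks to the indexed line and returns the first character of each of
    // its first 24 elements, or null when the key is not in the index.
    public static List<char> Lookup(string path, Dictionary<string, long> index, string key)
    {
        long offset;
        if (!index.TryGetValue(key, out offset)) return null;
        using (FileStream stream = File.OpenRead(path))
        {
            stream.Seek(offset, SeekOrigin.Begin);
            var elements = ReadLineAt(stream).Split(',');
            // Same expression as in the question; it throws on an empty element.
            return elements.Take(24).Select(x => x[0]).ToList();
        }
    }
}
```

Note that the index still holds all 8 million keys in memory, so this lowers the footprint compared with storing a List<char> per record but does not remove it. Each lookup also costs one file open and one seek.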

Recommended answer

If you're planning to search this many records, I would suggest bulk inserting the file into a DBMS such as SQL Server, with appropriate indexes on the fields that will be your search criteria, and then using an SQL query to check for the existence of a record.
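As a rough sketch of what that could look like from C#, the code below loads the file in batches with SqlBulkCopy and then looks up one key with a parameterized query. The table name dbo.CsvRecords, its columns LookupKey and CompareChars, the column sizes and the batch size are all assumptions made for this example; the primary key on LookupKey stands in for the "appropriate index" and assumes the keys in the file are unique. It needs a reachable SQL Server instance and a valid connection string to run.

```csharp
using System;
using System.Data;
using System.Data.SqlClient; // Microsoft.Data.SqlClient on newer .NET versions
using System.IO;
using System.Linq;

// Hypothetical helper; schema and names are illustrative only.
public static class CsvToSqlServer
{
    // LookupKey holds elements[27]; CompareChars holds the first character
    // of each of the first 24 elements. The primary key provides the index.
    private const string CreateTableSql =
        @"IF OBJECT_ID('dbo.CsvRecords') IS NULL
          CREATE TABLE dbo.CsvRecords (
              LookupKey    VARCHAR(64) NOT NULL PRIMARY KEY,
              CompareChars CHAR(24)    NOT NULL
          );";

    // Pure helper: turns one CSV line into { key, 24 compare characters }.
    public static object[] ToRow(string line)
    {
        var elements = line.Split(',');
        var compare = new string(elements.Take(24).Select(x => x[0]).ToArray());
        return new object[] { elements[27], compare };
    }

    // Streams the file and sends it to the server in batches, so the whole
    // file is never held in memory at once.
    public static void BulkLoad(string csvPath, string connectionString, int batchSize = 50000)
    {
        var table = new DataTable();
        table.Columns.Add("LookupKey", typeof(string));
        table.Columns.Add("CompareChars", typeof(string));

        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var create = new SqlCommand(CreateTableSql, connection))
                create.ExecuteNonQuery();

            using (var bulk = new SqlBulkCopy(connection) { DestinationTableName = "dbo.CsvRecords" })
            {
                foreach (var line in File.ReadLines(csvPath))
                {
                    table.Rows.Add(ToRow(line));
                    if (table.Rows.Count >= batchSize)
                    {
                        bulk.WriteToServer(table);
                        table.Clear();
                    }
                }
                if (table.Rows.Count > 0) bulk.WriteToServer(table);
            }
        }
    }

    // Returns the 24 compare characters for a key, or null when no row matches.
    public static string FindCompareChars(string connectionString, string key)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT CompareChars FROM dbo.CsvRecords WHERE LookupKey = @key", connection))
        {
            command.Parameters.Add("@key", SqlDbType.VarChar, 64).Value = key;
            connection.Open();
            return command.ExecuteScalar() as string;
        }
    }
}
```

With the data in the table, the program only has to call FindCompareChars for each input record and compare the returned 24 characters, instead of keeping 8 million entries in a dictionary.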
