Most efficient way to process a large csv in .NET

Question

Forgive my noobiness, but I just need some guidance and I can't find another question that answers this. I have a fairly large csv file (~300k rows) and I need to determine, for a given input, whether any line in the csv begins with that input. I have sorted the csv alphabetically, but I don't know:

1) how to process the rows in the csv: should I read it in as a list/collection, or use OLEDB, or an embedded database, or something else?

2) how to find something efficiently from an alphabetical list (using the fact that it's sorted to speed things up, rather than searching the whole list)

Recommended answer

You don't give enough specifics to give you a concrete answer but...

IF the CSV file changes often then use OLEDB and just change the SQL query based on your input.

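DataTable table = new DataTable(); // needs System.Data and System.Data.OleDb
string sql = @"SELECT * FROM [" + fileName + "] WHERE Column1 LIKE 'blah%'";
using (OleDbConnection connection = new OleDbConnection(
          @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + fileDirectoryPath +
          ";Extended Properties=\"Text;HDR=" + hasHeaderRow + "\""))
using (OleDbDataAdapter adapter = new OleDbDataAdapter(sql, connection))
    adapter.Fill(table); // one way to run the query; Fill opens and closes the connection itself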


IF the CSV file doesn't change often and you run a lot of "queries" against it, load it once into memory and quickly search it each time.

IF you want your search to be an exact match on a column use a Dictionary where the key is the column you want to match on and the value is the row data.

Dictionary<long, string> Rows = new Dictionary<long, string>();
...
if(Rows.ContainsKey(search)) ...
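
The load-once step could be sketched roughly as follows. This is only an illustration under some assumptions: the key is taken to be the first comma-separated field, fields are assumed to contain no quoted commas (a real CSV parser would be needed otherwise), and the key is kept as a string rather than the long used above. The CsvIndex and BuildIndex names are made up for this example.

```csharp
using System;
using System.Collections.Generic;

static class CsvIndex
{
    // Maps the first field of each line to the full line.
    // Assumes plain comma-separated fields with no quoting (illustrative only).
    public static Dictionary<string, string> BuildIndex(IEnumerable<string> lines)
    {
        var rows = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (string line in lines)
        {
            if (line.Length == 0) continue;      // skip blank lines
            int comma = line.IndexOf(',');
            string key = comma < 0 ? line : line.Substring(0, comma);
            rows[key] = line;                    // on duplicate keys the last line wins
        }
        return rows;
    }
}
```

With something like this in place, the file would be read once, for example with CsvIndex.BuildIndex(File.ReadLines(path)), and each later lookup is a single ContainsKey or TryGetValue call on the returned dictionary.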

IF you want your search to be a partial match like StartsWith, then keep one sorted array containing your searchable data (i.e. the first column) and a second list or array, in the same order, containing your row data. Then use .NET's built-in binary search http://msdn.microsoft.com/en-us/library/2cy9f6wb.aspx

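string[] SortedSearchables;   // first column of each row, sorted ascending
List<string> SortedRows = new List<string>();   // full rows, in the same order
...
string result = null;
int foundIdx = Array.BinarySearch<string>(SortedSearchables, searchTerm);
if(foundIdx < 0) {
    // No exact match: the bitwise complement is the index of the first
    // element larger than searchTerm, which is the only candidate that
    // could start with it.
    foundIdx = ~foundIdx;
    if(foundIdx < SortedRows.Count && SortedSearchables[foundIdx].StartsWith(searchTerm)) {
        result = SortedRows[foundIdx];
    }
} else {
    // Exact match on the searchable column.
    result = SortedRows[foundIdx];
}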
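
One detail worth checking in the approach above: Array.BinarySearch only gives meaningful results if the array was sorted with the same comparison it searches with, and the default string comparison and the single-argument StartsWith are both culture-sensitive. A hedged variant that pins everything to ordinal comparison might look like the sketch below. It assumes the keys were sorted with StringComparer.Ordinal; the PrefixSearch and FindPrefix names are invented for illustration.

```csharp
using System;

static class PrefixSearch
{
    // Returns the index of a key that starts with prefix, or -1 if there is none.
    // sortedKeys must already be sorted with StringComparer.Ordinal.
    public static int FindPrefix(string[] sortedKeys, string prefix)
    {
        int idx = Array.BinarySearch<string>(sortedKeys, prefix, StringComparer.Ordinal);
        if (idx >= 0) return idx;    // an entry equals the prefix exactly
        idx = ~idx;                  // index of the first entry greater than prefix
        if (idx < sortedKeys.Length &&
            sortedKeys[idx].StartsWith(prefix, StringComparison.Ordinal))
        {
            return idx;
        }
        return -1;
    }
}
```

The returned index can then be used against the parallel row list, as in SortedRows[index], when it is not -1.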

NOTE code was written inside the browser window and may contain syntax errors as it wasn't tested.
