C#: searching a large text file

Problem description

I am trying to optimize the search for a string in a large text file (300-600 MB). My current method takes too long.

Currently I use IndexOf to search for the string, but building an index of every line that contains it takes far too long (about 20 s).

How can I optimize the search speed? I've tried Contains(), but that is slow as well. Any suggestions? I was thinking of a regex match, but I don't expect that to give a significant speed boost. Maybe my search logic is flawed.

Example

while ((line = myStream.ReadLine()) != null)
{
    if (line.IndexOf(CompareString, StringComparison.OrdinalIgnoreCase) >= 0)
    {
        LineIndex.Add(CurrentPosition);
        LinesCounted += 1;
    }
}

Solution

The brute-force algorithm you're using runs in O(nm) time in the worst case, where n is the length of the string being searched and m is the length of the substring/pattern you're trying to find. You need to use a dedicated string search algorithm.
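As one illustration of what a dedicated string search algorithm looks like, here is a minimal sketch of Boyer-Moore-Horspool in C#. The choice of this particular algorithm, the class name HorspoolSearcher, and the helper MatchingLines are assumptions made for illustration and do not come from the answer. The case-insensitive comparison upper-cases each character with char.ToUpperInvariant, which only approximates StringComparison.OrdinalIgnoreCase.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not from the original answer): Boyer-Moore-Horspool search.
public sealed class HorspoolSearcher
{
    private readonly string _pattern;               // pattern, upper-cased char by char
    private readonly Dictionary<char, int> _shift;  // bad-character shift table

    public HorspoolSearcher(string pattern)
    {
        if (string.IsNullOrEmpty(pattern))
            throw new ArgumentException("Pattern must be non-empty.", nameof(pattern));

        char[] upper = pattern.ToCharArray();
        for (int i = 0; i < upper.Length; i++)
            upper[i] = char.ToUpperInvariant(upper[i]);
        _pattern = new string(upper);

        // For every pattern character except the last, remember how far its
        // rightmost occurrence is from the end of the pattern.
        _shift = new Dictionary<char, int>();
        for (int i = 0; i < _pattern.Length - 1; i++)
            _shift[_pattern[i]] = _pattern.Length - 1 - i;
    }

    // Returns the index of the first case-insensitive match in text, or -1.
    public int IndexIn(string text)
    {
        int m = _pattern.Length;
        int pos = 0;
        while (pos <= text.Length - m)
        {
            // Compare the current window against the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && char.ToUpperInvariant(text[pos + j]) == _pattern[j])
                j--;
            if (j < 0)
                return pos;

            // Skip ahead based on the text character under the pattern's last position.
            char last = char.ToUpperInvariant(text[pos + m - 1]);
            pos += _shift.TryGetValue(last, out int s) ? s : m;
        }
        return -1;
    }

    // Returns the zero-based numbers of the lines that contain the pattern.
    public static List<int> MatchingLines(TextReader reader, string pattern)
    {
        var searcher = new HorspoolSearcher(pattern);
        var hits = new List<int>();
        string line;
        int lineNumber = 0;
        while ((line = reader.ReadLine()) != null)
        {
            if (searcher.IndexIn(line) >= 0)
                hits.Add(lineNumber);
            lineNumber++;
        }
        return hits;
    }
}
```

The shift table is built once per pattern and reused for every line, so mismatches usually skip several characters at a time. Whether this beats the built-in IndexOf on a given file is something to measure rather than assume.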

However, a carefully crafted regular expression might be sufficient, depending on what you are trying to find. See Jeffrey Friedl's tome, Mastering Regular Expressions, for help on building efficient regular expressions (e.g., no backtracking).
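For the regular-expression route, a minimal sketch might look like the following. It assumes the search term is a plain literal, so the pattern is escaped with Regex.Escape and contains no quantifiers or alternation that could cause backtracking. The class name RegexLineSearch and the method MatchingLines are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original answer): line search with one compiled regex.
public static class RegexLineSearch
{
    // Returns the zero-based numbers of the lines that contain the literal text.
    public static List<int> MatchingLines(TextReader reader, string literal)
    {
        // Build the regex once, outside the loop. Regex.Escape turns the search
        // term into a literal pattern; Compiled trades startup cost for faster matching.
        var regex = new Regex(
            Regex.Escape(literal),
            RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

        var hits = new List<int>();
        string line;
        int lineNumber = 0;
        while ((line = reader.ReadLine()) != null)
        {
            if (regex.IsMatch(line))
                hits.Add(lineNumber);
            lineNumber++;
        }
        return hits;
    }
}
```

Constructing the Regex once and reusing it for every line avoids re-parsing the pattern on each call, which is usually the first thing to check when regex matching in a loop seems slow.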

You might also want to consult a good algorithms text. I'm partial to Robert Sedgewick's Algorithms in its various incarnations (Algorithms in [C|C++|Java]).
