Read a large text file line by line and search for a string

Problem description

I am currently developing an application that reads a text file of about 50,000 lines. For each line, I need to check whether it contains a specific string.

At the moment, I use the conventional System.IO.StreamReader to read the file line by line.

The problem is that the size of the text file changes each time. I ran several performance tests and noticed that as the file size increases, it takes more time to read each line.

Reading a txt file that contains 5,000 lines: 0:40
Reading a txt file that contains 10,000 lines: 2:54

It takes about 4 times longer to read a file that is 2 times larger. I can't imagine how much time it will take to read a 100,000-line file.

Here is my code:

using (StreamReader streamReader = new StreamReader(this.MyPath))
{
     while (streamReader.Peek() > 0)
     {
          string line = streamReader.ReadLine();

          if (line.Contains(Resources.Constants.SpecificString))
          {
               // Do some action with the string.
          }
     }
}

Is there a way to avoid the situation where a bigger file means more time to read a single line?

Recommended answer

Try this:

var toSearch = Resources.Constants.SpecificString;
foreach (var str in File.ReadLines(MyPath).Where(s => s.Contains(toSearch))) {
    // Do some action with the string
}

This avoids accessing the resources on each iteration by caching the value before the loop. If this does not help, try writing your own Contains based on an advanced string-searching algorithm, such as KMP.
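For reference, here is a minimal sketch of what a KMP-based Contains could look like. The class name KmpMatcher and the method IsContainedIn are hypothetical names chosen for illustration, not part of .NET. It assumes non-null inputs and compares characters ordinally, so it is not culture-aware.

```csharp
using System;

// Hypothetical helper: precomputes the KMP failure table once per pattern,
// so the same matcher can be reused for every line of the file.
public sealed class KmpMatcher
{
    private readonly string _pattern;
    private readonly int[] _failure;

    public KmpMatcher(string pattern)
    {
        _pattern = pattern ?? throw new ArgumentNullException(nameof(pattern));
        _failure = new int[pattern.Length];

        // _failure[i] = length of the longest proper prefix of pattern[0..i]
        // that is also a suffix of pattern[0..i].
        int k = 0;
        for (int i = 1; i < pattern.Length; i++)
        {
            while (k > 0 && pattern[i] != pattern[k])
                k = _failure[k - 1];
            if (pattern[i] == pattern[k])
                k++;
            _failure[i] = k;
        }
    }

    // Returns true if the pattern occurs anywhere in text (ordinal comparison).
    public bool IsContainedIn(string text)
    {
        if (_pattern.Length == 0)
            return true;

        int k = 0; // number of pattern characters matched so far
        for (int i = 0; i < text.Length; i++)
        {
            // On a mismatch, fall back in the pattern without rewinding the text.
            while (k > 0 && text[i] != _pattern[k])
                k = _failure[k - 1];
            if (text[i] == _pattern[k])
                k++;
            if (k == _pattern.Length)
                return true;
        }
        return false;
    }
}
```

To use it, construct the matcher once before the loop (new KmpMatcher(toSearch)) and call IsContainedIn(str) on each line instead of str.Contains(toSearch). Whether this beats the built-in Contains depends on the pattern and the data, so measure before adopting it.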

Note: be sure to use File.ReadLines, which reads lines lazily (unlike the similar-looking File.ReadAllLines, which reads all lines at once).
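To illustrate the difference, here is a small sketch. The class LazyReadDemo and the method FindFirstMatch are hypothetical names used only for this example. Because File.ReadLines returns a lazy sequence, the search can stop at the first hit without reading the rest of the file.

```csharp
using System;
using System.IO;
using System.Linq;

public static class LazyReadDemo
{
    // Hypothetical helper: returns the first line containing needle,
    // or null if no line contains it.
    public static string FindFirstMatch(string path, string needle)
    {
        // File.ReadLines streams the file one line at a time, and
        // FirstOrDefault stops enumerating as soon as a line matches.
        // File.ReadAllLines would load every line into an array first.
        return File.ReadLines(path)
                   .FirstOrDefault(line => line.Contains(needle));
    }
}
```

With File.ReadAllLines in the same place, the result would be identical, but the whole file would be held in memory before the search starts.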
