Reading large text files with streams in C#

Question

I've got the lovely task of working out how to handle large files being loaded into our application's script editor (it's like VBA for our internal product for quick macros). Most files are about 300-400 KB which is fine loading. But when they go beyond 100 MB the process has a hard time (as you'd expect).

What happens is that the file is read and shoved into a RichTextBox which is then navigated - don't worry too much about this part.

The developer who wrote the initial code is simply using a StreamReader and doing

[Reader].ReadToEnd()

which could take quite a while to complete.

My task is to break this bit of code up, read it in chunks into a buffer and show a progressbar with an option to cancel it.

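Some assumptions:

  • Most files will be 30-40 MB.
  • The contents of the file are text (not binary); some are in Unix format, some in DOS.
  • Once the contents are retrieved, we work out which line terminator is used.
  • No one is concerned about the time it takes to render in the RichTextBox once it's loaded. It's just the initial load of the text.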

Now for the questions:

  • Can I simply use StreamReader, then check the Length property (so ProgressMax) and issue a Read for a set buffer size and iterate through in a while loop WHILST inside a background worker, so it doesn't block the main UI thread? Then return the stringbuilder to the main thread once it's completed.
  • The contents will be going to a StringBuilder. Can I initialise the StringBuilder with the size of the stream if the length is available?

Are these (in your professional opinions) good ideas? I've had a few issues in the past with reading content from Streams, because it will always miss the last few bytes or something, but I'll ask another question if this is the case.
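
The approach described in the question can be sketched roughly as follows. This is only an illustration under stated assumptions: `ChunkedTextLoader`, `Load`, and the 64 KB buffer size are hypothetical names and values, not part of the original code. The method is meant to be called from a background thread (a BackgroundWorker or `Task.Run`), reports a percentage through `IProgress<int>`, and appends only the number of characters actually returned by each `Read`, which is the usual cure for "missing the last few bytes".

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;

public static class ChunkedTextLoader
{
    // Hypothetical helper: reads a text file in fixed-size chunks so the
    // caller can show progress and cancel. Run it off the UI thread.
    public static string Load(string path, IProgress<int> progress, CancellationToken token)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var reader = new StreamReader(fs))
        {
            // The byte length is only a rough estimate of the character
            // count, but it avoids repeated growth of the StringBuilder.
            int capacity = (int)Math.Min(fs.Length, int.MaxValue / 2);
            var sb = new StringBuilder(capacity);

            var buffer = new char[64 * 1024]; // assumed chunk size
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Throws OperationCanceledException if the user cancelled.
                token.ThrowIfCancellationRequested();

                // Append only what was read; the last chunk is usually partial.
                sb.Append(buffer, 0, read);

                // fs.Position runs slightly ahead of the characters consumed
                // because StreamReader buffers, which is fine for a progress bar.
                if (progress != null && fs.Length > 0)
                    progress.Report((int)(fs.Position * 100 / fs.Length));
            }
            return sb.ToString();
        }
    }
}
```

A caller might run `Task.Run(() => ChunkedTextLoader.Load(path, progress, cts.Token))` and assign the result to the RichTextBox back on the UI thread; cancelling `cts` makes the call throw `OperationCanceledException`.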

Recommended answer

You can improve read speed by using a BufferedStream, like this:

using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader sr = new StreamReader(bs))
{
    string line;
    // ReadLine returns null once the end of the stream is reached.
    while ((line = sr.ReadLine()) != null)
    {
        // Process each line here.
    }
}

March 2013 update

I recently wrote code for reading and processing (searching for text in) 1 GB-ish text files (much larger than the files involved here) and achieved a significant performance gain by using a producer/consumer pattern. The producer task read in lines of text using the BufferedStream and handed them off to a separate consumer task that did the searching.

I used this as an opportunity to learn TPL Dataflow, which is very well suited for quickly coding this pattern.
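
The answer's own code is not shown, so the following is only a minimal sketch of the producer/consumer shape it describes. To stay within the base class library it uses `BlockingCollection<string>` rather than TPL Dataflow; `LineSearch`, `CountMatches`, and the bounded capacity of 10000 are hypothetical choices made for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class LineSearch
{
    // Hypothetical helper: counts the lines of a file that contain `term`,
    // reading on one task and searching on another.
    public static int CountMatches(string path, string term)
    {
        // Bounded so the producer cannot run arbitrarily far ahead of the consumer.
        using (var lines = new BlockingCollection<string>(10000))
        {
            Task producer = Task.Run(() =>
            {
                try
                {
                    using (var fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
                    using (var bs = new BufferedStream(fs))
                    using (var sr = new StreamReader(bs))
                    {
                        string line;
                        while ((line = sr.ReadLine()) != null)
                            lines.Add(line); // blocks while the queue is full
                    }
                }
                finally
                {
                    // Lets the consumer loop finish, even if reading failed.
                    lines.CompleteAdding();
                }
            });

            Task<int> consumer = Task.Run(() =>
            {
                int count = 0;
                foreach (string line in lines.GetConsumingEnumerable())
                {
                    if (line.Contains(term)) count++;
                }
                return count;
            });

            // Waits for both tasks and rethrows any failure from either.
            Task.WaitAll(producer, consumer);
            return consumer.Result;
        }
    }
}
```

Whether the split pays off depends on how expensive the per-line work is; for a trivial search the queue overhead may outweigh the gain.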

Why BufferedStream is faster

A buffer is a block of bytes in memory used to cache data, thereby reducing the number of calls to the operating system. Buffers improve read and write performance. A buffer can be used for either reading or writing, but never both simultaneously. The Read and Write methods of BufferedStream automatically maintain the buffer.
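
The effect can be illustrated with a small experiment. In this sketch, `CountingStream` and `BufferDemo` are hypothetical names; the counting wrapper stands in for the operating system and records how many read requests actually reach it when bytes are pulled one at a time, with and without a `BufferedStream` in between.

```csharp
using System;
using System.IO;

// Hypothetical wrapper that counts how often the inner stream is asked for data.
public sealed class CountingStream : Stream
{
    private readonly Stream _inner;
    public int ReadCalls { get; private set; }

    public CountingStream(Stream inner) { _inner = inner; }

    public override int Read(byte[] buffer, int offset, int count)
    {
        ReadCalls++;
        return _inner.Read(buffer, offset, count);
    }

    // Read-only, forward-only stream: everything else is unsupported.
    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

public static class BufferDemo
{
    // Reads `size` bytes one at a time and returns how many reads hit the inner stream.
    public static int CountInnerReads(int size, bool buffered)
    {
        var counting = new CountingStream(new MemoryStream(new byte[size]));
        Stream source = buffered ? new BufferedStream(counting, 4096) : (Stream)counting;
        while (source.ReadByte() != -1) { }
        return counting.ReadCalls;
    }
}
```

Calling `BufferDemo.CountInnerReads(100000, false)` should report at least one inner read per byte, while the buffered variant should need only a few dozen, because each inner read fills a 4096-byte buffer.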

December 2014 update: your mileage may vary

Based on the comments, FileStream should be using a BufferedStream internally. At the time this answer was first provided, I measured a significant performance boost by adding a BufferedStream. At the time I was targeting .NET 3.x on a 32-bit platform. Today, targeting .NET 4.5 on a 64-bit platform, I do not see any improvement.
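
Since `FileStream` keeps its own buffer, one way to check this on a given platform is to drop the extra wrapper and pass a buffer size to the `FileStream` constructor directly, then time it against the `BufferedStream` version. A minimal sketch, where `LineCounter`, `CountLines`, and the 64 KB default are assumptions chosen for illustration:

```csharp
using System.IO;

public static class LineCounter
{
    // Hypothetical helper: counts lines using only FileStream's own buffer.
    public static long CountLines(string path, int bufferSize = 64 * 1024)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.ReadWrite, bufferSize))
        using (var sr = new StreamReader(fs))
        {
            long count = 0;
            while (sr.ReadLine() != null) count++;
            return count;
        }
    }
}
```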

Related

I came across a case where streaming a large, generated CSV file to the Response stream from an ASP.Net MVC action was very slow. Adding a BufferedStream improved performance by 100x in this instance. For more see Unbuffered Output Very Slow