What's the fastest way to read a text file line-by-line?

Question

I want to read a text file line by line. I wanted to know if I'm doing it as efficiently as possible within the .NET C# scope of things.

This is what I'm trying so far:

var filestream = new System.IO.FileStream(textFilePath,
                                          System.IO.FileMode.Open,
                                          System.IO.FileAccess.Read,
                                          System.IO.FileShare.ReadWrite);
var file = new System.IO.StreamReader(filestream, System.Text.Encoding.UTF8, true, 128);

string lineOfText;
while ((lineOfText = file.ReadLine()) != null)
{
    //Do something with the lineOfText
}

Answer

To find the fastest way to read a file line by line you will have to do some benchmarking. I have done some small tests on my computer but you cannot expect that my results apply to your environment.
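As a minimal sketch of such a benchmark (this block is not from the original answer), the program below times one pass over a file with System.Diagnostics.Stopwatch; the file path is a placeholder and the per-line work is simulated by a counter:

using System;
using System.Diagnostics;
using System.IO;

class ReadLineBenchmark {
  static void Main() {
    // Placeholder path; point this at a file that is representative of your workload.
    var fileName = @"C:\temp\sample.txt";

    var stopwatch = Stopwatch.StartNew();

    var lineCount = 0;
    foreach (var line in File.ReadLines(fileName)) {
      lineCount += 1; // Stand-in for the real per-line work.
    }

    stopwatch.Stop();
    Console.WriteLine("{0} lines read in {1} ms", lineCount, stopwatch.ElapsedMilliseconds);
  }
}

Run the same measurement for each of the approaches below (and for several buffer sizes) to see which one wins on your hardware.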

Using StreamReader.ReadLine

This is basically your method. For some reason you set the buffer size to the smallest possible value (128). Increasing it will in general increase performance. The default size is 1,024, and other good choices are 512 (the sector size in Windows) or 4,096 (the cluster size in NTFS). You will have to run a benchmark to determine an optimal buffer size. A bigger buffer is, if not faster, at least not slower than a smaller buffer.

const Int32 BufferSize = 128;
using (var fileStream = File.OpenRead(fileName))
  using (var streamReader = new StreamReader(fileStream, Encoding.UTF8, true, BufferSize)) {
    String line;
    while ((line = streamReader.ReadLine()) != null) {
      // Process line
    }
  }

The FileStream constructor allows you to specify FileOptions. For example, if you are reading a large file sequentially from beginning to end, you may benefit from FileOptions.SequentialScan. Again, benchmarking is the best thing you can do.
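As an illustration (this block is not from the original answer), the same loop opened with FileOptions.SequentialScan might look like the sketch below; the 4,096 buffer size is just one of the candidate values discussed above, and the snippet assumes the same context (fileName plus the System.IO and System.Text namespaces) as the other examples:

// Sketch only: FileOptions.SequentialScan hints to Windows that the file is read front to back.
using (var fileStream = new FileStream(fileName, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 4096, FileOptions.SequentialScan))
  using (var streamReader = new StreamReader(fileStream, Encoding.UTF8, true, 4096)) {
    String line;
    while ((line = streamReader.ReadLine()) != null) {
      // Process line
    }
  }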

Using File.ReadLines

This is very much like your own solution except that it is implemented using a StreamReader with a fixed buffer size of 1,024. On my computer this results in slightly better performance compared to your code with the buffer size of 128. However, you can get the same performance increase by using a larger buffer size. This method is implemented using an iterator block and does not consume memory for all lines.

var lines = File.ReadLines(fileName);
foreach (var line in lines) {
  // Process line
}

Using File.ReadAllLines

This is very much like the previous method except that this method grows a list of strings used to create the returned array of lines, so the memory requirements are higher. However, it returns String[] and not an IEnumerable<String>, allowing you to randomly access the lines.

var lines = File.ReadAllLines(fileName);
for (var i = 0; i < lines.Length; i += 1) {
  var line = lines[i];
  // Process line
}

Using String.Split

This method is considerably slower, at least on big files (tested on a 511 KB file), probably due to how String.Split is implemented. It also allocates an array for all the lines, increasing the memory required compared to your solution.

using (var streamReader = File.OpenText(fileName)) {
  var lines = streamReader.ReadToEnd().Split("\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
  foreach (var line in lines) {
    // Process line
  }
}

My suggestion is to use File.ReadLines because it is clean and efficient. If you require special sharing options (for example you use FileShare.ReadWrite), you can use your own code but you should increase the buffer size.
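If you do need FileShare.ReadWrite, a variant of the code from the question with a larger buffer might look like the sketch below (4,096 is only an example value, and the snippet assumes the same textFilePath and namespaces as above):

const Int32 BufferSize = 4096; // Example value; benchmark to find what works best for you.
using (var fileStream = new FileStream(textFilePath, FileMode.Open, FileAccess.Read,
                                       FileShare.ReadWrite))
  using (var streamReader = new StreamReader(fileStream, Encoding.UTF8, true, BufferSize)) {
    String line;
    while ((line = streamReader.ReadLine()) != null) {
      // Process line
    }
  }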
