What is the best way to merge large files?


Problem description

I have to merge thousands of large files (~200 MB each) and would like to know the best way to merge them. Lines will be copied to the merged file conditionally. Should I use File.AppendAllLines or Stream.CopyTo?

Using File.AppendAllLines:

for (int i = 0; i < countryFiles.Length; i++){
   string srcFileName = countryFiles[i];
   string[] countryExtractLines = File.ReadAllLines(srcFileName);  
   File.AppendAllLines(actualMergedFileName, countryExtractLines);
}

Using Stream.CopyTo:

using (Stream destStream = File.OpenWrite(actualMergedFileName)){
  foreach (string srcFileName in countryFiles){
    using (Stream srcStream = File.OpenRead(srcFileName)){
        srcStream.CopyTo(destStream);
    }
  }
}

Recommended answer

Suppose you have a condition (a predicate) that must be true for each line of one file that you want to append to another file.

You can process that efficiently as follows:

var filteredLines = 
    File.ReadLines("MySourceFileName")
    .Where(line => line.Contains("Target")); // Put your own condition here.

File.AppendAllLines("MyDestinationFileName", filteredLines);

This approach scales to multiple files and avoids loading an entire file into memory, because File.ReadLines() yields lines one at a time as they are consumed.
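To illustrate how this might extend to many source files, here is a minimal sketch. The class name FileMerger, the method MergeFiltered, and its parameters are hypothetical names chosen for this example, not part of the original answer. It assumes the destination file is not one of the source files and that all files are UTF-8.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileMerger
{
    // Hypothetical helper: streams every line that satisfies `keep`
    // from each source file into a single destination file.
    public static void MergeFiltered(
        IEnumerable<string> sourceFiles,
        string destinationFile,
        Func<string, bool> keep)
    {
        // SelectMany is lazy, so each source file is opened only when
        // enumeration reaches it, and lines are read one at a time.
        IEnumerable<string> filteredLines = sourceFiles
            .SelectMany(path => File.ReadLines(path))
            .Where(keep);

        // WriteAllLines creates or overwrites the destination and keeps it
        // open once while it consumes the whole sequence.
        File.WriteAllLines(destinationFile, filteredLines);
    }
}
```

With the question's variables, a call could look like FileMerger.MergeFiltered(countryFiles, actualMergedFileName, line => line.Contains("Target")). Writing everything through one WriteAllLines call avoids reopening the destination for every source file, which calling File.AppendAllLines in a loop would do.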

If you want to replace the contents of the destination file rather than append the lines to it, use:

File.WriteAllLines("MyDestinationFileName", filteredLines);

instead of

File.AppendAllLines("MyDestinationFileName", filteredLines);

Also note that these methods have overloads that let you specify the encoding if your files are not UTF-8.
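As a rough sketch of those overloads, the example below passes the same encoding to both the reader and the writer. The class name EncodingExample and the method CopyFiltered are hypothetical. Which encoding your files actually use is something you would need to determine yourself.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

public static class EncodingExample
{
    // Hypothetical helper: appends matching lines using an explicit encoding
    // for both reading and writing.
    public static void CopyFiltered(string source, string destination, Encoding encoding)
    {
        // File.ReadLines(string, Encoding) decodes the source lazily.
        IEnumerable<string> filteredLines = File
            .ReadLines(source, encoding)
            .Where(line => line.Contains("Target")); // Put your own condition here.

        // File.AppendAllLines(string, IEnumerable<string>, Encoding)
        // encodes the output with the same encoding.
        File.AppendAllLines(destination, filteredLines, encoding);
    }
}
```

For example, EncodingExample.CopyFiltered("MySourceFileName", "MyDestinationFileName", Encoding.Unicode) would treat both files as UTF-16 little-endian.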

Finally, don't be thrown by the inconsistent method naming. File.ReadLines() does not read all lines into memory, but File.ReadAllLines() does. However, File.WriteAllLines() does not buffer all lines in memory, nor does it expect them to be buffered already; it accepts an IEnumerable<string> as input.
