Merge huge files without loading whole file into memory?


Question

I want to merge huge files containing strings into one file and tried to use NIO2. I do not want to load a whole file into memory, so I tried it with a BufferedReader:

public void mergeFiles(List<Path> filesToBeMerged) throws IOException {

    Path mergedFile = Paths.get("mergedFile");
    Files.createFile(mergedFile);

    List<Path> _filesToBeMerged = filesToBeMerged;

    try (BufferedWriter writer = Files.newBufferedWriter(mergedFile, StandardOpenOption.APPEND)) {
        for (Path file : _filesToBeMerged) {
            // this does not compile: neither append() nor write() accepts a BufferedReader
            writer.append(Files.newBufferedReader(file));
        }
    } catch (IOException e) {
        System.err.println(e);
    }
}

I then tried the following. It works, however the formatting of the strings (e.g. new lines) is not copied to the merged file:

...
try (BufferedWriter writer = Files.newBufferedWriter(mergedFile, StandardOpenOption.APPEND)) {
    for (Path file : _filesToBeMerged) {
        // try-with-resources closes the reader even if copying fails
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.append(line);
                // readLine() strips the line terminator, so one is re-added here
                writer.append(System.lineSeparator());
            }
        }
    }
} catch (IOException e) {
    System.err.println(e);
}
...

How can I merge huge files with NIO2 without loading a whole file into memory?

Recommended answer

If you want to merge two or more files efficiently, you should ask yourself why on earth you are using the char-based Reader and Writer classes to perform that task.

By using these classes you convert the files' bytes to characters, decoding them from the reader's charset into Java's internal Unicode representation, and then encode them back from Unicode into the writer's charset. (Files.newBufferedReader and Files.newBufferedWriter use UTF-8 when no charset is given; the older FileReader and FileWriter traditionally used the system's default encoding.) This means the program has to perform two data conversions on the entire contents of every file.
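To make the cost and risk of that decode/encode round trip concrete, here is a minimal sketch. The class name `RoundTripDemo` and the method `roundTrip` are hypothetical names chosen for illustration. It decodes bytes to a `String` and encodes them again, which is roughly what a Reader/Writer pair does. Pure ASCII survives unchanged, but a byte that is not valid in the chosen charset does not:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripDemo {

    // Decode bytes into characters and encode them back again,
    // mimicking what happens when data passes through a Reader and a Writer.
    static byte[] roundTrip(byte[] input, Charset charset) {
        String decoded = new String(input, charset); // bytes -> chars
        return decoded.getBytes(charset);            // chars -> bytes
    }

    public static void main(String[] args) {
        byte[] ascii = "abc\r\n".getBytes(StandardCharsets.US_ASCII);
        // 0xE9 is 'é' in ISO-8859-1 but is not a complete UTF-8 sequence
        byte[] latin1 = {(byte) 0xE9};

        System.out.println(Arrays.equals(ascii, roundTrip(ascii, StandardCharsets.UTF_8)));
        System.out.println(Arrays.equals(latin1, roundTrip(latin1, StandardCharsets.UTF_8)));
    }
}
```

The `String` constructor used here silently substitutes a replacement character for malformed input. A reader obtained from `Files.newBufferedReader` may instead fail with an exception on such input. Either way, the merged file is no longer a byte-for-byte concatenation of its inputs.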

And, by the way, BufferedReader and BufferedWriter are by no means NIO2 artifacts. These classes have existed since JDK 1.1, long before NIO2 was introduced in Java 7.

When you copy the raw bytes via real NIO functions instead, the data can be transferred without ever being touched by the Java application. In the best case, the transfer is performed directly in the file system's buffers:

import static java.nio.file.StandardOpenOption.*;

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MergeFiles
{
  public static void main(String[] arg) throws IOException {
    if(arg.length<2) {
      System.err.println("Syntax: infiles... outfile");
      System.exit(1);
    }
    Path outFile=Paths.get(arg[arg.length-1]);
    System.out.println("TO "+outFile);
    // TRUNCATE_EXISTING discards old content if the output file already exists
    try(FileChannel out=FileChannel.open(outFile, CREATE, WRITE, TRUNCATE_EXISTING)) {
      for(int ix=0, n=arg.length-1; ix<n; ix++) {
        Path inFile=Paths.get(arg[ix]);
        System.out.println(inFile+"...");
        try(FileChannel in=FileChannel.open(inFile, READ)) {
          for(long p=0, l=in.size(); p<l; )
            p+=in.transferTo(p, l-p, out);
        }
      }
    }
    System.out.println("DONE.");
  }
}
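The same channel-based technique can be adapted to the method shape used in the question. The following is only a sketch under stated assumptions: the class name `ChannelMerge` and the explicit `target` parameter are hypothetical and not part of the original code, and an existing target file is assumed to be safe to overwrite.

```java
import static java.nio.file.StandardOpenOption.CREATE;
import static java.nio.file.StandardOpenOption.READ;
import static java.nio.file.StandardOpenOption.TRUNCATE_EXISTING;
import static java.nio.file.StandardOpenOption.WRITE;

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.util.List;

public class ChannelMerge {

    // Concatenates the raw bytes of all input files into target, in list order.
    // No decoding takes place, so line terminators and encodings are preserved.
    public static void mergeFiles(List<Path> filesToBeMerged, Path target) throws IOException {
        try (FileChannel out = FileChannel.open(target, CREATE, WRITE, TRUNCATE_EXISTING)) {
            for (Path file : filesToBeMerged) {
                try (FileChannel in = FileChannel.open(file, READ)) {
                    long size = in.size();
                    long position = 0;
                    // transferTo may move fewer bytes than requested, so loop until done
                    while (position < size) {
                        position += in.transferTo(position, size - position, out);
                    }
                }
            }
        }
    }
}
```

A call could look like `ChannelMerge.mergeFiles(Arrays.asList(Paths.get("a.txt"), Paths.get("b.txt")), Paths.get("mergedFile"))`, where the file names are placeholders. Only a small, bounded amount of memory is needed regardless of file size, because the channel transfers the data in chunks rather than reading whole files.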
