Java: Reading a Large Text File With 70 Million Lines of Text


Question

I have a big test file with 70 million lines of text. I have to read the file line by line.

I used two different approaches:

InputStreamReader isr = new InputStreamReader(new FileInputStream(FilePath), "unicode"); // "unicode" maps to UTF-16
BufferedReader br = new BufferedReader(isr);
String cur;
while ((cur = br.readLine()) != null) {
    // process cur
}

LineIterator it = FileUtils.lineIterator(new File(FilePath), "unicode");
try {
    while (it.hasNext()) cur = it.nextLine();
} finally {
    LineIterator.closeQuietly(it);
}

Is there another approach that can make this task faster?

Best regards,

Answer

1) I am sure there is no difference in speed; both use FileInputStream internally and buffer the reads.

2) You can take measurements and see for yourself.
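The "measure it yourself" advice can be sketched as a crude timing harness. This is a sketch, not a rigorous benchmark (no warm-up, single run); the `ReadBench` class name, the temporary sample file, and its contents are illustrative — point it at the real 70-million-line file to get meaningful numbers:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Scanner;

public class ReadBench {
    // Time a full pass with BufferedReader, printing the line count seen.
    static long timeBufferedReader(Path p) throws IOException {
        long start = System.nanoTime();
        int lines = 0;
        try (BufferedReader br = Files.newBufferedReader(p, StandardCharsets.UTF_8)) {
            while (br.readLine() != null) lines++;
        }
        System.out.println("BufferedReader lines=" + lines);
        return System.nanoTime() - start;
    }

    // Time a full pass with Scanner for comparison.
    static long timeScanner(Path p) throws IOException {
        long start = System.nanoTime();
        int lines = 0;
        try (Scanner sc = new Scanner(p.toFile(), "UTF-8")) {
            while (sc.hasNextLine()) {
                sc.nextLine();
                lines++;
            }
            if (sc.ioException() != null) throw sc.ioException();
        }
        System.out.println("Scanner lines=" + lines);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        // Small sample file; replace with the real 70M-line file to measure.
        Path p = Files.createTempFile("bench", ".txt");
        Files.write(p, Arrays.asList("a", "b", "c"), StandardCharsets.UTF_8);
        System.out.println("BufferedReader ns=" + timeBufferedReader(p));
        System.out.println("Scanner ns=" + timeScanner(p));
        Files.delete(p);
    }
}
```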

3) Though there are no performance benefits, I like the Java 7 approach:

try (BufferedReader br = Files.newBufferedReader(Paths.get("test.txt"), StandardCharsets.UTF_8)) {
    for (String line = null; (line = br.readLine()) != null;) {
        //
    }
}

4) A Scanner-based version:

try (Scanner sc = new Scanner(new File("test.txt"), "UTF-8")) {
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
}

5) This may be faster than the rest:

try (SeekableByteChannel ch = Files.newByteChannel(Paths.get("test.txt"))) {
    ByteBuffer bb = ByteBuffer.allocateDirect(1000);
    StringBuilder line = new StringBuilder();
    while (ch.read(bb) != -1) {  // stop at end of file
        bb.flip();
        // add chars to line
        // ...
        bb.clear();
    }
}

It requires a bit of coding, but it can be really fast because of ByteBuffer.allocateDirect, which allows the OS to read bytes from the file into the ByteBuffer directly, without copying.
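One way the elided "add chars to line" part could be filled in is sketched below. Assumptions not in the original: the `ChannelLines` class name and 8 KB buffer size are illustrative, and each byte is cast straight to a char, which is only correct for single-byte (ASCII-range) content and `\n`/`\r\n` delimiters — multi-byte encodings would need a CharsetDecoder:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ChannelLines {
    // Reads all lines through a direct buffer.
    // Assumes single-byte delimiters and ASCII-range content.
    static List<String> readLines(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        try (SeekableByteChannel ch = Files.newByteChannel(path)) {
            ByteBuffer bb = ByteBuffer.allocateDirect(8192);
            while (ch.read(bb) != -1) {
                bb.flip();                       // switch buffer to reading
                while (bb.hasRemaining()) {
                    char c = (char) (bb.get() & 0xff);
                    if (c == '\n') {             // end of line: emit it
                        lines.add(line.toString());
                        line.setLength(0);
                    } else if (c != '\r') {      // skip CR of CRLF pairs
                        line.append(c);
                    }
                }
                bb.clear();                      // ready for the next read
            }
        }
        if (line.length() > 0) lines.add(line.toString()); // trailing line without '\n'
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("lines", ".txt");
        Files.write(p, "one\ntwo\nthree\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(p)); // [one, two, three]
        Files.delete(p);
    }
}
```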

6) Parallel processing would definitely increase speed: make a big byte buffer, run several tasks that read bytes from the file into that buffer in parallel; when ready, find the first end of line, make a String, find the next, and so on.
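The parallel scheme above is the trickiest to implement, because lines can straddle chunk boundaries. As a simpler sketch of the same pattern — several tasks doing positional reads on one FileChannel in parallel (positional `read(ByteBuffer, long)` is safe for concurrent use) — the following counts newline bytes per region instead of building Strings; the `ParallelCount` class name, buffer size, and task count are illustrative:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCount {
    // Counts '\n' bytes in parallel: each task reads its own region of the file.
    static long countNewlines(Path path, int tasks) throws Exception {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = ch.size();
            long chunk = (size + tasks - 1) / tasks;   // region per task, rounded up
            ExecutorService pool = Executors.newFixedThreadPool(tasks);
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final long start = i * chunk;
                final long end = Math.min(start + chunk, size);
                results.add(pool.submit(() -> {
                    ByteBuffer bb = ByteBuffer.allocate(64 * 1024);
                    long count = 0, pos = start;
                    while (pos < end) {
                        bb.clear();
                        bb.limit((int) Math.min(bb.capacity(), end - pos));
                        int n = ch.read(bb, pos);      // positional read: thread-safe
                        if (n == -1) break;
                        bb.flip();
                        while (bb.hasRemaining()) if (bb.get() == '\n') count++;
                        pos += n;
                    }
                    return count;
                }));
            }
            long total = 0;
            for (Future<Long> f : results) total += f.get();
            pool.shutdown();
            return total;
        }
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("par", ".txt");
        Files.write(p, "a\nb\nc\nd\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(countNewlines(p, 4)); // 4
        Files.delete(p);
    }
}
```

Extending this to produce Strings would require stitching together the partial lines at each region boundary, which is where most of the extra coding effort goes.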

