Does RandomAccessFile in Java read the entire file into memory?


Question

I need to read the last n lines from a large file (say, 2 GB). The file is UTF-8 encoded.

I would like to know the most efficient way of doing this. I read about RandomAccessFile in Java, but does the seek() method read the entire file into memory? It uses a native implementation, so I wasn't able to consult the source code.

Recommended answer


  1. RandomAccessFile.seek only sets the current position of the file pointer; no bytes are read into memory.
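To illustrate the point, here is a minimal sketch of jumping straight to the tail of a file. The class name `SeekDemo` and the helper `readTail` are made up for this example. Only the `readFully` call transfers data, and only as many bytes as the buffer holds.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class SeekDemo {
    /** Returns at most {@code count} bytes taken from the end of the file. */
    static byte[] readTail(String path, int count) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            long length = f.length();
            int toRead = (int) Math.min(count, length);
            byte[] buf = new byte[toRead];
            f.seek(length - toRead); // moves the file pointer, reads nothing
            f.readFully(buf);        // reads exactly buf.length bytes
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] tail = readTail(args[0], 64);
        System.out.println(tail.length + " bytes read from the end of the file");
    }
}
```

Memory use depends on the buffer size, not on the file size, so the same call behaves the same on a 2 GB file.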

Since your file is UTF-8 encoded, it is a text file. For reading text files we typically use BufferedReader; Java 7 even added a convenience method, Files.newBufferedReader, that creates a BufferedReader for reading text from a file. This approach is inefficient for reading the last n lines, because the whole file has to be read from the beginning, but it is easy to implement.
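A minimal sketch of that simple approach might look like the following. The class name `TailWithReader` and the method `lastLines` are hypothetical names chosen for this example. It streams the file from the start and keeps only the most recent n lines in a sliding window, so memory stays small even though every byte is still read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class TailWithReader {
    /** Returns the last n lines of the file, oldest first. */
    static List<String> lastLines(Path file, int n) throws IOException {
        Deque<String> window = new ArrayDeque<>();
        if (n <= 0) {
            return new ArrayList<>();
        }
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (window.size() == n) {
                    window.removeFirst(); // drop the oldest line in the window
                }
                window.addLast(line);
            }
        }
        return new ArrayList<>(window);
    }
}
```

The cost is time rather than memory: for a 2 GB file the reader still has to decode the entire file to reach the end.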

To be efficient, we need RandomAccessFile and must read the file backwards, starting from the end. Here is a basic example:

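import java.io.ByteArrayOutputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Tail {
    public static void main(String[] args) throws Exception {
        int n = 3;
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile f = new RandomAccessFile("test", "r")) {
            ByteArrayOutputStream bout = new ByteArrayOutputStream();
            long length = f.length();
            // Walk backwards from the last byte, one byte per seek/read.
            for (long p = length - 1; p >= 0 && lines.size() < n; p--) {
                f.seek(p);
                int b = f.read();
                if (b == '\n') {
                    // Ignore the LF that terminates the last line of the file.
                    if (p < length - 1) {
                        lines.add(0, getLine(bout));
                        bout.reset();
                    }
                } else if (b != '\r') {
                    bout.write(b);
                }
            }
            // Start of file reached: the first line has no LF in front of it.
            if (lines.size() < n && bout.size() > 0) {
                lines.add(0, getLine(bout));
            }
        }
        System.out.println(lines);
    }

    static String getLine(ByteArrayOutputStream bout) {
        byte[] a = bout.toByteArray();
        // The bytes were collected back to front, so reverse them.
        for (int i = 0, j = a.length - 1; j > i; i++, j--) {
            byte tmp = a[j];
            a[j] = a[i];
            a[i] = tmp;
        }
        // Decode explicitly as UTF-8 instead of the platform default charset.
        return new String(a, StandardCharsets.UTF_8);
    }
}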

It reads the file byte by byte, starting from the tail, into a ByteArrayOutputStream; when an LF is reached, it reverses the collected bytes and builds a line from them.

Two things need to be improved:


  1. buffering
  2. EOL recognition
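One possible way to address both points is sketched below. It is not part of the original answer. The names `BufferedTail`, `lastLines`, and `chunkSize` are assumptions made for this example. The file is read backwards in chunks rather than one byte per seek, and a CR directly before an LF is stripped so that both LF and CRLF line endings are handled. A lone CR as a line terminator is not recognized. Splitting on the LF byte is safe in UTF-8 because the value 0x0A never occurs inside a multi-byte sequence.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.LinkedList;
import java.util.List;

class BufferedTail {
    /** Returns the last n lines, oldest first, reading backwards chunk by chunk. */
    static List<String> lastLines(String path, int n, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        LinkedList<String> lines = new LinkedList<>();
        if (n <= 0) {
            return lines;
        }
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            long length = f.length();
            byte[] chunk = new byte[chunkSize];
            // Bytes of the line being assembled, stored in reverse order.
            ByteArrayOutputStream reversed = new ByteArrayOutputStream();
            long pos = length; // everything from pos to the end is already processed
            while (pos > 0 && lines.size() < n) {
                int size = (int) Math.min(chunkSize, pos);
                pos -= size;
                f.seek(pos);
                f.readFully(chunk, 0, size);
                for (int i = size - 1; i >= 0 && lines.size() < n; i--) {
                    byte b = chunk[i];
                    if (b == '\n') {
                        // Skip the LF that terminates the very last line.
                        if (pos + i < length - 1) {
                            lines.addFirst(decode(reversed));
                            reversed.reset();
                        }
                    } else {
                        reversed.write(b);
                    }
                }
            }
            // Leaving the loop with fewer than n lines means the start of the
            // file was reached; what is buffered is the first line.
            if (length > 0 && lines.size() < n) {
                lines.addFirst(decode(reversed));
            }
        }
        return lines;
    }

    /** Restores byte order, drops a trailing CR, and decodes as UTF-8. */
    private static String decode(ByteArrayOutputStream reversed) {
        byte[] r = reversed.toByteArray();
        int skip = (r.length > 0 && r[0] == '\r') ? 1 : 0; // r[0] is the line's last byte
        byte[] line = new byte[r.length - skip];
        for (int i = 0; i < line.length; i++) {
            line[i] = r[r.length - 1 - i];
        }
        return new String(line, StandardCharsets.UTF_8);
    }
}
```

A call such as `BufferedTail.lastLines("test", 3, 8192)` would read the file in 8 KB steps from the end and stop as soon as three lines have been collected. The chunk size is a tuning choice, not a value taken from the original answer.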
