Does RandomAccessFile in Java read the entire file into memory?
Problem description
I need to read the last n lines from a large file (say 2 GB). The file is UTF-8 encoded.
I would like to know the most efficient way of doing it. I read about RandomAccessFile in Java, but does the seek() method read the entire file into memory? It uses a native implementation, so I wasn't able to refer to the source code.
Recommended answer
RandomAccessFile.seek just sets the file-pointer offset (measured from the beginning of the file) at which the next read or write occurs; no bytes are read into memory.
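One way to see this is to seek close to the end of a file and read only a few bytes from there. The sketch below is a hypothetical helper (the names SeekDemo and readTail are illustrative, not from the answer); the amount of data it reads depends on count, not on the size of the file. Cutting at an arbitrary byte offset can split a multi-byte UTF-8 character, so the decoded tail may begin with a replacement character.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class SeekDemo {
    // Returns at most the last `count` bytes of the file.
    static byte[] readTail(String path, int count) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            long length = f.length();
            long start = Math.max(0, length - count);
            f.seek(start);                                 // only moves the file pointer
            byte[] buf = new byte[(int) (length - start)];
            f.readFully(buf);                              // reads exactly buf.length bytes
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "test"; // "test" is a placeholder file name
        System.out.println(new String(readTail(path, 64), StandardCharsets.UTF_8));
    }
}
```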
Since your file is UTF-8 encoded, it is a text file. For reading text files we typically use BufferedReader; Java 7 even added a convenience method, Files.newBufferedReader, to create a BufferedReader that reads text from a file. This is easy to implement, but it may be inefficient for reading the last n lines, because the whole file has to be read from the beginning.
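For comparison, the straightforward BufferedReader approach might look like the following minimal sketch (the class name TailWithReader is hypothetical). It still reads the whole file from the start, but it keeps at most n lines in an ArrayDeque, so memory stays bounded and only the running time grows with the file size.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TailWithReader {
    // Reads the file front to back but keeps only the n most recent lines.
    static List<String> lastLines(Path file, int n) throws IOException {
        Deque<String> window = new ArrayDeque<>();
        if (n <= 0) {
            return new ArrayList<>(window);
        }
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (window.size() == n) {
                    window.removeFirst(); // drop the oldest line to keep memory bounded
                }
                window.addLast(line);
            }
        }
        return new ArrayList<>(window);
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "test"); // placeholder file name
        System.out.println(lastLines(file, 3));
    }
}
```

BufferedReader.readLine treats LF, CR and CRLF as line terminators, so this version already handles all three line-ending styles.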
To be efficient, we need RandomAccessFile and to read the file backwards, starting from the end. Here is a basic example:
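import java.io.ByteArrayOutputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LastLines {
    public static void main(String[] args) throws Exception {
        int n = 3;
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile f = new RandomAccessFile("test", "r")) {
            ByteArrayOutputStream bout = new ByteArrayOutputStream();
            long length = f.length();
            // walk backwards from the last byte down to and including offset 0
            for (long p = length - 1; p >= 0 && lines.size() < n; p--) {
                f.seek(p);
                int b = f.read();
                if (b == '\n') {
                    // ignore the LF that terminates the last line of the file
                    if (p < length - 1) {
                        lines.add(0, getLine(bout));
                        bout.reset();
                    }
                } else if (b != '\r') {
                    bout.write(b);
                }
            }
            // the first line of the file has no LF before it, so add it here
            if (length > 0 && lines.size() < n) {
                lines.add(0, getLine(bout));
            }
        }
        System.out.println(lines);
    }

    // the bytes were collected in reverse order: reverse them back, then decode as UTF-8
    static String getLine(ByteArrayOutputStream bout) {
        byte[] a = bout.toByteArray();
        for (int i = 0, j = a.length - 1; j > i; i++, j--) {
            byte tmp = a[j];
            a[j] = a[i];
            a[i] = tmp;
        }
        return new String(a, StandardCharsets.UTF_8);
    }
}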
It reads the file one byte at a time, starting from the tail, into a ByteArrayOutputStream; when an LF is reached, it reverses the collected bytes and decodes them into a line.
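Two things need to be improved:
- buffering: the loop issues one seek() and one read() call per byte, and RandomAccessFile does no buffering of its own
- EOL recognition: only LF is treated as a line separator and every CR byte is dropped, so a file that uses a lone CR as its line ending is not split into lines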
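Both improvements can be sketched together as follows. This is a minimal sketch, not the answer author's code: the class name BackwardTail, the method lastLines and its bufSize parameter are hypothetical. It reads the file backwards in fixed-size blocks with one seek()/readFully() pair per block, and treats LF, CRLF and a lone CR as separators, the same set that BufferedReader.readLine recognizes. Scanning raw bytes is safe for UTF-8 because every byte of a multi-byte sequence has its high bit set, so it can never be mistaken for CR (0x0D) or LF (0x0A); bytes are decoded only once a whole line has been collected, so a character split across two blocks is still decoded correctly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BackwardTail {
    // Returns the last n lines of a UTF-8 file, reading it backwards in blocks of bufSize bytes.
    static List<String> lastLines(String path, int n, int bufSize) throws IOException {
        if (n <= 0) {
            return new ArrayList<>();
        }
        if (bufSize <= 0) {
            throw new IllegalArgumentException("bufSize must be positive");
        }
        Deque<String> lines = new ArrayDeque<>();
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            long length = f.length();
            ByteArrayOutputStream bout = new ByteArrayOutputStream(); // current line, bytes reversed
            byte[] buf = new byte[bufSize];
            boolean atEof = true;      // a separator at the very end does not start an empty line
            boolean nextWasLF = false; // the byte after the current one (in file order) was LF
            long blockEnd = length;
            while (blockEnd > 0 && lines.size() < n) {
                int len = (int) Math.min(bufSize, blockEnd);
                long blockStart = blockEnd - len;
                f.seek(blockStart);
                f.readFully(buf, 0, len); // one read per block instead of one per byte
                for (int i = len - 1; i >= 0 && lines.size() < n; i--) {
                    byte b = buf[i];
                    // a CR directly followed by LF belongs to the separator already handled at the LF
                    boolean separator = b == '\n' || (b == '\r' && !nextWasLF);
                    if (separator) {
                        if (!atEof) {
                            lines.addFirst(decode(bout));
                            bout.reset();
                        }
                    } else if (b != '\r') {
                        bout.write(b);
                    }
                    atEof = false;
                    nextWasLF = b == '\n';
                }
                blockEnd = blockStart;
            }
            // reached offset 0: whatever is left is the first line of the file
            if (length > 0 && blockEnd == 0 && lines.size() < n) {
                lines.addFirst(decode(bout));
            }
        }
        return new ArrayList<>(lines);
    }

    // Bytes were collected from the tail backwards; reverse them, then decode as UTF-8.
    static String decode(ByteArrayOutputStream bout) {
        byte[] a = bout.toByteArray();
        for (int i = 0, j = a.length - 1; i < j; i++, j--) {
            byte tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
        return new String(a, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "test" and the 8192-byte block size are illustrative choices
        System.out.println(lastLines(args.length > 0 ? args[0] : "test", 3, 8192));
    }
}
```

Only the blocks needed to cover the last n lines are read, so for a 2 GB file with short lines this typically touches a few kilobytes near the end.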