Java: reading strings from a random access file with buffered input


Question

I've never worked closely with the Java IO API before, and I'm really frustrated now. I find it hard to believe how strange and complex it is, and how hard it can be to do a simple task.

My task: I have 2 positions (starting byte, ending byte), pos1 and pos2. I need to read the lines between these two bytes (including the starting one, not including the ending one) and use them as UTF-8 String objects.

For example, in most scripting languages it would be a very simple 1-2-3-liner like this (in Ruby, but it would be essentially the same in Python, Perl, etc.):

f = File.open("file.txt")
f.seek(pos1)
while f.pos < pos2
  s = f.readline
  # do something with "s" here
end

With the Java IO APIs it quickly becomes hell ;) In fact, I see two ways to read lines (ending with \n) from regular local files:

  • RandomAccessFile has getFilePointer() and seek(long pos), but its readLine() returns neither UTF-8 strings nor byte arrays, only very strange strings with broken encoding, and it has no buffering (which probably means that every read*() call is translated into a single underlying OS read() => fairly slow).
  • BufferedReader has a great readLine() method, and it can even do some seeking with skip(long n), but it has no way to report even the number of bytes already read, let alone the current position in the file.
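
The "broken encoding" in the first bullet follows from how RandomAccessFile.readLine() is documented: each byte becomes one char with the high eight bits set to zero, which amounts to decoding the file as ISO-8859-1. Below is a minimal sketch of undoing that, assuming the file really is UTF-8. The class and method names are hypothetical, and the sketch does nothing about the missing buffering.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

final class RafUtf8Lines {
    /**
     * Reads one line from the file's current position and returns it decoded
     * as UTF-8, or null at end of file. (Hypothetical helper, not a JDK API.)
     */
    static String readUtf8Line(RandomAccessFile raf) throws IOException {
        // readLine() maps every byte to one char with the high 8 bits zeroed.
        String raw = raf.readLine();
        if (raw == null) {
            return null;
        }
        // Encoding as ISO-8859-1 therefore gives back the original bytes,
        // which can then be decoded with the charset the file actually uses.
        byte[] bytes = raw.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Because getFilePointer() still works after each call, this keeps the exact byte position, at the cost of unbuffered reads.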

I've tried to use something like:

    FileInputStream fis = new FileInputStream(fileName);
    FileChannel fc = fis.getChannel();
    BufferedReader br = new BufferedReader(
            new InputStreamReader(
                    fis,
                    CHARSET_UTF8
            )
    );

... and then using fc.position() to get the current file reading position and fc.position(newPosition) to set it, but that doesn't seem to work in my case: it looks like it returns the position of the buffer pre-filling done by BufferedReader, or something like that. These counters seem to be rounded up in 16K increments.
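
The read-ahead described above can be observed directly. The following is a small sketch (class and method names are hypothetical) that reads one line through a BufferedReader and then asks the channel for its position. The channel shares its position with the FileInputStream, so it reports how far the readers have pulled data from the file, not how much readLine() has handed back.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

final class ReadAheadDemo {
    /**
     * Reads a single line through a BufferedReader and returns the channel
     * position afterwards. (Hypothetical helper for illustration only.)
     */
    static long positionAfterFirstLine(String fileName) throws IOException {
        try (FileInputStream fis = new FileInputStream(fileName);
             BufferedReader br = new BufferedReader(
                     new InputStreamReader(fis, StandardCharsets.UTF_8))) {
            FileChannel fc = fis.getChannel();
            br.readLine();
            // The readers have already fetched a whole chunk from the stream,
            // so this is the end of that chunk, not the end of the first line.
            return fc.position();
        }
    }
}
```

For a file with many short lines, the returned value is typically far beyond the length of the first line, which matches the jumps seen in the question.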

Do I really have to implement it all by myself, i.e. a file reading interface which would:

  • allow me to get/set the position in a file
  • buffer file reading operations
  • allow reading UTF-8 strings (or at least allow operations like "read everything till the next \n")
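
For a sense of how much work a hand-rolled version of these three requirements would be, here is a rough sketch, not a tested library. It assumes lines end with \n and the file is UTF-8; splitting on the byte 0x0A is safe there because that byte never occurs inside a multi-byte UTF-8 sequence. The class name, method names, and the 8192-byte buffer size are all hypothetical choices.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class PositionedLineReader implements Closeable {
    private final RandomAccessFile raf;
    private final byte[] buf = new byte[8192];
    private long bufStart; // file offset of buf[0]
    private int bufLen;    // number of valid bytes in buf
    private int bufPos;    // index of the next unread byte in buf

    PositionedLineReader(String fileName) throws IOException {
        raf = new RandomAccessFile(fileName, "r");
    }

    /** Byte offset of the next byte readLine() will consume. */
    long position() {
        return bufStart + bufPos;
    }

    /** Moves to an absolute byte offset and drops any buffered data. */
    void seek(long pos) throws IOException {
        raf.seek(pos);
        bufStart = pos;
        bufLen = 0;
        bufPos = 0;
    }

    /** Reads up to the next '\n' and decodes as UTF-8; returns null at EOF. */
    String readLine() throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        while (true) {
            if (bufPos == bufLen) {
                // Buffer exhausted: account for it, then refill from the file.
                bufStart += bufLen;
                bufPos = 0;
                bufLen = 0;
                int n = raf.read(buf, 0, buf.length);
                if (n <= 0) {
                    // EOF: return a final unterminated line, if there is one.
                    return line.size() > 0 ? decode(line) : null;
                }
                bufLen = n;
            }
            int start = bufPos;
            while (bufPos < bufLen && buf[bufPos] != '\n') {
                bufPos++;
            }
            line.write(buf, start, bufPos - start);
            if (bufPos < bufLen) {
                bufPos++; // consume the '\n' itself
                return decode(line);
            }
        }
    }

    private static String decode(ByteArrayOutputStream bytes) {
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }

    /** Mirrors the Ruby loop: every line that starts in [pos1, pos2). */
    static List<String> linesBetween(String fileName, long pos1, long pos2)
            throws IOException {
        List<String> out = new ArrayList<>();
        try (PositionedLineReader r = new PositionedLineReader(fileName)) {
            r.seek(pos1);
            String s;
            while (r.position() < pos2 && (s = r.readLine()) != null) {
                out.add(s);
            }
        }
        return out;
    }
}
```

Like the Ruby loop, linesBetween() reads a line in full if it starts before pos2, even when it ends after pos2.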

Is there a quicker way than implementing it all myself? Am I overlooking something?

Recommended answer

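import org.apache.commons.io.input.BoundedInputStream;
import java.nio.charset.StandardCharsets;

FileInputStream file = new FileInputStream(filename);
// Note: skip() may skip fewer bytes than requested; check its return value.
file.skip(pos1);
// BoundedInputStream stops after pos2 - pos1 bytes; decode explicitly as UTF-8.
BufferedReader br = new BufferedReader(
   new InputStreamReader(new BoundedInputStream(file, pos2 - pos1), StandardCharsets.UTF_8)
);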

If you didn't care about pos2, then you wouldn't need Apache Commons IO.
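
If adding Commons IO just for the upper bound is undesirable, the same idea can be sketched with the JDK alone. The Limited class below is a hypothetical stand-in for BoundedInputStream, not its actual implementation, and positioning through the channel is swapped in for skip() so the start offset is exact.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class RangeLines {
    /** Hypothetical minimal bound: reports EOF after `remaining` bytes. */
    private static final class Limited extends FilterInputStream {
        private long remaining;

        Limited(InputStream in, long limit) {
            super(in);
            remaining = limit;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) {
                return -1;
            }
            int b = super.read();
            if (b >= 0) {
                remaining--;
            }
            return b;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (remaining <= 0) {
                return -1;
            }
            int n = super.read(b, off, (int) Math.min(len, remaining));
            if (n > 0) {
                remaining -= n;
            }
            return n;
        }
        // skip() is not overridden, so it would bypass the limit;
        // the readers used below never call it.
    }

    /** Returns the lines found in the byte range [pos1, pos2) as UTF-8. */
    static List<String> readLines(String fileName, long pos1, long pos2)
            throws IOException {
        List<String> lines = new ArrayList<>();
        try (FileInputStream file = new FileInputStream(fileName)) {
            // Moving the channel also moves the stream's read position.
            file.getChannel().position(pos1);
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    new Limited(file, pos2 - pos1), StandardCharsets.UTF_8));
            String s;
            while ((s = br.readLine()) != null) {
                lines.add(s);
            }
        }
        return lines;
    }
}
```

One difference from the Ruby loop in the question: the bound cuts the stream at pos2 exactly, so a line that starts before pos2 but ends after it comes back truncated.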
