How do I read a file without any buffering in Java?
Question
I'm working through the problems in Programming Pearls, 2nd edition, Column 1. One of the problems involves writing a program that uses only around 1 megabyte of memory to store the contents of a file as a bit array, with each bit representing whether or not a 7-digit number is present in the file. Since Java is the language I'm most familiar with, I've decided to use it even though the author seems to have had C and C++ in mind.
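For reference, the bit array the problem describes can be sketched as below. This is a minimal illustration, assuming the "7-digit numbers" are the values 0 through 9,999,999; the class name SevenDigitBitSet is hypothetical. Ten million bits come to 1,250,000 bytes, which is the "around 1 megabyte" budget. java.util.BitSet constructed with new BitSet(10_000_000) would do the same job.

```java
// Hypothetical bit vector: one bit per possible value 0..9,999,999.
// 10,000,000 bits / 64 bits per long = 156,250 longs = 1,250,000 bytes.
final class SevenDigitBitSet {
    private static final int RANGE = 10_000_000;               // values 0..9,999,999
    private final long[] words = new long[(RANGE + 63) / 64];  // 64 bits per long

    // Marks n as present: word index is n / 64, bit position is n % 64.
    void set(int n) {
        checkRange(n);
        words[n >>> 6] |= 1L << (n & 63);
    }

    // Returns true if n has been marked as present.
    boolean contains(int n) {
        checkRange(n);
        return (words[n >>> 6] & (1L << (n & 63))) != 0;
    }

    private static void checkRange(int n) {
        if (n < 0 || n >= RANGE) {
            throw new IllegalArgumentException("not a 7-digit value: " + n);
        }
    }
}
```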
Since I'm pretending memory is limited for the purpose of the problem I'm working on, I'd like to make sure the process of reading the file has no buffering at all.
I thought InputStreamReader would be a good solution, until I read this in the Java documentation:
To enable the efficient conversion of bytes to characters, more bytes may be read ahead from the underlying stream than are necessary to satisfy the current read operation.
Ideally, only the bytes that are necessary would be read from the stream -- in other words, I don't want any buffering.
Answer
One of the problems involves writing a program that uses only around 1 megabyte of memory to store the contents of a file as a bit array, with each bit representing whether or not a 7-digit number is present in the file.
This implies that you need to read the file as bytes (not characters).
Assuming that you do have a genuine requirement to read from a file without buffering, you should use the FileInputStream class. It does no buffering of its own: each read call is passed straight through to the operating system. It reads (or attempts to read) precisely the number of bytes that you asked for.
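A minimal sketch of that approach for the Programming Pearls problem follows. It assumes the file contains ASCII decimal numbers separated by newlines or other non-digit bytes, and that every number fits in 0..9,999,999; the question does not specify the file format, and the class name UnbufferedNumberReader is hypothetical. It calls the no-argument read() once per byte, so nothing is read ahead, at the cost of one system call per byte when the stream is a FileInputStream.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.BitSet;

// Hypothetical example class; assumes well-formed input of numbers in 0..9,999,999.
final class UnbufferedNumberReader {
    // Parses ASCII decimal numbers one byte per read() call and records
    // each value in a BitSet. No byte is requested before it is needed.
    static BitSet readNumbers(InputStream in) throws IOException {
        BitSet seen = new BitSet(10_000_000);   // ~1.25 MB of bits
        int value = 0;
        boolean inNumber = false;
        int b;
        while ((b = in.read()) != -1) {         // -1 signals end of stream
            if (b >= '0' && b <= '9') {
                value = value * 10 + (b - '0');
                inNumber = true;
            } else if (inNumber) {              // a non-digit byte ends the number
                seen.set(value);
                value = 0;
                inNumber = false;
            }
        }
        if (inNumber) {                         // last number had no trailing separator
            seen.set(value);
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        // FileInputStream adds no buffer between this code and the OS.
        try (InputStream in = new FileInputStream(args[0])) {
            BitSet seen = readNumbers(in);
            System.out.println(seen.cardinality() + " distinct numbers");
        }
    }
}
```

Because readNumbers takes any InputStream, the same method can be exercised on an in-memory ByteArrayInputStream without touching the file system.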
If you then need to convert those bytes to characters, you could do this by applying the appropriate String constructor to a byte[] (for example new String(bytes, offset, length, charset); a single byte has to be wrapped in a one-element array). Note that for multibyte character encodings such as UTF-8, you would need to read enough bytes to complete each character. Doing that without the possibility of read-ahead is a bit tricky ... and entails knowledge of the character encoding you are reading.
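As an illustration of what that encoding knowledge looks like for UTF-8, the sketch below reads the lead byte, derives the sequence length from its high bits, and then reads exactly that many continuation bytes. The class and method names are hypothetical, and validation is deliberately minimal: a structurally complete but malformed sequence is decoded by the String constructor into the U+FFFD replacement character. It returns a String rather than a char because a 4-byte sequence decodes to a surrogate pair.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads one UTF-8 code point without reading past it.
final class Utf8CharReader {
    // Returns the next code point as a String (1 or 2 chars), or null at end of stream.
    static String readCodePoint(InputStream in) throws IOException {
        int lead = in.read();
        if (lead == -1) {
            return null;
        }
        int length;
        if (lead < 0x80) {
            length = 1;                    // 0xxxxxxx: ASCII
        } else if ((lead & 0xE0) == 0xC0) {
            length = 2;                    // 110xxxxx
        } else if ((lead & 0xF0) == 0xE0) {
            length = 3;                    // 1110xxxx
        } else if ((lead & 0xF8) == 0xF0) {
            length = 4;                    // 11110xxx
        } else {
            throw new IOException("invalid UTF-8 lead byte: 0x" + Integer.toHexString(lead));
        }
        byte[] buf = new byte[length];
        buf[0] = (byte) lead;
        for (int i = 1; i < length; i++) { // read only the continuation bytes this character needs
            int b = in.read();
            if (b == -1) {
                throw new IOException("truncated UTF-8 sequence");
            }
            buf[i] = (byte) b;
        }
        return new String(buf, 0, length, StandardCharsets.UTF_8);
    }
}
```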
(You could avoid that knowledge by using a CharsetDecoder directly. But then you'd need to use the decode method that operates on ByteBuffer and CharBuffer objects, and that is a bit complicated too.)
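A rough sketch of the CharsetDecoder route, under the assumption that feeding the decoder one byte at a time is acceptable: each byte is read individually, appended to a small ByteBuffer, and passed to decode(in, out, endOfInput) with endOfInput set to false, so any incomplete multibyte sequence simply stays in the buffer until the next byte arrives. The class name ByteAtATimeDecoder is hypothetical, and the fixed 16-char output buffer assumes a single input byte never completes more than a few chars, which holds for common charsets such as UTF-8.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical example: decodes a stream one byte per read(), with no read-ahead.
final class ByteAtATimeDecoder {
    static String decodeAll(InputStream in, Charset charset) throws IOException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)       // bad bytes become U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer bytes = ByteBuffer.allocate(16);  // pending (undecoded) bytes
        CharBuffer chars = CharBuffer.allocate(16);  // freshly decoded chars
        StringBuilder result = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            bytes.put((byte) b);
            bytes.flip();                            // switch buffer to reading mode
            decoder.decode(bytes, chars, false);     // false: more input may follow
            bytes.compact();                         // keep any incomplete sequence
            drain(chars, result);
        }
        bytes.flip();
        decoder.decode(bytes, chars, true);          // true: no more input
        decoder.flush(chars);
        drain(chars, result);
        return result.toString();
    }

    // Moves decoded chars into the StringBuilder and empties the CharBuffer.
    private static void drain(CharBuffer chars, StringBuilder out) {
        chars.flip();
        out.append(chars);
        chars.clear();
    }
}
```

The decoder, not the calling code, keeps track of where each character ends, which is why only the Charset has to be known in advance.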
For what it is worth, Java makes a clear distinction between stream-of-byte and stream-of-character I/O. The former is supported by InputStream and OutputStream, and the latter by Reader and Writer. The InputStreamReader class is a Reader that adapts an InputStream. You should not be considering it for an application that wants to read data byte-wise.