Java: Issue with available() method of BufferedInputStream
Question
I'm dealing with the following code that is used to split a large file into a set of smaller files:
try {
    FileInputStream input = new FileInputStream(this.fileToSplit);
    BufferedInputStream iBuff = new BufferedInputStream(input);
    int i = 0;
    FileOutputStream output = new FileOutputStream(fileArr[i]);
    BufferedOutputStream oBuff = new BufferedOutputStream(output);
    int buffSize = 8192;
    byte[] buffer = new byte[buffSize];
    while (true) {
        if (iBuff.available() < buffSize) {
            byte[] newBuff = new byte[iBuff.available()];
            iBuff.read(newBuff);
            oBuff.write(newBuff);
            oBuff.flush();
            oBuff.close();
            break;
        }
        int r = iBuff.read(buffer);
        if (fileArr[i].length() >= this.partSize) {
            oBuff.flush();
            oBuff.close();
            ++i;
            output = new FileOutputStream(fileArr[i]);
            oBuff = new BufferedOutputStream(output);
        }
        oBuff.write(buffer);
    }
} catch (Exception e) {
    e.printStackTrace();
}
This is the weird behavior I'm seeing: when I run this code on a 3GB file, the initial iBuff.available() call returns a value of approximately 2,100,000,000 and the code works fine. When I run this code on a 12GB file, the initial iBuff.available() call returns only 200,000,000 (which is smaller than the split file size of 500,000,000 and causes the processing to go awry).
I'm thinking this discrepancy in behavior has something to do with the fact that this is on 32-bit Windows. I'm going to run a couple more tests on a 4.5 GB file and a 3.5 GB file. If the 3.5 GB file works and the 4.5 GB one doesn't, that will further support the theory that it's a 32-bit vs 64-bit issue, since 4 GB would then be the threshold.
Recommended answer
Well, if you read the Javadoc, it quite clearly states:
Returns the number of bytes that can be read from this input stream without blocking (emphasis added by me)
So it's quite clear that what you want is not what this method offers. Depending on the underlying InputStream, you may run into problems much earlier (e.g. a stream over the network from a server that doesn't report the file size - the stream would have to read and buffer the complete file just to return the "correct" available() count, which would take a lot of time - what if you only want to read a header?)
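To make the point concrete, here is a minimal sketch showing that available() says nothing reliable about how much data a stream still holds. The class name AvailableDemo and its helper methods are made up for this illustration. The wrapper stream deliberately does not override available(), so it inherits the InputStream default, which returns 0, even though 100,000 bytes can still be read. Note also that available() returns an int, so it can never report more than Integer.MAX_VALUE (2,147,483,647), which would be consistent with the value of roughly 2,100,000,000 seen for the 3GB file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class; not part of the original question or answer.
class AvailableDemo {

    // Wraps a stream without overriding available(), so the
    // InputStream default implementation (which returns 0) applies.
    static InputStream withoutAvailable(InputStream source) {
        return new InputStream() {
            @Override
            public int read() throws IOException {
                return source.read();
            }
        };
    }

    // Counts bytes by reading until read() returns -1,
    // which is the reliable end-of-stream signal.
    static long countBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = withoutAvailable(new ByteArrayInputStream(new byte[100_000]));
        System.out.println("available() = " + in.available());
        System.out.println("bytes read  = " + countBytes(in));
    }
}
```

Here available() reports 0 while the read loop still delivers all 100,000 bytes, so a loop that stops based on available() would end far too early on such a stream.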
So the correct way to handle this is to change your parsing method so that it can handle the file in pieces. Personally, I don't see much reason to use available() here at all - just calling read() and stopping as soon as read() returns -1 should work fine. It gets a bit more complicated if you want to ensure that every file really contains blockSize bytes - just add an inner loop if that scenario is important.
int blockSize = XXX;
byte[] buffer = new byte[blockSize];
int i = 0;
int read = in.read(buffer);
while (read != -1) {
    out[i++].write(buffer, 0, read);
    read = in.read(buffer);
}
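For the case where each part must hold an exact number of bytes, the following is one possible sketch rather than a definitive implementation. The class FileSplitterSketch, the split method, the targetDir parameter and the ".partN" naming scheme are all assumptions made for this example. It counts the bytes written to the current part itself instead of calling File.length(), since the file on disk can lag behind a BufferedOutputStream that has not been flushed, and it caps each read so that a part never exceeds partSize.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; names and file layout are illustrative only.
class FileSplitterSketch {

    // Splits source into parts of exactly partSize bytes (the last part may be shorter).
    static List<Path> split(Path source, Path targetDir, long partSize) throws IOException {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        List<Path> parts = new ArrayList<>();
        byte[] buffer = new byte[8192];
        long writtenToPart = 0;
        OutputStream out = null;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(source))) {
            while (true) {
                // Never read more than the current part can still take.
                int limit = (int) Math.min(buffer.length, partSize - writtenToPart);
                int read = in.read(buffer, 0, limit);
                if (read == -1) {
                    break; // end of stream: the only stop condition, no available()
                }
                if (out == null) {
                    // Open the next part lazily so no empty trailing part is created.
                    Path part = targetDir.resolve(source.getFileName() + ".part" + parts.size());
                    parts.add(part);
                    out = new BufferedOutputStream(Files.newOutputStream(part));
                }
                out.write(buffer, 0, read); // write only the bytes actually read
                writtenToPart += read;
                if (writtenToPart == partSize) {
                    out.close();
                    out = null;
                    writtenToPart = 0;
                }
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FileSplitterSketch <source file> <target dir> <part size in bytes>
        List<Path> parts = split(Path.of(args[0]), Path.of(args[1]), Long.parseLong(args[2]));
        parts.forEach(System.out::println);
    }
}
```

Because sizes are tracked as long and end of input is detected only by read() returning -1, this approach does not depend on the file size fitting into an int.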