GZIPInputStream is prematurely closed when reading from s3
Problem description
new BufferedReader(new InputStreamReader(
new GZIPInputStream(s3Service.getObject(bucket, objectKey).getDataInputStream())))
creates a Reader that returns null from readLine() after ~100 lines if the file is larger than several MB.
Not reproducible on gzip files smaller than 1 MB.
Does anybody know how to handle this?
Recommended answer
From the documentation of BufferedReader#readLine():
Returns:
A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached
I would say it's pretty clear what this means: the end of the file/stream has been encountered - no more data is available.
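A minimal illustration of that contract - once readLine() returns null, the stream is over and every further call keeps returning null:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadLineEof {
    public static void main(String[] args) throws IOException {
        BufferedReader r = new BufferedReader(new StringReader("one\ntwo"));
        System.out.println(r.readLine()); // "one"
        System.out.println(r.readLine()); // "two"
        System.out.println(r.readLine()); // null - end of stream reached
        System.out.println(r.readLine()); // still null on every later call
    }
}
```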
Notable quirk with the GZIP format: multiple files can just be appended to one another to create a larger file with multiple gzipped objects. It seems that GZIPInputStream only reads the first of those.
That also explains why it is working for "small files". Those contain only one zipped object, so the whole file is read.
Note: if the GZIPInputStream determines non-destructively that one gzip member is over, you could just open another GZIPInputStream on the same InputStream and read multiple objects.
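One way to work around the quirk is to walk the members yourself with java.util.zip.Inflater. The following is only a sketch: gunzipAll and gzip are hypothetical helpers, it buffers the whole object in memory rather than streaming it from S3, and it assumes each member carries the plain 10-byte header that GZIPOutputStream writes (no optional FEXTRA/FNAME/FCOMMENT fields).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;
import java.util.zip.ZipException;

public class MultiMemberGunzip {

    // Hypothetical helper: decompresses every gzip member in data,
    // not just the first one. Assumes the basic 10-byte member header.
    static byte[] gunzipAll(byte[] data) throws IOException, DataFormatException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (pos < data.length) {
            // Every member starts with the gzip magic number 1f 8b.
            if ((data[pos] & 0xff) != 0x1f || (data[pos + 1] & 0xff) != 0x8b) {
                throw new ZipException("No gzip member at offset " + pos);
            }
            pos += 10;                          // skip the fixed header
            Inflater inf = new Inflater(true);  // raw deflate, no zlib wrapper
            inf.setInput(data, pos, data.length - pos);
            byte[] buf = new byte[8192];
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && inf.needsInput()) {
                    throw new ZipException("Truncated gzip member at offset " + pos);
                }
                out.write(buf, 0, n);
            }
            pos += (int) inf.getBytesRead() + 8; // compressed data + CRC32/ISIZE trailer
            inf.end();
        }
        return out.toByteArray();
    }

    // Hypothetical helper: compresses a string as one complete gzip member.
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(s.getBytes("UTF-8"));
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Two independent members appended back to back, like a
        // multi-member object fetched from S3.
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        file.write(gzip("first\n"));
        file.write(gzip("second\n"));
        System.out.println(new String(gunzipAll(file.toByteArray()), "UTF-8"));
    }
}
```

The key move is Inflater.getBytesRead(): after a member finishes inflating, it tells you exactly how much compressed input was consumed, so you can skip the 8-byte trailer and look for the next member yourself instead of relying on what the underlying stream happens to have available.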