GZIPInputStream is prematurely closed when reading from S3


Problem description

new BufferedReader(new InputStreamReader(
       new GZIPInputStream(s3Service.getObject(bucket, objectKey).getDataInputStream())))

This creates a Reader whose readLine() returns null after about 100 lines if the file is larger than several MB. It is not reproducible on gzip files smaller than 1 MB. Does anybody know how to handle this?

Recommended answer

From the documentation of BufferedReader#readLine():

Returns:

A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached

I would say it is pretty clear what this means: the end of the file/stream has been encountered, and no more data is available.

A notable quirk of the GZIP format: multiple gzip files can simply be appended to one another to create a larger file containing multiple gzipped objects. It seems that GZIPInputStream only reads the first of those.
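To make the quirk concrete, here is a minimal Java sketch that builds such a file in memory. The class and method names (ConcatenatedGzipDemo, gzipMember, concat, hasGzipMagicAt) are made up for illustration; only GZIPOutputStream and ByteArrayOutputStream are JDK classes. It writes two independent gzip objects back to back and checks that each one begins with the gzip magic bytes 0x1f 0x8b.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of any library.
final class ConcatenatedGzipDemo {

    /** Compresses text into one complete gzip object (header, data, trailer). */
    static byte[] gzipMember(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    /** Appends the given byte arrays back to back, like `cat a.gz b.gz > c.gz`. */
    static byte[] concat(byte[]... members) {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (byte[] member : members) {
            all.write(member, 0, member.length);
        }
        return all.toByteArray();
    }

    /** Returns true if the two gzip magic bytes sit at the given offset. */
    static boolean hasGzipMagicAt(byte[] data, int offset) {
        return (data[offset] & 0xff) == 0x1f && (data[offset + 1] & 0xff) == 0x8b;
    }

    public static void main(String[] args) throws IOException {
        byte[] first = gzipMember("first\n");
        byte[] second = gzipMember("second\n");
        byte[] file = concat(first, second);
        System.out.println("object 1 at offset 0: " + hasGzipMagicAt(file, 0));
        System.out.println("object 2 at offset " + first.length + ": "
                + hasGzipMagicAt(file, first.length));
    }
}
```

A file produced this way is still valid gzip, so a reader that stops after the first object sees only part of the data without any error being raised.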

That also explains why it works for "small files": those contain only one gzipped object, so the whole file is read.

Note: If GZIPInputStream detects the end of one gzip file non-destructively (that is, without consuming bytes that belong to the next one), you could just open another GZIPInputStream on the same InputStream and read multiple objects.
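GZIPInputStream reads ahead into an internal buffer, so by the time it reports the end of one gzip object it may already have consumed bytes of the next, and opening a second GZIPInputStream on the same stream is not guaranteed to work. The following is a rough sketch of one way around that, under stated assumptions, not a definitive implementation: it decodes object by object with java.util.zip.Inflater and uses a PushbackInputStream to hand back any bytes read past a boundary. The class name MultiMemberGunzip and its helper methods are hypothetical, and the sketch skips the CRC32/ISIZE trailer check that GZIPInputStream performs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;
import java.util.zip.ZipException;

// Hypothetical helper, not part of any library.
final class MultiMemberGunzip {
    private static final int BUF = 8192;

    /** Decodes every gzip object found in raw and writes the plain bytes to out. */
    static void gunzipAll(InputStream raw, OutputStream out) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, BUF);
        byte[] inBuf = new byte[BUF];
        byte[] outBuf = new byte[BUF];
        int first;
        while ((first = in.read()) != -1) {          // -1 here is a clean end of input
            skipHeader(first, in);
            Inflater inflater = new Inflater(true);  // true = raw deflate data
            try {
                int fed = 0;
                while (!inflater.finished()) {
                    if (inflater.needsInput()) {
                        fed = in.read(inBuf);
                        if (fed < 0) throw new EOFException("truncated gzip data");
                        inflater.setInput(inBuf, 0, fed);
                    }
                    int produced = inflater.inflate(outBuf);
                    if (produced == 0 && inflater.needsDictionary()) {
                        throw new ZipException("preset dictionary not supported");
                    }
                    out.write(outBuf, 0, produced);
                }
                // Hand back the bytes that were read past the end of this object.
                int unused = inflater.getRemaining();
                in.unread(inBuf, fed - unused, unused);
            } catch (DataFormatException e) {
                throw new ZipException(e.getMessage());
            } finally {
                inflater.end();
            }
            for (int i = 0; i < 8; i++) readByte(in); // trailer (CRC32, ISIZE), not verified
        }
    }

    /** Skips a gzip header as laid out in RFC 1952; first is its first byte. */
    private static void skipHeader(int first, InputStream in) throws IOException {
        if (first != 0x1f || readByte(in) != 0x8b || readByte(in) != 8) {
            throw new ZipException("not in gzip format");
        }
        int flags = readByte(in);
        for (int i = 0; i < 6; i++) readByte(in);               // MTIME, XFL, OS
        if ((flags & 0x04) != 0) {                               // FEXTRA
            int len = readByte(in) | (readByte(in) << 8);
            for (int i = 0; i < len; i++) readByte(in);
        }
        if ((flags & 0x08) != 0) while (readByte(in) != 0) { }   // FNAME
        if ((flags & 0x10) != 0) while (readByte(in) != 0) { }   // FCOMMENT
        if ((flags & 0x02) != 0) { readByte(in); readByte(in); } // FHCRC
    }

    private static int readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) throw new EOFException("unexpected end of gzip data");
        return b;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        for (String part : new String[] {"first\n", "second\n"}) {
            GZIPOutputStream gz = new GZIPOutputStream(file);
            gz.write(part.getBytes(StandardCharsets.UTF_8));
            gz.finish(); // ends this gzip object without closing the shared buffer
        }
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        gunzipAll(new ByteArrayInputStream(file.toByteArray()), plain);
        System.out.print(new String(plain.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In the setup from the question, the raw argument would be the stream returned by s3Service.getObject(bucket, objectKey).getDataInputStream(), and for large files the output would more sensibly go to a file or a pipe than to an in-memory buffer.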
