Find gzip start and end?
Problem description
I have a file that contains random bytes interleaved with multiple gzip streams. How can I find the start and end of each gzip stream inside the file? There are many random bytes between the gzip streams, so basically I need to locate every gzip stream and extract it.
Reading from RFC 1952 (GZIP):
Each GZIP file is just a bunch of data chunks (called members), one for each file contained.
Each member starts with the following bytes:
- 0x1F (ID1)
- 0x8B (ID2)
- compression method. 0x08 for a DEFLATEd file. 0-7 are reserved values.
- flags. The top three bits are reserved and must be zero.
- (4 bytes) last modified time. May be set to 0.
- extra flags, defined by the compression method.
- operating system, actually the file system. 0=FAT, 3=UNIX, 11=NTFS
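The fixed fields listed above add up to 10 bytes, so a candidate position can be sanity-checked before any decompression is attempted. The following is a minimal sketch, assuming the data is already in memory as `bytes`; the function name `parse_gzip_header` and the returned dictionary layout are made up for illustration and are not part of any library.

```python
import struct

def parse_gzip_header(buf, pos=0):
    """Return the fixed gzip header fields at buf[pos:pos+10],
    or None if those bytes cannot start a DEFLATE gzip member.
    (Hypothetical helper, for illustration only.)"""
    if len(buf) - pos < 10:
        return None
    # Little-endian, no padding: ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS
    id1, id2, cm, flg, mtime, xfl, os_id = struct.unpack_from("<BBBBIBB", buf, pos)
    if (id1, id2) != (0x1F, 0x8B):
        return None              # wrong magic bytes
    if cm != 8:
        return None              # only DEFLATE (8) is accepted here
    if flg & 0xE0:
        return None              # top three flag bits must be zero
    return {"cm": cm, "flg": flg, "mtime": mtime, "xfl": xfl, "os": os_id}
```

A passing check only means the bytes look like a header. Random data can match by chance, so the candidate still has to be decompressed to be confirmed.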
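The end of a member is not marked by a delimiter. You have to actually walk (decompress) the entire member to find where it ends; the compressed data is followed by an 8-byte trailer holding the CRC-32 and the uncompressed size. Note that concatenating multiple valid GZIP files creates a valid GZIP file. Also note that reading past the end of a member may still result in a successful read of that member (unless the decompression library fails eagerly and completely on trailing data).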
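One way to put this together is to search for the magic bytes, try to decompress from each hit, and let the decompressor report where the member stops. Below is a minimal sketch using Python's `zlib` module, assuming the whole file fits in memory; the name `find_gzip_members` is made up for illustration. It relies on `zlib.decompressobj` with `wbits=31` (gzip wrapper), whose `eof` attribute becomes true once the trailer has been read and whose `unused_data` attribute holds whatever bytes followed the member.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # ID1, ID2, CM=8 (DEFLATE)

def find_gzip_members(buf):
    """Yield (start, end, data) for each gzip member in buf that
    decompresses cleanly. (Hypothetical helper, for illustration only.)"""
    pos = 0
    while True:
        start = buf.find(GZIP_MAGIC, pos)
        if start == -1:
            return
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
        d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
        try:
            data = d.decompress(buf[start:])
        except zlib.error:
            pos = start + 1      # false positive: magic bytes inside noise
            continue
        if not d.eof:
            pos = start + 1      # stream never reached its trailer
            continue
        # Everything the decompressor did not consume lies after the member
        end = len(buf) - len(d.unused_data)
        yield start, end, data
        pos = end                # resume scanning right after this member
```

Each yielded `buf[start:end]` slice is a complete gzip member that can be written out as its own `.gz` file. The slice `buf[start:]` copies the rest of the buffer on every attempt, so for very large inputs a `memoryview` or chunked feeding would be a reasonable refinement.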