Find gzip start and end?
Question
I have a file that contains some random bytes and multiple gzip files. How can I find the start and end of a gzip stream inside that file? There are many random bytes between the gzip streams. So, basically, I need to find each gzip file and extract it from there.
Recommended answer
Read RFC 1952 - GZIP:
Each GZIP file is just a bunch of data chunks (called members), one for each file contained.
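Because members simply follow one another, a decompressor that is given the whole file runs straight from one member into the next. A quick illustration with Python's standard `gzip` module, which reads every member and joins the output (this is also why `gzip.decompress` on its own cannot tell you where a member ends):

```python
import gzip

# Two independently compressed members, simply concatenated.
first = gzip.compress(b"hello, ")
second = gzip.compress(b"world")
combined = first + second

# gzip.decompress reads every member in turn and joins the output.
print(gzip.decompress(combined))  # b'hello, world'
```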
Each member starts with the following bytes:
- 0x1F (ID1)
- 0x8B (ID2)
- compression method. 0x08 for a DEFLATEd file. 0-7 are reserved values.
- flags. The top three bits are reserved and must be zero.
- (4 bytes) last modified time. May be set to 0.
- extra flags, defined by the compression method.
- operating system, actually the file system. 0=FAT, 3=UNIX, 11=NTFS
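Only the first two bytes are true magic numbers, and 0x1F 0x8B will turn up by chance in random data, so a scanner can also check the compression method and the reserved flag bits. A minimal Python sketch under that assumption; `looks_like_gzip_header` and `candidate_offsets` are hypothetical helper names, and a match is only a candidate until the member actually decompresses:

```python
import struct

# Fixed 10-byte part of a gzip member header (RFC 1952, section 2.3):
# ID1, ID2, CM, FLG, MTIME (4 bytes, little-endian), XFL, OS
HEADER = struct.Struct("<BBBBIBB")

def looks_like_gzip_header(buf: bytes, i: int) -> bool:
    """Cheap plausibility test for a gzip member header starting at buf[i]."""
    if i + HEADER.size > len(buf):
        return False
    id1, id2, cm, flg, _mtime, _xfl, _os = HEADER.unpack_from(buf, i)
    return (
        id1 == 0x1F and id2 == 0x8B
        and cm == 8             # 8 = deflate; 0-7 are reserved
        and flg & 0xE0 == 0     # top three flag bits are reserved, must be zero
    )

def candidate_offsets(buf: bytes):
    """Yield every offset where a gzip member header could start."""
    i = buf.find(b"\x1f\x8b")
    while i != -1:
        if looks_like_gzip_header(buf, i):
            yield i
        i = buf.find(b"\x1f\x8b", i + 1)
```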
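The end of a member is not delimited. You have to actually walk (decompress) the entire member: the DEFLATE data marks its own final block, and it is followed by an 8-byte trailer (CRC-32 and uncompressed size). Note that concatenating multiple valid GZIP files creates a valid GZIP file. Also note that overshooting a member, i.e. handing the decompressor extra bytes past its end, may still result in a successful reading of the member (unless the decompression library fails eagerly and completely on trailing data).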
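Since the end can only be found by decompressing, one approach is to let Python's `zlib` walk the member. With `wbits=16 + zlib.MAX_WBITS` a decompression object expects exactly one gzip member (header, DEFLATE data, CRC-32/size trailer) and leaves any bytes after it in `unused_data`, which is the "overshooting still works" behaviour put to use. A sketch with hypothetical function names (`member_end`, `extract_members`) that also skips false-positive headers:

```python
import zlib

def member_end(buf: bytes, start: int):
    """Return the offset just past the gzip member starting at buf[start],
    or None if the bytes there do not form one complete, valid member."""
    # 16 + MAX_WBITS: expect a gzip header/trailer around the deflate data.
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    try:
        d.decompress(buf[start:])   # output is discarded; we only want the length
    except zlib.error:
        return None                 # bad header, corrupt data or CRC/size mismatch
    if not d.eof:
        return None                 # input ran out before the member finished
    # Whatever zlib did not consume lies after the member.
    return len(buf) - len(d.unused_data)

def extract_members(buf: bytes):
    """Yield (start, end) offsets of each gzip member embedded in buf."""
    magic = b"\x1f\x8b\x08"         # ID1, ID2, CM=8 (deflate)
    i = buf.find(magic)
    while i != -1:
        end = member_end(buf, i)
        if end is not None:
            yield i, end
            i = buf.find(magic, end)     # continue after the whole member
        else:
            i = buf.find(magic, i + 1)   # false positive; keep scanning
```

Each `buf[start:end]` slice is then a standalone gzip file. This keeps each member's decompressed output in memory only to throw it away; for very large members you would feed the input to the decompression object in chunks instead.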