Find gzip start and end?


Question

I have a file that contains random bytes mixed with multiple gzip files. How can I find the start and end of each gzip stream inside that file? There are many random bytes between the gzip streams, so basically I need to find every gzip file and extract it.

Recommended answer

From reading RFC 1952 (GZIP file format specification):

Each GZIP file is just a bunch of data chunks (called members), one for each file contained.

Each member starts with the following bytes:

  • 0x1F (ID1)
  • 0x8B (ID2)
  • compression method (CM). 0x08 for a DEFLATEd file. 0-7 are reserved values.
  • flags (FLG). The top three bits are reserved and must be zero.
  • (4 bytes) last modified time (MTIME). May be set to 0.
  • extra flags (XFL), defined by the compression method.
  • operating system (OS), actually the file system. 0=FAT, 3=UNIX, 11=NTFS
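The header layout above suggests a cheap first pass: scan for the three bytes 0x1F 0x8B 0x08 and discard hits whose flag byte has any reserved bit set. The following is a minimal sketch, not a definitive implementation. The function name `find_gzip_candidates` is made up for illustration, and it assumes the whole file fits in memory and that only DEFLATE members (method 0x08) are of interest.

```python
def find_gzip_candidates(data: bytes) -> list[int]:
    """Return offsets where a plausible gzip member header begins.

    Hypothetical helper: it only checks the fixed 10-byte header,
    so random bytes can still produce false positives.
    """
    magic = b"\x1f\x8b\x08"  # ID1, ID2, CM = DEFLATE
    candidates = []
    pos = data.find(magic)
    while pos != -1:
        # Fixed header: ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS.
        if pos + 10 <= len(data):
            flags = data[pos + 3]
            # The top three flag bits are reserved and must be zero.
            if flags & 0xE0 == 0:
                candidates.append(pos)
        pos = data.find(magic, pos + 1)
    return candidates
```

A hit from this scan is only a candidate. Three fixed bytes will turn up by chance in a large block of random data, so each offset still has to be confirmed by actually decompressing from it.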

The end of a member is not delimited. You have to actually walk the entire member, that is, decompress it, to learn where it stops. Note that concatenating multiple valid GZIP files creates a valid GZIP file. Also note that feeding a decompressor bytes past the end of a member may still result in a successful read of that member (unless the decompressing library fails eagerly and completely on trailing data).
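One way to walk a member is to let zlib do it. The sketch below assumes Python's standard `zlib` module and a file small enough to hold in memory. The name `extract_gzip_members` is hypothetical. It tries to decompress from every occurrence of the magic bytes and relies on the decompressor's `eof` and `unused_data` attributes to find where the member (including its 8-byte CRC32 and size trailer) ends.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # ID1, ID2, CM = DEFLATE


def extract_gzip_members(data: bytes) -> list[tuple[int, int, bytes]]:
    """Return (start, end, payload) for each gzip member found in data.

    Hypothetical helper: `end` is the offset just past the member's trailer.
    """
    members = []
    pos = data.find(GZIP_MAGIC)
    while pos != -1:
        # wbits = 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
        decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
        try:
            payload = decomp.decompress(data[pos:])
        except zlib.error:
            # False positive: the magic bytes occurred in the random filler.
            pos = data.find(GZIP_MAGIC, pos + 1)
            continue
        if decomp.eof:
            # unused_data holds everything that follows the finished member.
            end = len(data) - len(decomp.unused_data)
            members.append((pos, end, payload))
            pos = data.find(GZIP_MAGIC, end)
        else:
            # The stream never finished (truncated or bogus); keep scanning.
            pos = data.find(GZIP_MAGIC, pos + 1)
    return members
```

Slicing `data[pos:]` copies the remainder of the buffer on every attempt, which is fine for a sketch but slow for large inputs with many false positives. Feeding the decompressor in fixed-size chunks would avoid that.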
