Find gzip start and end?


Question

I have a file that contains random bytes with multiple gzip files embedded in it. How can I find the start and end of each gzip stream inside that file? There are many random bytes between the gzip streams, so basically I need to locate every gzip file and extract it.

Solution

From RFC 1952 (GZIP):

Each GZIP file is a sequence of data chunks (called members), one for each file contained.

Each member starts with the following bytes:

  • 0x1F (ID1)
  • 0x8B (ID2)
  • compression method. 0x08 for a DEFLATEd file. 0-7 are reserved values.
  • flags. The top three bits are reserved and must be zero.
  • (4 bytes) last modified time. May be set to 0.
  • extra flags, defined by the compression method.
  • operating system, actually the file system. 0=FAT, 3=UNIX, 11=NTFS
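The header layout above can be used to scan for candidate member starts. The following is a minimal sketch, not a definitive implementation: the function name `find_gzip_candidates` is hypothetical, and it assumes the whole file fits in memory as a `bytes` object. It relies on the fixed part of the header being 10 bytes long (ID1, ID2, compression method, flags, 4-byte modification time, extra flags, OS) and rejects matches whose reserved flag bits are set.

```python
def find_gzip_candidates(data: bytes) -> list[int]:
    """Return offsets where a plausible gzip member header begins.

    Hypothetical helper: a match only means the bytes look like a header,
    not that a valid member actually follows.
    """
    magic = b"\x1f\x8b\x08"  # ID1, ID2, compression method = DEFLATE
    offsets = []
    pos = data.find(magic)
    while pos != -1:
        # The fixed header is 10 bytes; the flags byte sits at offset 3.
        # The top three flag bits (mask 0xE0) are reserved and must be zero.
        if pos + 10 <= len(data) and data[pos + 3] & 0xE0 == 0:
            offsets.append(pos)
        pos = data.find(magic, pos + 1)
    return offsets
```

Because three magic bytes can easily occur by chance in random data, each returned offset is only a candidate and still has to be confirmed by actually decompressing from that position.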

The end of a member is not marked by a delimiter. You have to decode the entire member to find where it ends. Note that concatenating multiple valid GZIP files creates a valid GZIP file. Also note that feeding the decompressor more bytes than the member contains may still result in a successful read of the member (unless the decompression library fails eagerly on any trailing data).
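One way to walk a member and learn where it ends is to let a streaming decompressor consume it and then check how many bytes it left untouched. The sketch below uses Python's `zlib.decompressobj` with `wbits=31` (gzip framing); the function name `extract_gzip_members` is hypothetical, and the code assumes the input fits in memory. A member's end offset is derived from `unused_data`, which holds whatever followed the 8-byte trailer (CRC32 and uncompressed size).

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # ID1, ID2, compression method = DEFLATE


def extract_gzip_members(data: bytes) -> list[tuple[int, int, bytes]]:
    """Return (start, end, payload) for each gzip member found in data.

    Hypothetical helper; `end` is the offset just past the member's trailer.
    """
    members = []
    pos = data.find(GZIP_MAGIC)
    while pos != -1:
        # wbits=31 makes zlib expect a gzip header and verify the trailer.
        decoder = zlib.decompressobj(wbits=31)
        try:
            payload = decoder.decompress(data[pos:])
        except zlib.error:
            # False positive: the magic bytes occurred inside random data.
            pos = data.find(GZIP_MAGIC, pos + 1)
            continue
        if decoder.eof:
            # Everything after the trailer was left in unused_data.
            end = len(data) - len(decoder.unused_data)
            members.append((pos, end, payload))
            pos = data.find(GZIP_MAGIC, end)
        else:
            # The stream ran out before the trailer; skip this candidate.
            pos = data.find(GZIP_MAGIC, pos + 1)
    return members
```

Slicing `data[pos:]` copies the remainder of the buffer on every attempt, which keeps the sketch short but is wasteful for large inputs; feeding the decompressor fixed-size chunks would avoid that.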
