Multi-part gzip file random access (in Java)

Views: 199 Published: 2016/12/25 13:08:01 Tags: compression gzip multipart random-access

Question

This may fall in the realm of "not really feasible" or "not really worth the effort", but here goes.

I'm trying to randomly access records stored inside a multi-part gzip file. Specifically, the files I'm interested in are compressed Heritrix ARC files. (In case you aren't familiar with multi-part gzip files: the gzip spec allows multiple gzip streams to be concatenated in a single gzip file. The streams do not share any dictionary information; it is simple binary appending.)
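As a small illustration of the concatenation property described above, the following sketch builds two independent gzip members in memory, appends them, and reads them back. The class and method names (`MultiMemberGzipDemo`, `gzipMember`, `concat`, `gunzipFrom`) are hypothetical and only exist for this example; the sample records are made up and are not real ARC records. It assumes a JDK whose `GZIPInputStream` reads through concatenated members (Java 7 or later).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; none of these names come from Heritrix.
class MultiMemberGzipDemo {

    /** Compresses one record into its own independent gzip member. */
    static byte[] gzipMember(String record) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(record.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    /** Appends the members byte for byte: no shared dictionary, no index. */
    static byte[] concat(byte[]... members) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] m : members) {
            out.write(m, 0, m.length);
        }
        return out.toByteArray();
    }

    /** Decompresses from the given offset to the end of the data. */
    static String gunzipFrom(byte[] data, int offset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(data, offset, data.length - offset))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] first = gzipMember("record one\n");
        byte[] second = gzipMember("record two\n");
        byte[] file = concat(first, second);

        // Reading from the start yields both records back to back.
        System.out.print(gunzipFrom(file, 0));
        // Reading from a known member boundary yields only the later record.
        System.out.print(gunzipFrom(file, first.length));
    }
}
```

The second call only works because the offset `first.length` is known here; the difficulty in the question is precisely that such offsets are not recorded anywhere in the file.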

I'm thinking it should be possible to do this by seeking to a certain offset within the file, scanning forward for the gzip magic header bytes (i.e. 0x1f 0x8b, as per the RFC), and attempting to read a gzip stream starting at that position. The problem with this approach is that the same bytes can also appear inside the compressed data, so scanning for them can yield an invalid position from which to start reading a gzip stream. Is there a better way to handle random access, given that the record offsets aren't known a priori?
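The scan-and-validate idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definitive solution: `GzipMemberScanner` and its methods are hypothetical names, the data is held in a byte array for brevity (a real ARC file would be read in windows after a seek), and a candidate is accepted only if `GZIPInputStream` can decompress from it without error. Besides the two magic bytes, the cheap filter also checks the compression method byte (8 for deflate) and the reserved flag bits, both defined in RFC 1952.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not part of Heritrix or the JDK.
class GzipMemberScanner {

    /** Cheap filter: magic bytes, CM = 8 (deflate), reserved FLG bits clear. */
    static boolean looksLikeHeader(byte[] d, int i) {
        return i + 10 <= d.length          // fixed gzip header is 10 bytes
                && (d[i] & 0xff) == 0x1f
                && (d[i + 1] & 0xff) == 0x8b
                && d[i + 2] == 8
                && (d[i + 3] & 0xe0) == 0;
    }

    /** Expensive check: true only if decompression from offset succeeds. */
    static boolean decodesCleanly(byte[] d, int offset) {
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(d, offset, d.length - offset))) {
            byte[] sink = new byte[8192];
            while (in.read(sink) != -1) {
                // Discard output; a corrupt stream or CRC mismatch throws.
            }
            return true;
        } catch (IOException e) {
            return false; // false positive: magic bytes inside other data
        }
    }

    /** Returns the first validated member start at or after 'from', or -1. */
    static int findMemberStart(byte[] d, int from) {
        for (int i = Math.max(from, 0); i < d.length; i++) {
            if (looksLikeHeader(d, i) && decodesCleanly(d, i)) {
                return i;
            }
        }
        return -1;
    }

    /** Test helper: compresses a string into a single gzip member. */
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    /** Test helper: plain binary appending of byte arrays. */
    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : parts) {
            out.write(p, 0, p.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] file = concat(gzip("alpha"), gzip("beta"), gzip("gamma"));
        // Pretend we seeked to an arbitrary offset in the middle of the file.
        int start = findMemberStart(file, file.length / 2);
        System.out.println("next member starts at offset " + start);
    }
}
```

One caveat with this sketch: `GZIPInputStream` keeps reading through any members that follow, so each validation decompresses to the end of the data. Decoding a single member with `java.util.zip.Inflater` and checking its CRC-32 trailer would bound that cost, at the price of parsing the gzip header by hand.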
