Compression formats with good support for random access within archives?

Question

This is similar to a previous question, but the answers there don't satisfy my needs and my question is slightly different:

I currently use gzip compression for some very large files which contain sorted data. When the files are not compressed, binary search is a handy and efficient way to support seeking to a location in the sorted data.

But when the files are compressed, things get tricky. I recently found out about zlib's Z_FULL_FLUSH option, which can be used during compression to insert "sync points" in the compressed output (inflateSync() can then begin reading from various points in the file). This is OK, though files I already have would have to be recompressed to add this feature (and strangely gzip doesn't have an option for this, but I'm willing to write my own compression program if I must).
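As a rough sketch of the Z_FULL_FLUSH technique (using Python's zlib bindings; the segment sizes and the in-memory offset list below are illustrative choices of mine, not part of the question): a full flush aligns the stream to a byte boundary and discards the compressor's history, so a fresh raw inflater can resume at any such sync point, e.g. the one nearest the middle of the file.

```python
import zlib

# Eight toy "sorted" segments; sizes are arbitrary for illustration.
segments = [("segment %d " % i).encode() * 500 for i in range(8)]

co = zlib.compressobj(wbits=-15)          # raw deflate, no zlib/gzip header
blob, sync_offsets = b"", []
for seg in segments:
    # Z_FULL_FLUSH byte-aligns the output and resets the history window.
    blob += co.compress(seg) + co.flush(zlib.Z_FULL_FLUSH)
    sync_offsets.append(len(blob))        # a fresh inflater can start here
blob += co.flush(zlib.Z_FINISH)

# "Decompress starting roughly 50% of the way into the compressed file":
target = len(blob) // 2
start = min(sync_offsets, key=lambda o: abs(o - target))
d = zlib.decompressobj(wbits=-15)
tail = d.decompress(blob[start:]) + d.flush()
assert b"".join(segments).endswith(tail)  # everything after the sync point
```

Here the sync offsets are recorded at compression time for determinism; in the index-free scenario the question describes, you would instead scan for the `00 00 FF FF` empty-stored-block marker that a full flush emits, with the false-positive caveat discussed below.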

It seems from one source that even Z_FULL_FLUSH is not a perfect solution: not only is it not supported by all gzip archives, but the very idea of detecting sync points in archives may produce false positives (either by coincidence with the magic number for sync points, or because Z_SYNC_FLUSH also produces sync points, which are not usable for random access).

Is there a better solution? I'd like to avoid having auxiliary files for indexing if possible, and explicit, default support for quasi-random access would be helpful (even if it's large-grained--like being able to start reading at each 10 MB interval). Is there another compression format with better support for random reads than gzip?

Edit: As I mentioned, I wish to do binary search in the compressed data. I don't need to seek to a specific (uncompressed) position--only to seek with some coarse granularity within the compressed file. I just want support for something like "Decompress the data starting roughly 50% (25%, 12.5%, etc.) of the way into this compressed file."

Answer

I don't know of any compressed file format which would support random access to a specific location in the uncompressed data (well, except for multimedia formats), but you can brew your own.

For example, bzip2 compressed files are composed of independently compressed blocks (each under 1 MB uncompressed), delimited by magic byte sequences, so you could parse the bzip2 file, find the block boundaries, and decompress just the block you need. This would require some indexing to remember where the blocks start.
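To make the boundary-scanning idea concrete, here is a hedged sketch (Python; the function name is hypothetical) that looks for bzip2's 48-bit block magic at every bit offset, since blocks are bit-aligned rather than byte-aligned; coincidental matches inside compressed data are possible, which is why an index is still advisable.

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # 48-bit magic that starts every bzip2 block
MASK = (1 << 48) - 1

def find_block_magics(data):
    """Return every bit offset at which the block magic occurs.

    bzip2 blocks are bit-aligned, so all 8 bit phases must be checked;
    treating the whole buffer as one big integer keeps the sketch simple.
    """
    bits = int.from_bytes(data, "big")
    total = len(data) * 8
    return [b for b in range(total - 47)
            if (bits >> (total - 48 - b)) & MASK == BLOCK_MAGIC]

comp = bz2.compress(b"hello bzip2 " * 1000)
offsets = find_block_magics(comp)
# The first block magic sits right after the 4-byte "BZh9" stream header.
assert 32 in offsets
```

A real scanner would also have to distinguish true boundaries from chance 48-bit matches in the compressed payload, e.g. by attempting to decode a block at each candidate offset.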

Still, I think the best solution would be to split your file into chunks of your choice, and then bundle the chunks with an archiver such as zip or rar, which supports random access to the individual files in the archive.
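The split-into-chunks approach can be sketched without a full archiver (a minimal illustration in Python; the 64 KiB chunk size and the helper names are my own choices): compress fixed-size chunks independently, keep an offset index, and binary-search that index to decompress only the block covering a given uncompressed position.

```python
import zlib

CHUNK = 1 << 16  # 64 KiB of uncompressed data per block (arbitrary choice)

def compress_chunked(data):
    """Compress data as independent zlib blocks; return blob plus offset index."""
    blob = bytearray()
    index = []  # (compressed_offset, uncompressed_offset) for each block
    for i in range(0, len(data), CHUNK):
        index.append((len(blob), i))
        blob += zlib.compress(data[i:i + CHUNK])
    return bytes(blob), index

def read_at(blob, index, upos):
    """Decompress only the block covering uncompressed position upos."""
    # Binary search for the last block starting at or before upos.
    lo, hi = 0, len(index) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if index[mid][1] <= upos:
            lo = mid
        else:
            hi = mid - 1
    coff = index[lo][0]
    cend = index[lo + 1][0] if lo + 1 < len(index) else len(blob)
    return zlib.decompress(blob[coff:cend])

data = bytes(range(256)) * 1000  # 256,000 bytes of sample data
blob, index = compress_chunked(data)
block = read_at(blob, index, 70000)
assert block == data[65536:65536 + CHUNK]  # the block containing position 70000
```

The index here lives in memory; persisting it means an auxiliary file, which is exactly the trade-off the question hopes to avoid but which zip-style archivers sidestep by storing the equivalent of this index in their central directory.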
