Only decompress a specific bzip2 block


Question

Say I have a bzip2 file (over 5GB), and I want to decompress only block #x, because that is where my data is (the block is different every time). How would I do this?

I thought about building an index of where all the blocks are, then cutting the block I need out of the file and applying bzip2recover to it.

I also thought about compressing, say, 1MB at a time, appending each piece to a file (and recording its location), and simply grabbing the piece I need later, but I'd rather keep the original bzip2 file intact.

My preferred language is Ruby, but a solution in any language is fine by me (as long as I understand the principle).

Recommended answer

There is a tool for this: http://bitbucket.org/james_taylor/seek-bzip2

Get the source code and compile it.

Run

./seek-bzip2 32 < bzip_compressed.bz2

to test it.

The only parameter is the bit offset of the wanted block's header. I originally said you can get it by finding the hex string "31 41 59 26 53 59" in the binary file. THIS WAS INCORRECT. A block start may not be aligned to a byte boundary, so you should search for every possible bit shift of the "31 41 59 26 53 59" hex string, as is done in bzip2recover - http://www.bzip.org/1.0.3/html/recovering.html
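The bit-shifted search described above can be sketched in Python as follows. This is only an illustration of the principle, not the code of seek-bzip2 or bzip2recover; the function name `find_block_offsets` is made up for this example. It assumes the whole file fits in memory and that bits are read most-significant-bit first, as bzip2 writes them. A pure-Python loop like this is slow on a 5GB file, and the 48-bit pattern can in principle also appear by chance inside compressed data.

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # 48-bit block header magic "31 41 59 26 53 59"
MASK48 = (1 << 48) - 1


def find_block_offsets(data: bytes) -> list[int]:
    """Return the bit offsets at which the block magic starts.

    Hypothetical helper: slides a 48-bit window over the data one bit
    at a time, so matches at any bit alignment are found.
    """
    offsets = []
    window = 0
    bit_pos = 0  # number of bits consumed so far
    for byte in data:
        for shift in range(7, -1, -1):  # most significant bit first
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK48
            bit_pos += 1
            if bit_pos >= 48 and window == BLOCK_MAGIC:
                offsets.append(bit_pos - 48)
    return offsets


if __name__ == "__main__":
    sample = bz2.compress(b"hello bzip2")
    print(find_block_offsets(sample))
```

Each offset returned is a candidate value for the parameter passed to seek-bzip2. For a large file, the same window logic could be fed from chunks read off disk instead of one `bytes` object.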

32 is the size in bits of the "BZh1" header, where 1 can be any digit from "1" to "9" (in classic bzip2) - it is the (uncompressed) block size in hundreds of kB (not exact).
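To illustrate how the 4-byte header and a block's bit offsets fit together, here is a rough Python sketch of the "cut the block out and re-wrap it" idea from the question, similar in spirit to what bzip2recover does. The function name `decompress_block` and its arguments are hypothetical. It assumes you already know the bit offset where the block's magic starts and the bit offset where the next block magic (or the end-of-stream magic "17 72 45 38 50 90") starts.

```python
import bz2

EOS_MAGIC = 0x177245385090  # 48-bit end-of-stream marker


def decompress_block(data: bytes, start_bit: int, end_bit: int) -> bytes:
    """Decompress the single block occupying bits [start_bit, end_bit).

    Hypothetical helper: start_bit points at the block's 48-bit magic,
    end_bit at the next block magic or at the end-of-stream magic.
    """
    header = data[:4]  # "BZh" + block-size digit, 32 bits in total
    if header[:3] != b"BZh" or not b"1" <= header[3:4] <= b"9":
        raise ValueError("not a bzip2 stream header")

    # Cut out the bytes covering the block, then isolate its bits.
    first, last = start_bit // 8, (end_bit + 7) // 8
    length = end_bit - start_bit
    chunk = int.from_bytes(data[first:last], "big")
    trailing = last * 8 - end_bit  # bits after the block in the last byte
    block = (chunk >> trailing) & ((1 << length) - 1)

    # The block's own CRC is the 32 bits right after its 48-bit magic.
    block_crc = (block >> (length - 80)) & 0xFFFFFFFF

    # New stream: header, block, end-of-stream magic, combined CRC.
    # For a one-block stream the combined CRC equals the block CRC.
    stream = int.from_bytes(header, "big")
    stream = (stream << length) | block
    stream = (stream << 48) | EOS_MAGIC
    stream = (stream << 32) | block_crc
    total_bits = 32 + length + 48 + 32
    pad = -total_bits % 8  # zero-pad up to a whole byte
    stream <<= pad
    return bz2.decompress(stream.to_bytes((total_bits + pad) // 8, "big"))
```

For the first block of a file, `start_bit` is 32, matching the parameter in the command above. The original file is only read, never modified.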
