Random seek in a 7z single-file archive


Question

Is it possible to do random access (a lot of seeks) into a very large file compressed with 7-Zip?

The original file is huge (999 GB of XML), and I can't store it unpacked because I don't have that much free space. So, if the 7z format allows access to a block in the middle without decompressing all the blocks before it, I can build an index of block start positions and the corresponding offsets in the original file.

The header of my 7z archive is:

37 7A BC AF 27 1C 00 02 28 99 F1 9D 4A 46 D7 EA  // 7z signature; version 0.2; start header CRC; next header offset (low 4 bytes)
00 00 00 00 44 00 00 00 00 00 00 00 F4 56 CF 92  // next header offset (high 4 bytes); next header size = 0x44; next header CRC
00 1E 1B 48 A6 5B 0A 5A 5D DF 57 D8 58 1E E1 5F
71 BB C0 2D BD BF 5A 7C A2 B1 C7 AA B8 D0 F5 26
FD 09 33 6C 05 1E DF 71 C6 C5 BD C0 04 3A B6 29
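
For readers who want to decode those first 32 bytes programmatically, here is a minimal sketch in Python. It assumes the fixed signature-header layout described in the 7z format notes (6-byte signature, two version bytes, a CRC, then a 64-bit offset, a 64-bit size, and a CRC for the next header, all little-endian). The function name `parse_7z_signature_header` and the dictionary keys are made up for illustration. It only reads the fields; it does not verify the CRCs.

```python
import struct

SIGNATURE = b"7z\xbc\xaf\x27\x1c"  # 37 7A BC AF 27 1C
HEADER_LEN = 32                    # size of the fixed signature header

def parse_7z_signature_header(header: bytes) -> dict:
    """Decode the fixed 32-byte header at the start of a 7z archive."""
    if len(header) < HEADER_LEN:
        raise ValueError("need at least 32 bytes")
    # 6-byte signature, major/minor version, CRC of the following 20 bytes
    sig, major, minor, start_crc = struct.unpack_from("<6sBBI", header, 0)
    if sig != SIGNATURE:
        raise ValueError("not a 7z archive")
    # Offset and size of the "next header", plus its CRC
    offset, size, next_crc = struct.unpack_from("<QQI", header, 12)
    return {
        "version": (major, minor),
        "start_header_crc": start_crc,
        "next_header_offset": offset,            # relative to byte 32
        "next_header_size": size,
        "next_header_crc": next_crc,
        "next_header_pos": HEADER_LEN + offset,  # absolute file position
    }

# The first two lines of the dump above
dump = bytes.fromhex(
    "37 7A BC AF 27 1C 00 02 28 99 F1 9D 4A 46 D7 EA"
    "00 00 00 00 44 00 00 00 00 00 00 00 F4 56 CF 92"
)
info = parse_7z_signature_header(dump)
print(info["version"], hex(info["next_header_offset"]), info["next_header_size"])
```

Note that the size field is the hexadecimal value 0x44, that is, 68 bytes.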

UPDATE: The 7z archiver reports that this file contains a single block of data, compressed with the LZMA algorithm. In a test, decompression runs at 600 MB/s (of unpacked data) and uses only one CPU core.

Recommended answer

It's technically possible, but if your question is "does the currently available binary 7zip command-line tool allow that", the answer is unfortunately no. The best it offers is to compress each file in the archive independently, so that individual files can be retrieved directly. But since what you want to compress is a single (huge) file, this trick will not work.

I'm afraid the only way is to split your file into small blocks and feed them to an LZMA encoder (included in the LZMA SDK). Unfortunately, that requires some programming skills.
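
To give an idea of what that programming involves, here is a minimal sketch of the block-wise approach. It is not the LZMA SDK: it swaps in Python's standard `lzma` module (which wraps each block in an .xz container) purely for illustration. The names `compress_blocks` and `read_at`, the 1 MiB block size, and the in-memory index are all assumptions, and the source is assumed to be a buffered binary file object.

```python
import io
import lzma

BLOCK_SIZE = 1 << 20  # uncompressed bytes per block (arbitrary choice)

def compress_blocks(src, dst, block_size=BLOCK_SIZE):
    """Compress src as independent LZMA blocks written to dst.

    Returns a list where entry i is the compressed offset of block i;
    the final entry marks the end of the last block.
    """
    index = []
    while True:
        chunk = src.read(block_size)
        if not chunk:
            break
        index.append(dst.tell())
        dst.write(lzma.compress(chunk))  # each block decodes on its own
    index.append(dst.tell())
    return index

def read_at(dst, index, offset, length, block_size=BLOCK_SIZE):
    """Return up to `length` uncompressed bytes starting at `offset`,
    decompressing only the blocks that overlap the requested range."""
    out = bytearray()
    block, skip = divmod(offset, block_size)
    while length > 0 and block < len(index) - 1:
        dst.seek(index[block])
        raw = lzma.decompress(dst.read(index[block + 1] - index[block]))
        piece = raw[skip:skip + length]
        out += piece
        length -= len(piece)
        skip = 0
        block += 1
    return bytes(out)

# Small in-memory demo; real use would pass files opened in binary mode
data = b"<row id='%d'/>\n" * 1000
packed = io.BytesIO()
idx = compress_blocks(io.BytesIO(data), packed, block_size=4096)
print(read_at(packed, idx, 5000, 32, block_size=4096))
```

Because every uncompressed block has the same size, the block holding a given offset is found by integer division, so the index only needs the compressed offsets. Smaller blocks make seeks cheaper but hurt the compression ratio, since each block starts with an empty dictionary.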

Note: a technically inferior but trivial compression algorithm can be found here. Its main program does just what you are looking for: it cuts the source file into small blocks and feeds them one by one to a compressor (in this case, LZ4). The decoder then does the reverse operation. It can easily skip all the compressed blocks and go straight to the one you want to retrieve. http://code.google.com/p/lz4/source/browse/trunk/lz4demo.c
