Is partial gz decompression possible?


Problem description

I work with images that are stored as .gz files (my image processing software can read .gz files directly, which saves disk space and read time), and I need to check the header of each file.

The header is just a small struct of a fixed size at the start of each image, and for images that are not compressed, checking it is very fast. For reading the compressed images, I have no choice but to decompress the whole file and then check this header, which of course slows down my program.

Would it be possible to read the first segment of a .gz file (say a couple of K), decompress this segment and read the original contents? My understanding of gz is that after some bookkeeping at the start, the compressed data is stored sequentially -- is that correct?

So instead of:
1. open big file F
2. decompress big file F
3. read 500-byte header
4. re-compress big file F

I would like to do:
1. open big file F
2. read first 5 K from F as stream A
3. decompress A as stream B
4. read 500-byte header from B

I am using libz.so but solutions in other languages are appreciated!

Solution

You can use gzip -cd file.gz | dd ibs=1024 count=10 to uncompress just the first 10 KiB, for example.

gzip -cd decompresses to the standard output.

Pipe | this into the dd utility.

The dd utility copies the standard input to the standard output. So dd ibs=1024 sets the input block size to 1024 bytes instead of the default 512.

And count=10 copies only 10 input blocks, thus halting the gzip decompression: once dd exits, gzip's next write to the closed pipe fails and gzip terminates.

For your 500-byte header, you'll want to do gzip -cd file.gz | dd count=1, which uses the default 512-byte block size, and just ignore the extra 12 bytes.
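If spawning a shell pipeline is not convenient, the same idea can be sketched in Python with the standard gzip module, which inflates data lazily as it is requested. The function name read_header and the 500-byte default below are illustrative assumptions taken from the numbers in the question, not part of any library.

```python
import gzip


def read_header(path, header_size=500):
    """Return the first `header_size` decompressed bytes of a .gz file.

    `read_header` and `header_size` are hypothetical names for this sketch.
    """
    # gzip.open returns a file-like object. read(n) inflates roughly as
    # much compressed input as is needed to produce n bytes (plus some
    # internal buffering), not the whole file.
    with gzip.open(path, "rb") as f:
        return f.read(header_size)
```

Calling read_header("file.gz") would then return the header bytes without writing a decompressed copy of the image to disk.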

A comment highlights that you can use gzip -cd file.gz | head -c $((1024*10)) or, in this specific case, gzip -cd file.gz | head -c 512. The comment's claim that the original dd command relies on gzip decompressing in 1024-byte chunks doesn't seem to be true. For example, dd ibs=2 count=10 decompresses the first 20 bytes.
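The plan from the question (read the first few KB of the file, inflate only that, take the header) also works directly against the zlib streaming interface. Below is a minimal sketch in Python, assuming a 5 KB prefix is enough to produce 500 bytes of output; the function name and default sizes are hypothetical. With libz in C, the analogous calls would be inflateInit2 with a windowBits value of 16 + MAX_WBITS (to accept the gzip wrapper), followed by inflate with the output buffer limited to the header size.

```python
import zlib


def header_from_prefix(path, header_size=500, prefix_size=5 * 1024):
    """Inflate only the first `prefix_size` compressed bytes of a .gz file.

    All names and default sizes here are hypothetical, chosen to match
    the numbers in the question.
    """
    with open(path, "rb") as f:
        prefix = f.read(prefix_size)  # stream A: raw compressed bytes

    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    inflater = zlib.decompressobj(zlib.MAX_WBITS | 16)

    # The second argument (max_length) caps the output, so inflation stops
    # once the header is available. A truncated input is acceptable to a
    # streaming decompressor; it emits whatever it can decode so far.
    header = inflater.decompress(prefix, header_size)  # stream B

    if len(header) < header_size:
        raise ValueError("prefix too short to yield the full header")
    return header
```

If the check at the end fails for some files, a larger prefix_size (or feeding further chunks to the same decompressor object) would be the natural adjustment.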
