Replace content in a gzip file without decompressing it

Question

I'm receiving gzip-compressed XML files from a web service. Each XML file is about 80 MB uncompressed and about 10 MB compressed. These files are stored in our cache.

The XML root element carries an attribute holding an 8-digit unique ID. When we serve a response from the cache, we need to replace this ID with a different one, received from another web service, before returning the XML to the end user.

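The process should be:

  1. Decompress the cached XML.
  2. Replace the cached ID with the ID received from the web service.
  3. Compress the file again and return it to the client.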

Is it possible to perform this replacement without decompressing the whole document and compressing it again? Is there any kind of partial read and write?

Recommended answer

No. You'd have to decompress it, at least far enough to find where and how the ID is coded. Then you can either a) be really smart and figure out how to reassemble a stream with a different ID but the same number of bits, using the code tables in effect at that point, which essentially means solving a puzzle (assuming it can be solved at all), or b) recompress the whole thing with the new ID.
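Option b) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `recompress_with_new_id` is hypothetical, the old ID is assumed to be known, and its first occurrence in the document is assumed to be the root attribute.

```python
import gzip


def recompress_with_new_id(cached_gz: bytes, old_id: bytes, new_id: bytes) -> bytes:
    """Decompress, swap the first occurrence of the ID, and compress again."""
    xml = gzip.decompress(cached_gz)
    # Assumption: the first occurrence of old_id is the root attribute.
    patched = xml.replace(old_id, new_id, 1)
    # A moderate level trades some size for speed on large documents.
    return gzip.compress(patched, compresslevel=6)


# Small demonstration with made-up data.
original = b'<root id="12345678">' + b"<item>value</item>" * 100 + b"</root>"
result = recompress_with_new_id(gzip.compress(original), b"12345678", b"87654321")
```

This costs a full decompress and recompress on every response, which is the overhead the question is trying to avoid.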

If you are in control of the compression of your starting point, you could specially prepare the stream for this: switch to no compression right before the ID, flush the block (which will be a stored block) right after the ID, and then continue with compression. Note where in the output stream the ID lands. You can then replace the ID later, because it appears in the stream directly as those bytes.

You would also need to update the CRC. To do that, exclusive-or the original CRC with a "raw" CRC of the exclusive-or of the old uncompressed data and the new uncompressed data. That exclusive-or is just the old ID XORed with the new ID, with a run of zeros before and after it to fill out the data length. A "raw" CRC is one in which the CRC register is initialized to zero and for which there is no final exclusive-or.
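A minimal Python sketch of this idea follows. It rests on several assumptions that are not part of the answer. Python's `zlib` module does not expose a way to change the compression level mid-stream, so the sketch writes the stored block by hand after a sync flush and compresses the rest with a fresh compressor, which means the tail cannot reference data before the ID. The names `build_patchable_gzip`, `raw_crc_of_delta`, and `replace_id` are hypothetical, and the sample XML is made up.

```python
import gzip
import struct
import zlib

# Minimal gzip header: magic, deflate, no flags, zero mtime, unknown OS.
GZIP_HEADER = b"\x1f\x8b\x08\x00" + b"\x00" * 4 + b"\x00\xff"


def build_patchable_gzip(prefix: bytes, ident: bytes, suffix: bytes):
    """Return (gzip_bytes, byte offset of the ID inside gzip_bytes)."""
    # Raw deflate (wbits=-15); the sync flush ends the prefix on a byte boundary.
    head = zlib.compressobj(9, zlib.DEFLATED, -15)
    part1 = head.compress(prefix) + head.flush(zlib.Z_SYNC_FLUSH)
    # Hand-written stored block header: BFINAL=0, BTYPE=00, then LEN and ~LEN.
    n = len(ident)
    stored_header = b"\x00" + struct.pack("<HH", n, n ^ 0xFFFF)
    # Fresh compressor for the rest; Z_FINISH emits the final deflate block.
    tail = zlib.compressobj(9, zlib.DEFLATED, -15)
    part3 = tail.compress(suffix) + tail.flush(zlib.Z_FINISH)
    data = prefix + ident + suffix
    trailer = struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)
    id_offset = len(GZIP_HEADER) + len(part1) + len(stored_header)
    return GZIP_HEADER + part1 + stored_header + ident + part3 + trailer, id_offset


def raw_crc_of_delta(delta: bytes, zeros_after: int) -> int:
    """CRC-32 with zero initial register and no final XOR over delta + zeros.

    Zeros before the delta are skipped: they leave a zero register unchanged.
    """
    running = zlib.crc32(delta, 0xFFFFFFFF)  # cancels zlib's initial inversion
    chunk = bytes(65536)
    while zeros_after > 0:
        step = min(zeros_after, len(chunk))
        running = zlib.crc32(chunk[:step], running)
        zeros_after -= step
    return running ^ 0xFFFFFFFF  # cancels zlib's final inversion


def replace_id(gz: bytes, id_offset: int, new_id: bytes, bytes_after_id: int) -> bytes:
    """Overwrite the stored ID in place and fix the CRC in the gzip trailer."""
    out = bytearray(gz)
    end = id_offset + len(new_id)
    delta = bytes(a ^ b for a, b in zip(out[id_offset:end], new_id))
    out[id_offset:end] = new_id
    (old_crc,) = struct.unpack_from("<I", out, len(out) - 8)
    new_crc = old_crc ^ raw_crc_of_delta(delta, bytes_after_id)
    struct.pack_into("<I", out, len(out) - 8, new_crc)
    return bytes(out)


# Demonstration with made-up data; gzip.decompress also verifies the CRC.
prefix = b'<?xml version="1.0"?><root id="'
suffix = b'">' + b"<item>value</item>" * 1000 + b"</root>"
gz, off = build_patchable_gzip(prefix, b"12345678", suffix)
patched = replace_id(gz, off, b"87654321", len(suffix))
restored = gzip.decompress(patched)
```

The loop over zero bytes is linear in the amount of uncompressed data after the ID. For an 80 MB document, a CRC-combine style computation would likely be faster, but that is left out of the sketch.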
