How to use multiple threads for zlib compression (same input source)

Question

My goal is to compress data from the same source in parallel threads. I have defined jobs, kept in a list, and each job holds the data read for it (500 KB to 1 MB per job).

My compressor threads compress each block's data using zlib and store the result in the outbuf of the corresponding job.

Now I want to merge all of this and create one output file in standard zlib format.

From the zlib RFC, and after browsing the source of pigz, I understand the following.

A zlib stream (header, compressed data, trailer) is laid out as below:

     +---+---+
     |CMF|FLG| (2 bytes)
     +---+---+
     +---+---+---+---+
     |     DICTID    | (4 bytes. Present only when FLG.FDICT is set)
     +---+---+---+---+
     +=====================+
     |...compressed data...| (variable size of data)
     +=====================+
     +---+---+---+---+
     |     ADLER32   |  (4 bytes, checksum of the uncompressed data)
     +---+---+---+---+
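As a quick way to check the layout above against a real stream, here is a minimal Python sketch that picks apart the two header bytes and the 4-byte trailer. The helper name `describe_zlib_stream` is hypothetical, and the sketch assumes a stream without a preset dictionary, so the trailer is simply the last four bytes.

```python
import struct
import zlib


def describe_zlib_stream(stream: bytes) -> dict:
    """Decode the CMF/FLG header bytes and the Adler-32 trailer of a zlib stream."""
    cmf, flg = stream[0], stream[1]
    return {
        "method": cmf & 0x0F,                      # 8 means deflate
        "window_bits": (cmf >> 4) + 8,             # CINFO + 8
        "header_ok": (cmf * 256 + flg) % 31 == 0,  # FCHECK makes this a multiple of 31
        "has_dict": bool(flg & 0x20),              # FLG.FDICT bit
        "adler32": struct.unpack(">I", stream[-4:])[0],  # big-endian trailer
    }


payload = b"hello zlib " * 100
info = describe_zlib_stream(zlib.compress(payload))
print(info)
```

The trailer value matches `zlib.adler32(payload)`, which is computed over the uncompressed bytes rather than over the compressed output.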

In my case, there is no dictionary either.

So when I combine two compressed units, the header of every unit is the same.

Hence, I am doing the following operations.


  1. For the first unit, I write the header plus the compressed data.

  2. For the second through the last unit, I write only the compressed data (no header and no trailer).

  3. After all the units are done, I use adler32_combine() to merge the checksums of all the jobs' output data into one final Adler-32, and then I append it to the end of the output file.
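One detail worth checking in the checksum step: the Adler-32 in a zlib trailer covers the uncompressed data, so the values to combine would be those of each job's input bytes. The arithmetic behind combining two such checksums can be sketched in pure Python as follows. This is a re-derivation for illustration, not zlib's own `adler32_combine()` code, and it assumes the length of the second block is known.

```python
import zlib

BASE = 65521  # largest prime below 2**16, the Adler-32 modulus


def adler32_combine(adler1: int, adler2: int, len2: int) -> int:
    """Adler-32 of A+B, given adler32(A), adler32(B) and len(B)."""
    a1, b1 = adler1 & 0xFFFF, adler1 >> 16
    a2, b2 = adler2 & 0xFFFF, adler2 >> 16
    # Each checksum starts its low half at 1, so one of the two 1s is dropped.
    a = (a1 + a2 - 1) % BASE
    # Every byte of B adds the running low half, which is offset by (a1 - 1).
    b = (b1 + b2 + len2 * (a1 - 1)) % BASE
    return (b << 16) | a


block1 = b"first job input " * 1000
block2 = b"second job input " * 1000
combined = adler32_combine(zlib.adler32(block1), zlib.adler32(block2), len(block2))
print(combined == zlib.adler32(block1 + block2))
```

Each thread can compute the checksum of its own input block independently, and the results are then folded together in job order.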

But the problem is that I get an error during inflate saying the data is invalid in some places.

Has anyone already tried something like this? Any relevant information would be really helpful.

Recommended answer

You cannot simply concatenate raw deflate data streams. Each deflate stream is self-terminating, so decompression would end at the end of the first stream.
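The self-terminating behavior can be observed with a small Python sketch. The helper `raw_deflate` is a hypothetical name, and compression level 6 is an arbitrary choice; a negative `wbits` value asks Python's `zlib` module for raw deflate with no zlib header or trailer.

```python
import zlib


def raw_deflate(data: bytes) -> bytes:
    """Compress data as one complete, terminated raw deflate stream."""
    c = zlib.compressobj(6, zlib.DEFLATED, -15)  # negative wbits = raw deflate
    return c.compress(data) + c.flush()          # flush() defaults to Z_FINISH


first, second = b"A" * 1000, b"B" * 1000
joined = raw_deflate(first) + raw_deflate(second)

d = zlib.decompressobj(-15)
out = d.decompress(joined)

print(out == first)        # only the first stream was decoded
print(d.eof)               # the decoder saw a final block and stopped
print(len(d.unused_data))  # the second stream is left over, undecoded
```

The decoder reports end of stream after the first unit, and everything from the second unit onward is treated as trailing bytes.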

You need to look more carefully at the pigz code to see how it merges deflate streams. You can use Z_SYNC_FLUSH to complete the last block and bring it to a byte boundary without ending the deflate stream. Then you can complete the deflate stream and strip off the final empty block marked as the end block. Do this for every deflate stream except the last, which should terminate normally. You can then concatenate the n-1 unterminated deflate streams followed by the one terminated deflate stream.
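A minimal Python sketch of this merging idea follows. It is a simplified variant rather than pigz's actual code: every chunk except the last is ended with Z_SYNC_FLUSH and never finished, so there is no final empty block to strip. The names `deflate_chunk` and `parallel_zlib_compress`, the level 6, the worker count, and the fixed header bytes `78 9c` are all assumptions for illustration. The checksum is computed sequentially here for brevity; per-thread checksums merged with `adler32_combine()` would be the parallel equivalent.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

ZLIB_HEADER = b"\x78\x9c"  # CMF/FLG: deflate, 32 KiB window, no dictionary


def deflate_chunk(chunk: bytes, is_last: bool) -> bytes:
    """Raw-deflate one chunk; only the last chunk terminates the stream."""
    c = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate, no header/trailer
    mode = zlib.Z_FINISH if is_last else zlib.Z_SYNC_FLUSH
    return c.compress(chunk) + c.flush(mode)


def parallel_zlib_compress(chunks: list[bytes], workers: int = 4) -> bytes:
    """Compress chunks in threads and join them into one zlib stream."""
    if not chunks:
        chunks = [b""]
    flags = [i == len(chunks) - 1 for i in range(len(chunks))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pieces = list(pool.map(deflate_chunk, chunks, flags))  # map keeps order
    checksum = 1  # Adler-32 initial value
    for chunk in chunks:
        checksum = zlib.adler32(chunk, checksum)  # over the uncompressed input
    return ZLIB_HEADER + b"".join(pieces) + struct.pack(">I", checksum)


chunks = [bytes([i]) * 50000 + b"tail" for i in range(5)]
stream = parallel_zlib_compress(chunks)
print(zlib.decompress(stream) == b"".join(chunks))
```

Because each chunk is compressed without knowledge of the previous one, the ratio is slightly worse than single-threaded compression of the whole input.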
