Reusing a data dictionary for 'Deflate' separate from the compressed data

Question

I am storing many chunks of base64 encoded 64-bit doubles in an XML file. The double data all looks similar.

The double data is currently compressed with Java's 'Deflate' algorithm before encoding; however, each chunk of binary data in the file will have its own deflate data dictionary, which is an overhead I would like to greatly reduce. The 'Deflater' class has a 'setDictionary' method which I would like to use.
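For reference, the pipeline described here (doubles serialized to bytes, deflated, then base64-encoded) might look roughly like the following Java sketch. The asker's actual code is not shown, so `ChunkCodec`, `encode`, `decode`, the big-endian byte order and the compression level are all assumptions. No preset dictionary is used, so each chunk starts compressing from an empty history and cannot reuse matches from earlier, similar chunks.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical reconstruction of the question's pipeline: doubles -> bytes -> Deflate -> base64.
final class ChunkCodec {

    // Writes each double as 8 big-endian bytes, deflates them (zlib format), then base64-encodes.
    static String encode(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES);
        for (double v : values) {
            buf.putDouble(v);
        }
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(buf.array());
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] tmp = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(tmp);
                out.write(tmp, 0, n);
            }
            return Base64.getEncoder().encodeToString(out.toByteArray());
        } finally {
            deflater.end();
        }
    }

    // Reverses encode(): base64-decodes, inflates, and reads the doubles back.
    static double[] decode(String text) {
        byte[] compressed = Base64.getDecoder().decode(text);
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] tmp = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(tmp);
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated stream or dictionary required");
                }
                out.write(tmp, 0, n);
            }
            ByteBuffer buf = ByteBuffer.wrap(out.toByteArray());
            double[] values = new double[buf.remaining() / Double.BYTES];
            for (int i = 0; i < values.length; i++) {
                values[i] = buf.getDouble();
            }
            return values;
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt deflate stream", e);
        } finally {
            inflater.end();
        }
    }
}
```

`ChunkCodec.decode(ChunkCodec.encode(values))` returns the original array. Each encoded string is a complete, independent zlib stream.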

So the questions are:

1). Does anyone have any suggestions for how best to build my own single custom data dictionary, based on multiple sections of doubles (8 bytes each), that could be used for multiple deflate operations, i.e. use the same dictionary for all the compressions? Should I be looking for common bytes across all byte arrays, with the commonest byte put at the end of the dictionary array?

2). Can I separate the (custom) data dictionary from the deflated data, and then set the dictionary against the deflated data later before inflating the data again?

3). Will the deflate algorithm take my custom data dictionary, and then just create its own slightly different data dictionary anyway, both invalidating my singular data dictionary and lessening the potential space saving from using a singular data dictionary?

4). Can someone elaborate on the structure of zlib compressed data, so that I myself may try to separate the data dictionary from the compressed data?

I want to store the data dictionary only once in my file and use it for every block of my double data, without storing it alongside each block. If the data dictionary cannot be separated from the deflated data and stored separately, then there seems to be little value in building a single custom dictionary, since each compressed block would have its own dictionary anyway. Is this right?

Answer

  1. You can either set a fixed dictionary that consists of strings that are common and frequent in your data, or you can use the last n chunks concatenated as a dictionary. Either way, both the compression and decompression ends need the same dictionary to work with on any given chunk.

  2. The dictionary is not sent with the data. That's the whole point. The other side needs to know the dictionary that was used in order to decompress, using some approach like those in #1.

  3. The dictionary that deflate uses has no structure. At any point in time, you are using the previous 32 KB of uncompressed data as the dictionary within which to search for matching strings starting at the next byte after that 32 KB. Setting the dictionary allows the compressor to get a head start for the first 32 KB of data. That's all there is to it.

  4. There is no separate "dictionary" in the compressed data to pull out: the "dictionary" is simply the uncompressed data you get back as you decompress.
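The first point above offers two ways to build the shared dictionary. Below is a minimal Java sketch of both, not a definitive recipe. `DictionaryBuilder`, `fromRecentChunks` and `fromFrequentDoubles` are hypothetical names, and ranking whole 8-byte values by frequency is only one simple way to pick "strings that are common and frequent" in double data. Whole values are used rather than single bytes because deflate only emits matches of 3 or more bytes. The zlib manual suggests putting the most commonly used strings toward the end of the dictionary. With the default 32 KB window, only the tail of a longer dictionary can be referenced, so both helpers keep the end.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helpers for building one preset Deflate dictionary shared by many chunks.
final class DictionaryBuilder {

    // With the default window, deflate can only reach back 32 KB, so a longer
    // dictionary is effectively truncated to its last 32 KB.
    static final int MAX_DICT = 32 * 1024;

    // "Last n chunks" option: concatenate recent chunks and keep the final 32 KB.
    static byte[] fromRecentChunks(List<byte[]> recentChunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : recentChunks) {
            out.write(chunk, 0, chunk.length);
        }
        byte[] all = out.toByteArray();
        return Arrays.copyOfRange(all, Math.max(0, all.length - MAX_DICT), all.length);
    }

    // "Fixed dictionary" option: the doubles that occur most often in sample data.
    // Each value is written once as 8 big-endian bytes, least frequent first, so the
    // most frequent values end up at the end of the dictionary.
    static byte[] fromFrequentDoubles(double[] samples, int maxValues) {
        Map<Long, Integer> counts = new HashMap<>();
        for (double d : samples) {
            counts.merge(Double.doubleToRawLongBits(d), 1, Integer::sum);
        }
        List<Map.Entry<Long, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort((a, b) -> Integer.compare(b.getValue(), a.getValue())); // most frequent first
        int keep = Math.min(Math.min(maxValues, ranked.size()), MAX_DICT / Double.BYTES);
        ByteBuffer buf = ByteBuffer.allocate(keep * Double.BYTES);
        for (int i = keep - 1; i >= 0; i--) { // reverse order: most frequent written last
            buf.putLong(ranked.get(i).getKey());
        }
        return buf.array();
    }
}
```

Whichever option is used, the decompressor must have exactly the same dictionary bytes for every chunk. With the "last n chunks" option, that means the reader has to decode the chunks in order.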
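The point that the dictionary is not sent with the data can be checked with a round trip. In this sketch, one dictionary is stored once (for example in a file header, which is an assumption about the asker's format) and is handed to both sides. The code uses `Deflater.setDictionary`, `Inflater.needsDictionary` and `Inflater.setDictionary` from `java.util.zip`. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch: one dictionary, stored once (e.g. in a file header), shared by every chunk.
final class SharedDictionaryCodec {

    // Compresses one chunk against the shared dictionary. Only a 4-byte checksum of the
    // dictionary ends up in the output; the dictionary bytes themselves do not.
    static byte[] compress(byte[] chunk, byte[] dictionary) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setDictionary(dictionary); // set before any data is compressed
            deflater.setInput(chunk);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] tmp = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(tmp);
                out.write(tmp, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Decompresses one chunk, supplying the same dictionary when the stream asks for it.
    // zlib compares the dictionary's Adler-32 with the one recorded in the stream and
    // rejects a mismatch.
    static byte[] decompress(byte[] compressed, byte[] dictionary) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] tmp = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(tmp);
                out.write(tmp, 0, n);
                if (n == 0 && !inflater.finished()) {
                    if (inflater.needsDictionary()) {
                        inflater.setDictionary(dictionary);
                    } else if (inflater.needsInput()) {
                        throw new IllegalArgumentException("truncated deflate stream");
                    }
                }
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt stream", e);
        } finally {
            inflater.end();
        }
    }
}
```

With this design, the file holds the dictionary once and every block holds only its own compressed bytes, which is the layout the question asks about.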
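The "head start" described above can be measured by compressing the same chunk with and without a preset dictionary. This is a hedged sketch. `DictionaryHeadStart` and its methods are hypothetical. Using the previous chunk as the dictionary is just one of the options from the first point, and the size difference depends entirely on how much the chunks share.

```java
import java.nio.ByteBuffer;
import java.util.zip.Deflater;

// Sketch: measure how much a preset dictionary shrinks one chunk of doubles.
final class DictionaryHeadStart {

    // Compressed size of data, optionally primed with a preset dictionary (null for none).
    static int compressedSize(byte[] data, byte[] dictionary) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            if (dictionary != null) {
                deflater.setDictionary(dictionary); // history the first bytes can match against
            }
            deflater.setInput(data);
            deflater.finish();
            byte[] tmp = new byte[4096];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(tmp);
            }
            return total;
        } finally {
            deflater.end();
        }
    }

    // Packs doubles as 8-byte big-endian values, as in the question's chunks.
    static byte[] toBytes(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES);
        for (double v : values) {
            buf.putDouble(v);
        }
        return buf.array();
    }
}
```

Suppose `cur` reuses the values of an earlier chunk `prev`, for example in a different order. In that case `compressedSize(toBytes(cur), toBytes(prev))` should come out well below `compressedSize(toBytes(cur), null)`, because without a dictionary the first occurrence of every value has to be emitted as literals.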
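On the structure of zlib data (question 4): per RFC 1950, a zlib stream consists of:

- a 2-byte header (CMF, FLG);
- a 4-byte DICTID, present only if the FDICT bit (0x20 in FLG) is set;
- the raw deflate data;
- a 4-byte Adler-32 of the uncompressed data.

DICTID is the Adler-32 checksum of the preset dictionary, not the dictionary itself. This is consistent with the answer: there is nothing to cut out of the stream, and the checksum only lets the decompressor confirm it was given the right dictionary. The sketch below reads these fields. `ZlibHeader` and its methods are hypothetical names, while `Inflater.getAdler()` is the standard `java.util.zip` way to obtain the required dictionary's Adler-32 once `needsDictionary()` returns true.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Adler32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch: inspect the zlib framing (RFC 1950) around a deflate stream.
// Layout: CMF, FLG, [DICTID: 4 bytes if FDICT is set], deflate data, Adler-32 of the output.
final class ZlibHeader {

    // CM (low 4 bits of CMF) must be 8 (deflate), and CMF*256 + FLG must be a multiple of 31.
    static boolean isValidHeader(byte[] zlib) {
        int cmf = zlib[0] & 0xFF;
        int flg = zlib[1] & 0xFF;
        return (cmf & 0x0F) == 8 && ((cmf << 8) | flg) % 31 == 0;
    }

    // FDICT is bit 5 (0x20) of FLG: set when a preset dictionary was used.
    static boolean usesPresetDictionary(byte[] zlib) {
        return (zlib[1] & 0x20) != 0;
    }

    // DICTID: the big-endian Adler-32 of the dictionary, stored right after the header.
    static long dictionaryId(byte[] zlib) {
        if (!usesPresetDictionary(zlib)) {
            throw new IllegalArgumentException("no preset dictionary in this stream");
        }
        long id = 0;
        for (int i = 2; i < 6; i++) {
            id = (id << 8) | (zlib[i] & 0xFF);
        }
        return id;
    }

    // The same value via the java.util.zip API; returns -1 if no dictionary is needed.
    static long requiredDictionaryId(byte[] zlib) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(zlib);
            inflater.inflate(new byte[1]);
            return inflater.needsDictionary() ? (inflater.getAdler() & 0xFFFFFFFFL) : -1L;
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt zlib stream", e);
        } finally {
            inflater.end();
        }
    }

    // Adler-32 of a candidate dictionary, for matching a stream to the right stored dictionary.
    static long adler32(byte[] dictionary) {
        Adler32 a = new Adler32();
        a.update(dictionary, 0, dictionary.length);
        return a.getValue();
    }

    // zlib-format compression with an optional preset dictionary (null for none).
    static byte[] compress(byte[] data, byte[] dictionary) {
        Deflater deflater = new Deflater();
        try {
            if (dictionary != null) {
                deflater.setDictionary(dictionary);
            }
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] tmp = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(tmp);
                out.write(tmp, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }
}
```

If a file ever held several stored dictionaries, comparing `dictionaryId(stream)` with `adler32(candidate)` would be one way to pick the right one for each block.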
