压缩唯一的数据流 [英] Compression for a unique stream of data

查看：102 发布时间：2016/12/25 13:38:36 compression zlib

本文介绍了压缩唯一的数据流的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有大量的整数数组。每个都有几千个整数，每个整数通常与前一个相同，或者只相差一个位或两个。我想把每个数组缩小到尽可能小，以减少我的磁盘IO。

I've got a large number of integer arrays. Each one has a few thousand integers in it, and each integer is generally the same as the one before it or is different by only a single bit or two. I'd like to shrink each array down as small as possible to reduce my disk IO.

Zlib将其缩小到原始大小的25％。这很好，但我不认为它的算法是特别适合的问题。有没有人知道一个压缩库或简单的算法，可能执行更好的这种类型的信息？

Zlib shrinks it to about 25% of its original size. That's nice, but I don't think its algorithm is particularly well suited for the problem. Does anyone know a compression library or simple algorithm that might perform better for this type of information?

更新：zlib将其转换为一个数组xor deltas收缩到约20％的原始尺寸。

Update: zlib after converting it to an array of xor deltas shrinks it to about 20% of the original size.

推荐答案

如果大多数整数真的与前一个相同，并且符号间的差异通常可以表示为

If most of the integers really are the same as the previous, and the inter-symbol difference can usually be expressed as a single bit flip, this sounds like a job for XOR.

获取输入流如下：

并输出：

一点伪代码

compressed[0] = uncompressed[0]
loop
  compressed[i] = uncompressed[i-1] ^ uncompressed[i]

我们现在已经将大部分输出减少为0，即使更改了高位。在您使用的任何其他工具中的RLE压缩将有一个字段的一天。它将在32位整数上工作更好，它仍然可以编码一个完全不同的整数在流中弹出。

We've now reduced most of the output to 0, even when a high bit is changed. The RLE compression in any other tool you use will have a field day with this. It'll work even better on 32-bit integers, and it can still encode a radically different integer popping up in the stream. You're saved the bother of dealing with bit-packing yourself, as everything remains an int-sized quantity.

当您想要解压缩时：

uncompressed[0] = compressed[0]
loop
  uncompressed[i] = uncompressed[i-1] ^ compressed[i]

这也有一个简单的算法，，因为它只是XOR。

This also has the advantage of being a simple algorithm that is going to run really, really fast, since it is just XOR.

这篇关于压缩唯一的数据流的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

压缩唯一的数据流 [英] Compression for a unique stream of data

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

压缩唯一的数据流 [英] Compression for a unique stream of data

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭