压缩排序的整数 [英] Compress sorted integers

查看:277
本文介绍了压缩排序的整数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在建立一个索引,它只是连续存储在二进制文件中的几组有序32位整数。问题是,这个文件增长相当大。我一直在想添加一些压缩计划,但这有点失去了我的专业知识。所以我想知道,什么压缩算法在这种情况下最好的工作?此外,解压缩必须快速,因为这个索引将用于进行查找。

I'm building a index which is just several sets of ordered 32 bit integers stored continuously in a binary file. The problem is that this file grows pretty large. I've been thinking of adding some compressions scheme but that's a bit out of my expertise. So I'm wondering, what compression algorithm would work best in this case? Also, decompression has to be fast since this index will be used to make make look ups.

推荐答案

如果你存储整数(例如:1,3,4,5,9,10等...),而不是一些随机的32位整数(982346 ...,3487623412 ..,etc),你可以做一件事:

If you are storing integers which are close together (eg: 1, 3 ,4, 5, 9, 10 etc... ) rather than some random 32 bit integers (982346..., 3487623412.., etc) you can do one thing:

查找相邻数字之间的差异,例如2,1,1,4,1 ...(在我们的示例中)和然后霍夫曼编码这些数字。

Find the differences between the adjacent numbers which would be like 2,1,1,4,1... etc.(in our example) and then Huffman encode this numbers.

我不认为霍夫曼编码将工作,如果你直接应用他们原来的数字列表。

I don't think Huffman encoding will work if you directly apply them to the original list of numbers you have.

但是如果你有一个接近的数字的排序列表,几率是好的,你会得到一个非常好的压缩比通过做数字差异的霍夫曼编码,可能比使用在Zip库中使用的LZW算法更好的比率。

But if you have a sorted list of near-by numbers, the odds are good that you will get a very good compression ratio by doing Huffman encoding of the number differences, may be better ratio than using the LZW algorithm used in the Zip libraries.

无论如何,感谢张贴这个有趣的问题。

Anyway thanks for posting this interesting question.

这篇关于压缩排序的整数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆