如何快速解码huffman代码? [英] How to decode huffman code quickly?

查看:404
本文介绍了如何快速解码huffman代码?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经实施了一个简单的我不太了解如何快速解码压缩文件,我的坏算法是:



枚举所有的huffman代码在代码表中,然后将其与压缩文件中的位进行比较。结果是可怕的结果:解压缩3MB文件将需要6小时。



更新:
我已经实现了
解码器与状态表,基于我的朋友林的建议。我认为这个方法应该比六年级的truffal huffman树好,3MB。



谢谢。

解决方案

优化二叉树方法的一种方法是使用查找表。您可以安排表格,以便您可以直接查找特定的编码位模式,允许任何代码的最大可能位宽。



t使用完全最大宽度,它们包括在表中的多个位置 - 未使用位的每个组合的一个位置。该表指示要从输入中丢弃多少位以及解码输出。



如果最长的代码太长,那么表是不切实际的,折衷是以使用较小的固定宽度下标查找的树。例如,您可以使用256项目表来处理一个字节。如果输入代码大于8位,则表条目指示解码不完整,并引导您到处理下一个最多8位的表。更大的表贸易内存的速度 - 256项目可能太小。



我相信这种一般的方法被称为前缀表,这是BobMcGees引用的代码是做的。可能的区别是,一些压缩算法需要在解压缩期间更新前缀表 - 这对于简单的霍夫曼来说不是必需的。 IIRC,我第一次看到一本关于bitmapped图形文件格式的书,包括GIF,在专利恐慌之前的一段时间。



应该很容易预先计算完整查找表,哈希表等效项或二叉树模型中的小树表。二叉树仍然是代码的关键表示 - 这个查找表只是优化。


I have implementated a simple compressor using pure huffman code under Windows.But I do not know much about how to decode the compressed file quickly,my bad algorithm is:

Enumerate all the huffman code in the code table then compare it with the bits in the compressed file.It turns out horrible result:decompressing 3MB file would need 6 hours.

Could you provide a much more efficient algorithm?Should I use Hash or something?

Update: I have implementated the decoder with state table,based on my friend Lin's advice.I think this method should be better than travesal huffman tree,3MB within 6s.

thanks.

解决方案

One way to optimise the binary-tree approach is to use a lookup table. You arrange the table so that you can look up a particular encoded bit-pattern directly, allowing for the maximum possible bit-width of any code.

Since most codes don't use the full maximum width, they are included at multiple locations in the table - one location for each combination of the unused bits. The table indicates how many bits to discard from the input as well as the decoded output.

If the longest code is too long, so the table is impractical, a compromise is to use a tree of smaller fixed-width-subscript lookups. For example, you can use a 256-item table to handle a byte. If the input code is more than 8 bits, the table entry indicates that decoding is incomplete and directs you to a table that handles the next up-to 8 bits. Larger tables trade memory for speed - 256 items is probably too small.

I believe this general approach is called "prefix tables", and is what BobMcGees quoted code is doing. A likely difference is that some compression algorithms require the prefix table to be updated during decompression - this is not needed for simple Huffman. IIRC, I first saw it in a book about bitmapped graphics file formats which included GIF, some time before the patent panic.

It should be easy to precalculate either a full lookup table, a hashtable equivalent, or a tree-of-small-tables from a binary tree model. The binary tree is still the key representation of the code - this lookup table is just optimisation.

这篇关于如何快速解码huffman代码?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆