How to detect codeword length for LZW Decoding

Question

I'm writing a general LZW decoder in C++ and I'm having trouble finding documentation on the length (in bits) of the codewords used. Some articles I've found say that codewords are 12 bits long, others say 16 bits, and still others say that a variable bit length is used. So which is it? A variable bit length would make sense to me, since that should give the best compression (i.e. start with 9 bits, then move to 10 when necessary, then to 11, and so on). But I can't find any "official" documentation on what the industry standard is.

For example, suppose I open Microsoft Paint, create a simple 100x100 pixel all-black image, and save it as a TIFF, so that the image is stored using LZW compression. In this scenario, when I'm parsing the LZW codewords, should I read 9 bits, 12 bits, or 16 bits for the first codeword? And how would I know which to use?

Thanks for any help you can provide.

Recommended answer

LZW can be done in any of these ways. By far the most common (at least in my experience) is to start with 9-bit codes, then move to 10-bit codes once the 9-bit code space is used up, and so on up to some maximum size.
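To make the growing code width concrete, here is a minimal sketch that maps the number of dictionary entries to the width needed to address them. The name `codeWidthFor` and its default limits are assumptions for illustration, and the exact point at which a real format switches widths can differ by one entry, so check the format you are decoding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: how many bits are needed to address every code in a
// dictionary holding `dictSize` entries (codes 0 .. dictSize-1), clamped to
// the range [minBits, maxBits]. The defaults (9 and 12) are assumptions.
inline int codeWidthFor(std::uint32_t dictSize, int minBits = 9, int maxBits = 12) {
    assert(minBits >= 1 && minBits <= maxBits && maxBits < 32);
    int bits = minBits;
    // Grow the width until 2^bits codes are enough, or the maximum is reached.
    while (bits < maxBits && dictSize > (1u << bits)) {
        ++bits;
    }
    return bits;
}
```

With the defaults, a dictionary of up to 512 entries is read with 9-bit codes, the 513th entry forces 10 bits, and the width stops growing at 12 bits no matter how large `dictSize` gets.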

From there, you typically have a couple of choices. One is to clear the dictionary and start over. Another is to continue using the current dictionary without adding new entries. In the latter case, you typically track the compression ratio, and if it drops too far, you clear the dictionary and start over.
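The second strategy is an encoder-side decision (the decoder just reacts to the clear code it reads). A rough sketch of such a check follows. The class name `RatioMonitor`, the check interval, and the rule "clear as soon as the ratio is worse than the best one seen" are all assumptions chosen for illustration, not something taken from a particular implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical policy object for an encoder whose dictionary is frozen:
// every `interval` input bytes it compares the running compression ratio
// (bytes in / bytes out) with the best ratio seen so far.
class RatioMonitor {
public:
    explicit RatioMonitor(std::uint64_t interval = 10000) : interval_(interval) {
        assert(interval > 0);
    }

    // Pass running totals. Returns true when the dictionary should be cleared.
    bool shouldClear(std::uint64_t bytesIn, std::uint64_t bytesOut) {
        if (bytesIn < nextCheck_ || bytesOut == 0) return false;  // not time yet
        nextCheck_ = bytesIn + interval_;
        const double ratio = static_cast<double>(bytesIn) / static_cast<double>(bytesOut);
        if (ratio >= best_) {   // still improving (or holding): keep the dictionary
            best_ = ratio;
            return false;
        }
        best_ = 0.0;            // ratio got worse: start fresh after the reset
        return true;
    }

private:
    std::uint64_t interval_;
    std::uint64_t nextCheck_ = interval_;  // first check after `interval_` bytes
    double best_ = 0.0;
};
```

When `shouldClear` returns true, the encoder would emit its clear code and reinitialize the dictionary. A real encoder would probably want some tolerance instead of reacting to any drop.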

I'd have to dig through docs to be sure, but if I'm not mistaken, the specific implementation of LZW used in TIFF starts at 9 and goes up to 12 bits (when it was being designed, MS-DOS was a major target, and the dictionary for 12-bit codes used most of the available 640K of RAM). If memory serves, it clears the table as soon as the last 12-bit code has been used.
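For the TIFF case specifically, a decoder along these lines might look like the sketch below. It assumes the conventions I know from TIFF-style LZW: codes packed most-significant-bit first, code 256 as the clear code, code 257 as end-of-information, new entries starting at 258, and the width going up one entry "early" (after entry 510, 1022, and 2046 is added). The function name `decodeTiffLzw` is made up, and storing a full string per entry is for clarity only, not efficiency.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Sketch of a TIFF-style LZW decoder under the assumptions stated above.
std::string decodeTiffLzw(const std::vector<std::uint8_t>& in) {
    constexpr int kClear = 256, kEoi = 257, kFirstFree = 258;
    constexpr int kMinBits = 9, kMaxBits = 12;

    std::vector<std::string> table;
    auto resetTable = [&table] {
        table.assign(kFirstFree, std::string());  // 256 and 257 stay empty
        for (int i = 0; i < 256; ++i) table[i] = std::string(1, static_cast<char>(i));
    };
    resetTable();

    std::string out;
    std::uint32_t bitBuf = 0;  // holds unread bits, newest byte in the low end
    int bitCount = 0;          // number of unread bits in bitBuf
    int width = kMinBits;      // current code width in bits
    int prev = -1;             // previous code, -1 right after a clear
    std::size_t pos = 0;

    for (;;) {
        assert(width >= kMinBits && width <= kMaxBits);
        while (bitCount < width && pos < in.size()) {
            bitBuf = (bitBuf << 8) | in[pos++];
            bitCount += 8;
        }
        if (bitCount < width) break;  // input ended without an EOI code
        const int code = static_cast<int>((bitBuf >> (bitCount - width)) & ((1u << width) - 1));
        bitCount -= width;

        if (code == kEoi) break;
        if (code == kClear) {         // start over with 9-bit codes
            resetTable();
            width = kMinBits;
            prev = -1;
            continue;
        }

        const int size = static_cast<int>(table.size());
        std::string entry;
        if (prev < 0) {
            if (code >= 256) break;   // malformed: first code must be a literal
            entry = table[code];
        } else if (code < size) {     // known code
            entry = table[code];
            if (size < (1 << kMaxBits)) table.push_back(table[prev] + entry[0]);
        } else if (code == size) {    // code not in the table yet (KwKwK case)
            entry = table[prev] + table[prev][0];
            if (size < (1 << kMaxBits)) table.push_back(entry);
        } else {
            break;                    // malformed: code beyond the table
        }
        out += entry;
        prev = code;

        // "Early change": widen one entry before the code space is exhausted.
        if (width < kMaxBits && static_cast<int>(table.size()) >= (1 << width) - 1) ++width;
    }
    return out;
}
```

As a small hand-built example, the six bytes `80 10 48 50 28 08` hold the 9-bit codes 256, 65, 66, 258, 257 and decode to `ABAB`. So for the TIFF in the question, the first codeword is read as 9 bits and should be the clear code.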
