需要的帮助如何连接code。使用霍夫曼code字 [英] need help on how to encode words using huffman code

查看:122
本文介绍了需要的帮助如何连接code。使用霍夫曼code字的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

你怎么连接使用霍夫曼code比如极品code字

how do you encode words using the huffman code such as NEED

推荐答案

霍夫曼编码基本上采用可变长度的位串重新present令牌(一般用几个例外的字符)。更常见的一个令牌,较短它比特长度是,这是(通常)动态作为流被处理

Huffman encoding basically uses variable-length bit strings to represent tokens (generally characters with a couple of exceptions). The more common a token is, the shorter it's bit-length is and this is (usually) dynamic as the stream is processed.

通常有两种特殊的记号,ESCAPE和END-STREAM。

There are usually two special tokens, ESCAPE and END-STREAM.

编码维护词典这基本上是位序列的查找以获得令牌。最初,它仅包含两个特殊标记。

Encoding maintains a dictionary which is basically a lookup of the bit sequences to get a token. Initially it contains only the two special tokens.

有关逃生和END_STREAM的初始位序列可能是0和1(这是它并不真正的问题在一开始)。当EN codeR接收到一个字符不在字典中,它输出逃避,全长令牌,然后将其添加并指派新的比特序列的基础上,所有的标记频率。

The initial bit sequences for ESCAPE and END_STREAM could be 0 and 1 (which is which doesn't really matter at the start). When the encoder receives a character not in the dictionary, it outputs ESCAPE and the full length token, then it adds it and assigns new bit sequences, based on frequency of all the tokens.

所以,你的N可能会导致输出流:

So your 'N' may result in the output stream:

0 xxxxxxxx
| +- token code for N
+--- ESCAPE

和新词典:

ESCAPE:00
END-STREAM:01
N:1

那么你的'E',可能导致输出流:

Then your 'E' may result in the output stream:

0 xxxxxxxx 0 yyyyyyyy
             +- token code for E

和新词典:

ESCAPE:00
END-STREAM:01
N:10
E:11

您的下一封不会导致ESCAPE输出,因为它已经在词典中,所以你只输出了code(11)。它将改变字典,因为E现在有一个较高的数量。这并不重要在我们的情况,因为所有字符都是两个二进制数字,但是,在一个更​​复杂的例子中,E标记的位长度将缩短。

Your next E will not result in a ESCAPE output since it's already in the dictionary so you just output its code (11). It will change the dictionary since E now has a higher count. This won't matter in our situation since all characters are two binary digits but, in a more complicated example, the bit length of the 'E' token would shorten.

当对D到达时,输出流变为:

When the D arrives, the output stream becomes:

0 xxxxxxxx 0 yyyyyyyy 11 0 zzzzzzzz
                      |    +- token code for D
                      +------ second E

和新词典:

ESCAPE:00
END-STREAM:011
N:010
E:11
D:10

所以,你可以看到的是,随着越来越多的角色进来,共同那些位长度减少和的不常用的增加,给你你的COM pression。 N(或D),在这种情况下得到一个3位数code,使用E坚持了2位数的code,因为有更多的人。

So you can see that, as more characters come in, the bit length of the common ones reduce and that of the uncommon one increase, giving you your compression. N (or D) in this case gets a 3-digit code, while E sticks with a 2-digit code because there's more of them.

的妙处在于,你不需要存储字典文件,因为ESCAPE部分动态地建立它去COM pression为好。

The beauty is that you don't need to store the dictionary with the file since the ESCAPE sections build it dynamically for de-compression as well.

此外,因为从来没有一个END-STREAM令牌才结束,这是位长度还在不断变大。类似逃生,同时还是有很多的新特性进来,其位长度保持很短,但在没有新的字符到达时间,它遭受了同样的命运END-STREAM。

In addition, because there's NEVER an END-STREAM token until the end, it's bit length keeps getting bigger. Similar for ESCAPE, while there's still lots of new character coming in, its bit length stays short but, when no new characters are arriving, it suffers the same fate as END-STREAM.

有一个(8位ASCII)输入流最好的情况是含什么,但数以百万计的相同字符的文件。此收费9位的第一个字符,然后在1位对每个附加字符然后2位流的结束。那么快接近1换8 COM pression比例(97.5%)。

The best case for an (8-bit ASCII) input stream is a file containing nothing but millions of the same character. This costs 9 bits for the first character, then 1 bit for each additional character then 2 bits for the end of stream. That fast approaches a 1-for-8 compression ratio (97.5%).

在最坏的情况下,正是每个字符的费用,每个字符的9位以及流的末尾之一 - 这实际上是由12.5%扩大了流

The worst case is exactly one of each character which costs 9 bits per character plus the end of stream - this actually expands the stream by 12.5%.

这篇关于需要的帮助如何连接code。使用霍夫曼code字的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆