霍夫曼编码-标头& amp;紧急行动 [英] Huffman encoding - header & EOF

查看:46
本文介绍了霍夫曼编码-标头& amp;紧急行动的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在努力基于Java中基于霍夫曼算法的程序的实现,并且我正处于将编码后的内容输出到文件的阶段.我对如何实现解码所需的标头和eof感到有些困惑.目前,对于我的标头,我具有输入文件中出现的所有唯一值及其频率,但是在某些文章上,我看到人们用0或1表示节点,然后用频率表示(我有点困惑)因为它没有说符号是什么).

I am currently working on implementing a program based on the huffman algorithm in Java, and I am at the stage where I need to output the encoded content to a file. I am a bit confused about how to implement the header and eof needed for decoding. For my header at the moment I have all the unique values that occur from the input file and their frequency, but on some articles I have seen people do it with 0 or 1 represents the nodes and then the frequency (which I am a bit puzzled by as it doesn't say what the symbol is).

另外,据我所知,对于EOF,我将其像符号一样进行编码,以便对其进行读取和解码,但是我不确定我可以为它使用什么值肯定不会出现?我知道它的权重必须为1,但是不确定如何确保它不会真正出现在文件中.

Also, for the EOF as I understand it I encode it like the symbols so it gets read and decoded, however I am not sure what value I can use for it that will definitely not come up? I know it needs to a weight of 1 but was unsure how to make sure it won't actually be in the file.

推荐答案

对于赋值,我不得不这样做一次,这就是我们使用的方法.

I've had to do this once for an assignment and this is the approach we used.

通过使用0和1对树的结构(而不​​是频率)进行编码来对标头进行编码. "0"表示沿树移动,"1"表示我们在叶节点.这导致对树进行唯一编码的预遍历.

Encoding the header was done by using 0 and 1 to encode the structure of the tree (rather than the frequencies). A "0" denoted moving along the tree, a "1" denoted we were at a leaf node. This resulted in a sort of pre-order traversal of the tree encoding it uniquely.

例如,像((((ab)c)(de))的树将被编码为"0001 a 1 b 1 c 01 d 1 e",其中a, b,c,d,e是它们的ASCII形式.

For example, a tree like (((a b) c) (d e)) would be encoded as "0001a1b1c01d1e", where a,b,c,d,e are their ASCII forms.

这是图形形式的树:

     / \
   /\   /\
 /\  c d  e
a  b 

对于EOF,我们使用文件中的最后3位来指定需要读取的最后两个字节中的多少个.读取完最后一个字节后(因此我们处理了倒数第二个字节),我们检查了后3位:他们编码了要读取的位再减去6.所以110101xxxxxxx000表示读取110101(6位)"并丢弃其他所有东西". 1101011xxxxxx001的意思是读取1101011(7位)并丢弃其余部分",等等.

For the EOF we used the last 3 bits in the file to specify how many of the last two bytes needed to be read. Once we read the last byte (so we were working on the second last byte) we checked the last 3 bits: They encoded how many more bits to read, minus 6. So 110101xxxxxxx000 meant "read 110101 (6 bits) and discard everything else". 1101011xxxxxx001 meant "read 1101011 (7 bits) and discard the rest", etc.

以这种方式进行操作意味着我们不必具有表示EOF的特殊值,并且我们仍然可以一次读取一个字节的文件(尽管实际上我们需要在工作位置之前读取一个字节).

Doing it this way meant we didn't have to have a special value denoting the EOF and we could still read the file a byte at a time (although we actually needed to read one byte ahead of where we were working).

(对于EOF,我没有阅读您的文章,所以我不知道我们的想法是否适合您的方法.)

(For the EOF I haven't read your articles, so I don't know if our idea works with your approach.)

这篇关于霍夫曼编码-标头& amp;紧急行动的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆