霍夫曼编码 - 头&放大器; EOF [英] Huffman encoding - header & EOF

查看:199
本文介绍了霍夫曼编码 - 头&放大器; EOF的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在实施基础上,霍夫曼算法的Java程序,而我在哪里,我需要输出EN codeD的内容到一个文件的阶段。我有点困惑如何实现的头和EOF解码所需。对于我此刻的头我都从输入文件和它们的频率发生的独特的价值观,但在一些文章中,我看到有人这样做有0或1重presents的节点,然后将频率(这是我感到有点困惑的,因为它没有说的符号是什么)。

I am currently working on implementing a program based on the huffman algorithm in Java, and I am at the stage where I need to output the encoded content to a file. I am a bit confused about how to implement the header and eof needed for decoding. For my header at the moment I have all the unique values that occur from the input file and their frequency, but on some articles I have seen people do it with 0 or 1 represents the nodes and then the frequency (which I am a bit puzzled by as it doesn't say what the symbol is).

另外,对于EOF据我了解我的连接code像的符号,因此它被读取,而德codeD,但是我不知道什么样的价值,我可以使用它,这将肯定不会来向上?我知道这需要权重为1,但不确定如何确保它不会实际上是在文件中。

Also, for the EOF as I understand it I encode it like the symbols so it gets read and decoded, however I am not sure what value I can use for it that will definitely not come up? I know it needs to a weight of 1 but was unsure how to make sure it won't actually be in the file.

推荐答案

我已经为一项任务做一次,这是我们使用的方法。

I've had to do this once for an assignment and this is the approach we used.

编码的头段,通过使用0和1至连接code中的树的结构(而不​​是频率)进行。 0表示沿树移动,1表示我们在一个叶子结点。这导致了一种pre-顺序编码树遍历它独特的。

Encoding the header was done by using 0 and 1 to encode the structure of the tree (rather than the frequencies). A "0" denoted moving along the tree, a "1" denoted we were at a leaf node. This resulted in a sort of pre-order traversal of the tree encoding it uniquely.

例如,像一棵树(((AB)C)(DE))将连接codeD为0001 A 1 B 1 C 01 D 1 电子,其中A,B,C,D,E是他们的ASCII形式。

For example, a tree like (((a b) c) (d e)) would be encoded as "0001a1b1c01d1e", where a,b,c,d,e are their ASCII forms.

下面是在一个图形形式的树:

Here's the tree in a graphical form:

     / \
   /\   /\
 /\  c d  e
a  b 

有关在EOF我们使用的最后3个比特的文件中指定多少所需的最后两个字节被读取。当我们读到最后一个字节(所以我们工作的第二个最后一个字节)我们检查后3位:它们连接codeD多少位来阅读,零下6。所以 110101xxxxxxx000 的意思是读 110101 (6位),并放弃一切。 1101011xxxxxx001 的意思是读 1101011 (7位),并丢弃剩下的,等等。

For the EOF we used the last 3 bits in the file to specify how many of the last two bytes needed to be read. Once we read the last byte (so we were working on the second last byte) we checked the last 3 bits: They encoded how many more bits to read, minus 6. So 110101xxxxxxx000 meant "read 110101 (6 bits) and discard everything else". 1101011xxxxxx001 meant "read 1101011 (7 bits) and discard the rest", etc.

这样做,这样意味着我们没有必须有一个特殊的值表示的EOF,我们仍然可以同时读取文件一个字节(尽管我们真的需要预读,我们正在努力的一个字节)。

Doing it this way meant we didn't have to have a special value denoting the EOF and we could still read the file a byte at a time (although we actually needed to read one byte ahead of where we were working).

(对于EOF我没有看过你的文章,所以我不知道我们的想法与您的方法。)

(For the EOF I haven't read your articles, so I don't know if our idea works with your approach.)

这篇关于霍夫曼编码 - 头&放大器; EOF的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆