关于霍夫曼算法的问题 [英] a question about huffman algorithm

查看:126
本文介绍了关于霍夫曼算法的问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

嗨 我对霍夫曼算法有疑问

我们读取文件并将其转换为字符串,然后构建霍夫曼树和代码,然后将字符串更改为编码字符串,例如:

输入字符串:12345
1个代码:00
2码:111
3码:10
4码:110
5码:01
编码的字符串:001111011001

然后将此字符串写入文件

那么该算法如何减小文件的大小?

我做到了,使用此算法后,我的文件为4 KB,它更改为17 KB:|

怎么了我该怎么办?

请帮助我

解决方案

在输入字符串中,通常,每个符号使用八位(1字节),因此字符串"12345"为5个字节(40个位)长.
每个符号使用可变数量的编码输出字符串(例如,在您的示例中,符号"1''仅使用两个编码(即00),并且整个编码的输出字符串只有11位长(因此压缩系数为40:11).


First
尝试用霍夫曼编码时,别再考虑字符串思考了.为什么?:而且'\ 0'是必须编码的东西.因此,最好考虑一下您需要编码的字节流.

第二
对于非常短的信息,霍夫曼编码可以产生比信息更大的结果.原因:字典需要一些空间,并且如果您尝试编码8个字节,则字典需要比原始信息更多的字节的机会非常高.

第三名
用nodepad制作一个包含一个字符的文件.在资源管理器中检查大小.根据您的hardisk格式,它在Windows资源管理器中显示为1KB,在空间上的空间"为4096字节,在空间"为1字节.

结论
仔细检查信息.

顺便说一下,解决方案1 ​​是理论上正确的解释.

并祝贺您尝试自己解决问题,而不是仅仅使用库就解决了!

问候.


听起来好像您正在将字符串"001111011001"作为字符串写入文件.

字符串是字符序列.根据您对字符进行编码的方式,每个字符在文件中将占用8或16位.如果将单个字符"2"转换为3个字符"111"的序列,则文件的大小就增加了三倍.

您应该创建的是一系列字符,而不是一个字符序列.

不知道如何用Java做到这一点,但是在C语言中您可以这样做:

 字符 * encoded_string = "  0011111000110000111100001110001100";

 int  len = strlen(encoded_string);

字符 byte_value =  0 ;
 for ( int  ix =  0 ; ix < len; ix + =  8 ){
  字节值=字节值<<  1 ;
  如果(encoded_string [ix] == '  1')
     byte_value = byte_value | =  1 ;

  如果((ix% 8 )==  7 ){
     fprintf(outputfile," ,byte_value);
     byte_value =  0 ;
  }
}

 int 剩余数= ix% 8 ;
如果(其余){
  字节值=字节值<<余;
  fprintf(outputfile," ,byte_value);
} 



可能有一个Java库函数可以为您完成这种转换,但是我以这种格式将其提供给您,因此您可以看到实际需要发生的事情,并且可以理解其中的区别.

此代码示例假定您要首先使用最高有效位进行编码.通过一次输出一个字节来解决字节排序问题.

代码使用零位填充文件的末尾-这可能是霍夫曼编码方案的问题.在填充零位之前,可能应该在文件代码的末尾附加一个内容.


hi i have an question about huffman algorithm

we read a file and convert it to a string then we build the huffman tree and codes,then change the string to encoded string like:

input string : 12345
1 code:00
2 code:111
3 code:10
4 code:110
5 code:01
encoded string : 001111011001

and we write this string to a file

so how this algorithm decrease the size of file ?

i did this and my file was 4 KB after using this algorithm it change to 17 KB:|

whats wrong what should i do?

please help me

解决方案

In the input string you, tipically, use eight bit (1 byte) for each symbol, hence the string "12345" is 5 bytes (40 bits) long.
The output string is coded using a variable number of bits for each symbol (for instance, in your example, the symbol ''1'' is coded using just two bits (namely 00) and the whole encoded ouput string is only 11 bits long (hence you have a compression factor of 40:11).


First
Forget about thinking in strings while try to coding something with Huffman. Why?: Also ‘\0’ is a something one has to code. So better think about a stream of bytes you need to code.

Second
On very short information, Huffman coding can result in bigger results then the information was. Why: The dictionary takes some space, and if you try to code 8 bytes the chance is very high that the dictionary needs more bytes then the original information.

Third
Make a file with nodepad containing one character. Check the size in explorer. Depending on your hardisk format it shows in windows explorer 1KB in properties "Space" 1 Byte" in "Space on volume" 4096 Bytes.

Conclusion
Check carefully the information.

And by the way Solution 1 is the theoretically correct explanation.

And congratulation that you try to figure it out by yourself and not "just" using a library!

Regards.


It sounds like you are writing the string "001111011001" to the file as a string.

A string is a sequence of characters. Depending on how you are encoding the characters, each character will take 8 or 16 bits in the file. If you convert the single character "2" into the sequence of 3 characters "111" you''ve just trippled the size of your file.

Instead of a sequence of characters, what you are supposed to be creating is a sequence of bits.

Not sure how you do it in Java, but in C you could do it like this:

char *encoded_string = "0011111000110000111100001110001100";

int len = strlen(encoded_string);

char byte_value = 0;
for (int ix = 0; ix < len; ix += 8) {
  byte_value = byte_value << 1;
  if (encoded_string[ix] == '1')
     byte_value = byte_value |= 1;

  if ((ix % 8) == 7) {
     fprintf(outputfile, "%c", byte_value);
     byte_value = 0;
  }
}

int remainder = ix % 8;
if (remainder) {
  byte_value = byte_value << remainder;
  fprintf(outputfile, "%c", byte_value);
}



There may be a Java library function that does that conversion for you, but I gave it to you in this format so you could see what actually needs to happen and you could understand the difference.

This code example assumes you want to encode with the most significant bit first. It gets around byte ordering issues by outputing a byte at a time.

The code pads out the end of the file with zero bits -- that may be an issue for your huffman encoding scheme. It probably ought to append an end of file code before padding out with zero bits.


这篇关于关于霍夫曼算法的问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆