Unable to compress file during Huffman Encoding in Java

Question

I have implemented the Huffman encoding algorithm in Java using priority queues. I traverse the tree from root to leaf and get codes such as #=000011, based on the number of times each symbol appears in the input. The tree is built correctly and the codes are as expected, but the output file is bigger than the original file. I am currently appending '0' and '1' to a String when traversing the left and right nodes of the tree. What I end up with probably uses all 8 bits for each character, which does not help compression. I am guessing that these bits need to be converted into character values so that the characters use fewer than 8 bits and I get a compressed version of the original file. Could you please let me know how to achieve compression by manipulating characters and reducing bits in Java? Thanks

Recommended answer

You're probably using a StringBuilder and appending "0" or "1", or simply the + operator to concatenate "0" or "1" to the end of your string. Or you're using some sort of OutputStream and writing to it.
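To see why that makes the file grow, compare the two sizes directly. This is only a rough sketch; `BitStringSize`, `bytesAsText` and `bytesPacked` are made-up names for illustration, and it assumes the '0'/'1' characters are written in a single-byte encoding such as ASCII.

```java
import java.nio.charset.StandardCharsets;

class BitStringSize {
    // Bytes needed when every '0'/'1' is written out as a text character:
    // one whole byte per code bit.
    static int bytesAsText(String bits) {
        return bits.getBytes(StandardCharsets.US_ASCII).length;
    }

    // Bytes needed when eight code bits are packed into each byte
    // (rounded up, because the last byte may be only partly used).
    static int bytesPacked(String bits) {
        return (bits.length() + 7) / 8;
    }

    public static void main(String[] args) {
        String bits = "000011"; // the code for '#' from the question
        System.out.println(bytesAsText(bits) + " bytes as text, "
                + bytesPacked(bits) + " byte(s) packed");
    }
}
```

So a 6-bit code written as characters costs 6 bytes, which is worse than the single byte the original symbol took.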

What you want to do is to write the actual bits. I'd suggest making a whole byte first before writing. A byte looks like this:

0x05

which represents the binary string 0000 0101.

You can make these by making a byte type, adding and shifting:

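import java.io.IOException;
import java.io.OutputStream;

// Packs a string of '0'/'1' characters into bytes and writes them to os.
public void writeToFile(String binaryString, OutputStream os) throws IOException {
    int pos = 0;
    while (pos < binaryString.length()) {
        int nextByte = 0;
        int i = 0;
        for (; i < 8 && pos + i < binaryString.length(); i++) {
            nextByte <<= 1; // make room for the next bit
            nextByte |= binaryString.charAt(pos + i) == '0' ? 0 : 1;
        }
        nextByte <<= 8 - i; // left-align a final partial byte, padding with zero bits
        os.write(nextByte); // write(int) writes only the low 8 bits
        pos += 8;
    }
}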
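Packing is only half of the job: whoever reads the file has to turn the bytes back into bits, and needs to know how many bits of the last byte are just padding. Here is a minimal sketch of both directions, assuming the total bit count is stored somewhere alongside the data (in a header, say). `BitPacking`, `pack` and `unpack` are hypothetical names, not something from the question.

```java
class BitPacking {
    // Packs a string of '0'/'1' characters into bytes, most significant bit first.
    // A final partial byte is padded with zero bits on the right.
    static byte[] pack(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                out[i / 8] |= (byte) (0x80 >>> (i % 8));
            }
        }
        return out;
    }

    // Reads back the first bitCount bits; bitCount tells us where the padding starts.
    static String unpack(byte[] data, int bitCount) {
        StringBuilder sb = new StringBuilder(bitCount);
        for (int i = 0; i < bitCount; i++) {
            int bit = (data[i / 8] >> (7 - i % 8)) & 1;
            sb.append(bit == 1 ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bits = "0000111010110"; // 13 bits, so 3 bits of padding in the second byte
        byte[] packed = pack(bits);
        System.out.println(packed.length + " bytes, round trip ok: "
                + unpack(packed, bits.length()).equals(bits));
    }
}
```

Without the bit count, the decoder could mistake the padding zeros for real codes, so it has to be saved in some form.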

Of course, it's inefficient to write one byte at a time. OutputStream also accepts byte arrays (write(byte[]) and write(byte[], int, int)), so you'd be better off collecting the bytes in an array (or, even easier, a ByteArrayOutputStream) and writing them in bigger chunks, or wrapping the stream in a BufferedOutputStream.
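One way to get the bigger chunks without managing an array yourself is to let BufferedOutputStream do the collecting. This is a sketch under that assumption; `BufferedBitWriter` and `writeBits` are hypothetical names, and the target stream would typically be a FileOutputStream.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class BufferedBitWriter {
    // Packs '0'/'1' characters into bytes and sends them through a buffer,
    // so the target stream receives large blocks instead of single bytes.
    static void writeBits(String bits, OutputStream target) throws IOException {
        BufferedOutputStream out = new BufferedOutputStream(target);
        int current = 0; // bits collected so far for the byte in progress
        int filled = 0;  // how many bits of that byte are in use
        for (int i = 0; i < bits.length(); i++) {
            current = (current << 1) | (bits.charAt(i) == '1' ? 1 : 0);
            if (++filled == 8) {
                out.write(current);
                current = 0;
                filled = 0;
            }
        }
        if (filled > 0) {
            out.write(current << (8 - filled)); // pad the last byte with zero bits
        }
        out.flush(); // push whatever is still buffered to the target
    }
}
```

The caller stays responsible for closing the target stream, for example with try-with-resources around the FileOutputStream.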

If you are not allowed to use byte writes (why the heck not? Every OutputStream, ObjectOutputStream included, supports writing byte arrays!), then you can use Base64 to encode your packed bytes as text. But remember that Base64 inflates your data by about 33%.

An easy way to convert a byte array to Base64 is to use an existing encoder. Since Java 8 the standard library provides java.util.Base64; the older sun.misc.BASE64Encoder is an internal class that is not available in modern JDKs. After adding the following import:

import java.util.Base64;

you can get an encoder and turn your byte array into a string:

byte[] bytes = getBytesFromHuffmanEncoding();
String encodedString = Base64.getEncoder().encodeToString(bytes);
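For completeness, here is a small round trip with java.util.Base64 (the standard encoder since Java 8), which also shows where the size overhead comes from: every 3 input bytes become 4 output characters. `Base64RoundTrip`, `toBase64` and `fromBase64` are hypothetical helper names, and the sample bytes are arbitrary.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64RoundTrip {
    // Encodes packed Huffman bytes as Base64 text.
    static String toBase64(byte[] packed) {
        return Base64.getEncoder().encodeToString(packed);
    }

    // Decodes the Base64 text back into the original bytes.
    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] packed = {0x0C, 0x35, 0x70, 0x01, 0x02, 0x03}; // arbitrary sample data
        String text = toBase64(packed);
        System.out.println(packed.length + " bytes -> " + text.length() + " characters");
        System.out.println("round trip ok: " + Arrays.equals(packed, fromBase64(text)));
    }
}
```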
