How to write to a file in Java after Huffman coding is done

Problem description

I have implemented a class for Huffman coding. The class parses an input file, builds a Huffman tree from it, and creates a map that has each distinct character appearing in the file as a key and that character's Huffman code as its value.

For example, let the string "aravind_is_a_good_boy" be the only line in the file. When you build the Huffman tree and generate the Huffman code for each character, the character 'a' gets the code '101', the character 'r' gets the code '0101', and so on.

My intention is to compress the file, so I cannot directly write out the string created by replacing each character with its Huffman code. Each character would be replaced by at least 3 characters, because each '1' and '0' would still be written to the file as a character, not as a bit. So I thought I would write the codes to the file as bytes, since there is no way to write individual bits to a file. But then 'a' and 'r' are both written to the file as 5, which would cause a problem when trying to decompress the file.

This is how I am converting a series of bits to bytes:

public byte[] compressString(String s, CharCodeHashMap map) {
    String byteString = "";
    byte[] byteArr = new byte[s.length()];
    int size = 0;
    for (int i = 0; i < s.length(); i++) {
        byteString += addPaddingZeros(map.getCompressedChar(s.charAt(i)));
        byteArr[size++] = new BigInteger(byteString, 2).toByteArray()[0];
        byteString = "";
    }

    return byteArr;
}

I tried prefixing '1' to each of the codes to fix the problem. But when you build a Huffman tree from a file, some characters end up with more than 8 bits. Then the problem is that new BigInteger(byteString, 2).toByteArray() returns an array with more than one element. (For example, if 'v' has the code '11010001', new BigInteger(byteString, 2).toByteArray() returns the array [0, -47].)
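
Both symptoms described above can be reproduced in isolation. The following is a minimal sketch, not the asker's actual class: the name BigIntegerPitfall and the helper firstByte are made up for illustration, and the asker's addPaddingZeros step is left out because its source is not shown.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Illustration only (hypothetical class name): why parsing each code as a number loses information.
class BigIntegerPitfall {
    // Takes the first byte of the two's-complement form, as the question's code does.
    static byte firstByte(String bits) {
        return new BigInteger(bits, 2).toByteArray()[0];
    }

    public static void main(String[] args) {
        // "101" and "0101" are different codes but the same number, so both become 5.
        System.out.println(firstByte("101"));
        System.out.println(firstByte("0101"));
        // An 8-bit value with the top bit set needs an extra sign byte, so two bytes come back.
        System.out.println(Arrays.toString(new BigInteger("11010001", 2).toByteArray()));
    }
}
```

Leading zeros carry meaning in a Huffman code but not in a number, which is why any approach that converts each code to an integer on its own cannot be decoded reliably.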

Can someone please suggest a way to write to a file such that the file is compressed and these problems are also taken care of?

Recommended answer

Yes, you can write bits to a file. In fact, you are always writing bits to a file. The only thing is that you are writing eight bits at a time.

What you need is a bit buffer, say a 32-bit unsigned variable, into which you accumulate bits. (Java has no unsigned int type, but an int works if you shift right with the unsigned >>> operator.) Have another integer that tracks how many bits are in the bit buffer. Use the shift-left and OR (or plus) operators to put more bits into the bit buffer, and the AND and shift-right operators to remove them. Whenever you have eight or more bits in the bit buffer, write eight of those bits to the file as a byte. At the end, write the remaining bits (if any) to the file as the last byte.

So, to add bits bits from value to the buffer (bits is the number of bits to add):

bitBuffer |= value << bitCount;
bitCount += bits;

To write out and remove the available bytes:

while (bitCount >= 8) {
    writeByte(bitBuffer & 0xff);
    bitBuffer >>>= 8;
    bitCount -= 8;
}
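
The two fragments above can be combined into a small helper class. This is a sketch under stated assumptions, not the answer's own code: the class name BitWriter and its methods writeBits, writeCode and flush are hypothetical, the buffer is a long rather than a 32-bit value so that up to 32 bits can be added at once, and writeCode feeds a '0'/'1' string one bit at a time so that the first character of the code is the first bit written.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper: accumulates bits and emits them as bytes, lowest bit first.
class BitWriter {
    private final OutputStream out;
    private long bitBuffer = 0; // pending bits; a long leaves headroom for 32-bit writes
    private int bitCount = 0;   // number of valid bits currently held in bitBuffer

    BitWriter(OutputStream out) {
        this.out = out;
    }

    // Appends the low `bits` bits of `value` (0 <= bits <= 32), then drains whole bytes.
    void writeBits(int value, int bits) throws IOException {
        long masked = value & ((1L << bits) - 1);
        bitBuffer |= masked << bitCount;
        bitCount += bits;
        while (bitCount >= 8) {
            out.write((int) (bitBuffer & 0xff));
            bitBuffer >>>= 8;
            bitCount -= 8;
        }
    }

    // Writes a code given as a string of '0' and '1' characters, first character first.
    void writeCode(String code) throws IOException {
        for (int i = 0; i < code.length(); i++) {
            writeBits(code.charAt(i) - '0', 1);
        }
    }

    // Writes the final partial byte, if any; its unused high bits are zero filler.
    void flush() throws IOException {
        if (bitCount > 0) {
            out.write((int) (bitBuffer & 0xff));
            bitBuffer = 0;
            bitCount = 0;
        }
        out.flush();
    }
}
```

For a real file, the OutputStream would typically be a FileOutputStream wrapped in a BufferedOutputStream. With this writer, the codes '101' and '0101' occupy seven consecutive bits of one byte, so they stay distinguishable as long as the decoder reads the bits in the same order.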

You need to make sure that, when decoding, you don't mistake the filler bits in the last byte for another code. You can either send the actual number of bits in the message ahead of the message (or the number of bits used in the last byte), or you can add an end-of-stream symbol to your alphabet that gets its own Huffman code, and end the message with that.
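
The first option, sending the bit count ahead of the message, might look like the sketch below. Everything in it is an assumption made for illustration: the class name LengthPrefixedHuffman, the 8-byte header written with DataOutputStream.writeLong, the plain Map<Character, String> code table, and the lookup of accumulated bits in a reversed map (a real decoder would usually walk the tree instead). It works in memory to stay short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an 8-byte bit count followed by the code bits, packed lowest bit first.
class LengthPrefixedHuffman {
    static byte[] encode(String text, Map<Character, String> codes) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        long totalBits = 0;
        int bitBuffer = 0;
        int bitCount = 0;
        for (char c : text.toCharArray()) {
            for (char bit : codes.get(c).toCharArray()) {
                bitBuffer |= (bit - '0') << bitCount;
                bitCount++;
                totalBits++;
                if (bitCount == 8) {
                    body.write(bitBuffer);
                    bitBuffer = 0;
                    bitCount = 0;
                }
            }
        }
        if (bitCount > 0) {
            body.write(bitBuffer); // last byte, padded with zero filler bits
        }
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(file);
        out.writeLong(totalBits); // header: number of meaningful bits
        body.writeTo(out);
        out.flush();
        return file.toByteArray();
    }

    // Reads exactly the number of bits named in the header, so filler bits are never decoded.
    static String decode(byte[] data, Map<Character, String> codes) throws IOException {
        Map<String, Character> reverse = new HashMap<>();
        for (Map.Entry<Character, String> e : codes.entrySet()) {
            reverse.put(e.getValue(), e.getKey());
        }
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        long totalBits = in.readLong();
        StringBuilder text = new StringBuilder();
        StringBuilder current = new StringBuilder();
        int currentByte = 0;
        for (long i = 0; i < totalBits; i++) {
            int pos = (int) (i % 8);
            if (pos == 0) {
                currentByte = in.readUnsignedByte();
            }
            current.append((currentByte >>> pos) & 1);
            Character symbol = reverse.get(current.toString());
            if (symbol != null) { // prefix-free codes: the first match is the symbol
                text.append(symbol);
                current.setLength(0);
            }
        }
        return text.toString();
    }
}
```

With the toy code a=0, b=10, c=11, the text "abca" takes 6 bits, so the last byte carries 2 filler bits. Without the header, a decoder would read those two zero bits as two extra 'a' characters.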

The other problem you have is that you will also need to transmit the Huffman code itself to the decoder, ahead of the coded symbols, so that the decoder knows how to decode. Look up "canonical Huffman codes" for how to approach that efficiently.
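
As a rough illustration of the canonical idea: if both sides agree to assign codes in order of code length, and then by symbol, only the code length of each symbol has to be stored in the file. The sketch below uses the usual canonical assignment; the class name CanonicalHuffman and the method fromLengths are hypothetical, and it assumes the lengths come from a valid Huffman tree with codes no longer than 31 bits.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rebuilds canonical Huffman codes from code lengths alone.
class CanonicalHuffman {
    static Map<Character, String> fromLengths(Map<Character, Integer> lengths) {
        // Order symbols by code length, then by character value.
        List<Character> symbols = new ArrayList<>(lengths.keySet());
        symbols.sort((x, y) -> {
            int byLength = Integer.compare(lengths.get(x), lengths.get(y));
            return byLength != 0 ? byLength : Character.compare(x, y);
        });

        Map<Character, String> codes = new LinkedHashMap<>();
        int code = 0;
        int previousLength = 0;
        for (Character symbol : symbols) {
            int length = lengths.get(symbol);
            code <<= (length - previousLength); // append zeros when the length grows
            String bits = Integer.toBinaryString(code);
            StringBuilder padded = new StringBuilder();
            for (int i = bits.length(); i < length; i++) {
                padded.append('0'); // keep leading zeros, they are part of the code
            }
            codes.put(symbol, padded.append(bits).toString());
            code++;
            previousLength = length;
        }
        return codes;
    }
}
```

The encoder would compute the lengths from its tree, replace its own codes with the canonical ones, and write only the lengths to the file header. The decoder calls the same method on those lengths and obtains an identical table.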
