在霍夫曼压缩/解压缩中处理最后一个字节 [英] Handling last byte in huffman compression/decompression

查看:134
本文介绍了在霍夫曼压缩/解压缩中处理最后一个字节的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个程序,该程序根据在文本输入文件中读取的ASCII字符频率生成霍夫曼树。霍夫曼码存储在由256个元素组成的字符串数组中;如果未读取字符,则为空字符串。然后,该程序还对输出文件进行编码和压缩,然后能够将压缩后的文件作为输入文件进行解压缩和解码。

I have a program that produces a Huffman tree based on ASCII character frequency read in a text input file. The Huffman codes are stored in a string array of 256 elements, empty string if the character is not read. This program also then encodes and compresses an output file and then is able to take the compressed file as an input file and does decompression and decoding.

总而言之,我的程序需要输入文件将对输出文件进行压缩和编码,关闭输出文件,然后将编码作为输入文件打开,然后获取一个新的输出文件,该文件应具有与原始文本输入文件相同的解码消息。

In summary, my program takes a input file compresses and encodes an output file, closes the output file and opens the encoding as an input file, and takes a new output file that is supposed to have a decoded message identical to the original text input file.

该程序的当前问题:解码压缩文件时,我得到一个额外的字符,或者该字符不在解码的原始输入文件中。这是由于我所知道的垃圾位。通过研究,我发现一种解决方案可能是使用psuedo-EOF字符在读取垃圾位之前停止解码,但是我不确定如何在当前处理编码和解码的函数中实现此功能,因此对所有指导和帮助表示赞赏。

My current problem with this program: When decoding the compressed file I get an extra character or so that is not in the original input file decoded. This is due to the trash bits from what I know. With research I found one solution may be to use a psuedo-EOF character to stop decoding before the trash bits are read but I am not sure how to implement this in my current functions that handle encoding and decoding so all guidance and help is much appreciated.

我的最终目标是能够使用此程序来完全解码编码的文件,而无需将垃圾位发送到输出文件。

My end goal is to be able to use this program to also completely decode the encoded file without the trash bits sent to output file.

下面我有两个函数,encodedOutput和decodeOutput,用于处理压缩和解压缩。

Below I have two functions, encodedOutput and decodeOutput that handle the compression and decompression.

(对于encodeOutput函数,fileName是输入文件参数,fileName2是输出文件参数)

(For encodedOutput function, fileName is the input file parameter, fileName2 is the output file parameter)

(对于encodeOutput函数,fileName2是输入文件参数,fileName 3是输出文件参数)

(For decodeOutput function, fileName2 is the input file parameter, fileName 3 is output file parameter)

code [256]是这两个函数的参数,并保存原始输入文件中读取的每个唯一字符的霍夫曼代码,例如,字符 H在输入文件中被读取时,可能在将代码[72]传递给函数时在代码数组中存储了代码 111。

code[256] is a parameter for both of these functions and holds the Huffman code for each unique character read in the original input file, for example, the character 'H' being read in the input file may have a code of "111" stored in the code array for code[72] at the time it is being passed to the functions.

freq [256]保存每个ASCII字符的读取频率,如果它不在原始输入文件中,则保持0。

freq[256] holds the frequency of each ascii character read or holds 0 if it is not in original input file.

void encodeOutput(const string & fileName, const string & fileName2, string code[256]) {
    ifstream ifile; //to read file
    ifile.open(fileName, ios::binary);
    if (!ifile)//to check if file is open or not
    {
        die("Can't read again"); // function that exits program if can't open
    }
    ofstream ofile;
    ofile.open(fileName2, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    int read; 
    read = ifile.get(); //read one char from file and store it in int
    char buffer = 0, bit_count = 0;
    while (read != -1) {//run this loop until reached to end of file(-1)
        for (unsigned b = 0; b < code[read].size(); b++) { // loop through bits (code[read] outputs huffman code)
            buffer <<= 1;
            buffer |= code[read][b] != '0';
            bit_count++;
            if (bit_count == 8) {
                ofile << buffer;
                buffer = 0;
                bit_count = 0;
            }
        }
        read = ifile.get();
    }

    if (bit_count != 0)
        ofile << char(buffer << (8 - bit_count));

    ifile.close();
    ofile.close();
}

void decodeOutput(const string & fileName2, const string & fileName3, string code[256], const unsigned long long freq[256]) {
    ifstream ifile;
    ifile.open(fileName2, ios::binary);
    if (!ifile)
    {
        die("Can't read again");
    }
    ofstream ofile;
    ofile.open(fileName3, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    priority_queue < node > q;
    for (unsigned i = 0; i < 256; i++) {
        if (freq[i] == 0) {
            code[i] = "";
        }
    }

    for (unsigned i = 0; i < 256; i++)
        if (freq[i])
            q.push(node(unsigned(i), freq[i]));

    if (q.size() < 1) {
        die("no data");
    }

    while (q.size() > 1) {
        node *child0 = new node(q.top());
        q.pop();
        node *child1 = new node(q.top());
        q.pop();
        q.push(node(child0, child1));
    } // created the tree
    string answer = "";
    const node * temp = &q.top(); // root 
    for (int c; (c = ifile.get()) != EOF;) {
        for (unsigned p = 8; p--;) { //reading 8 bits at a time 
            if ((c >> p & 1) == '0') { // if bit is a 0
                temp = temp->child0; // go left
            }
            else { // if bit is a 1
                temp = temp->child1; // go right
            }
            if (temp->child0 == NULL && temp->child1 == NULL) // leaf node
            {
                answer += temp->value;
                temp = &q.top();
            }
        }
    }
  ofile << ans;
}


推荐答案

将其更改为 freq [257] code [257] ,并设置 freq [256] 到一个。您的EOF是符号256,它将在流中最后出现一次。在编码结束时,发送符号256。解码时收到符号256时,请停止。

Change it to freq[257] and code[257], and set freq[256] to one. Your EOF is symbol 256, and it will appear once in the stream, at the end. At the end of your encoding, send symbol 256. When you receive symbol 256 while decoding, stop.

这篇关于在霍夫曼压缩/解压缩中处理最后一个字节的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆