霍夫曼解码压缩文件 [英] Huffman Decoding Compressed File

查看:91
本文介绍了霍夫曼解码压缩文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个程序,该程序根据在文本输入文件中读取的ASCII字符频率生成霍夫曼树.霍夫曼码存储在由256个元素组成的字符串数组中;如果未读取字符,则为空字符串.该程序还对输出文件进行编码和压缩.

I have a program that produces a Huffman tree based on ASCII character frequency read in a text input file. The Huffman codes are stored in a string array of 256 elements, empty string if the character is not read. This program also encodes and compresses an output file.

我现在正试图解压缩和解码当前的输出文件,该文件将作为输入文件打开,而新的输出文件将使解码后的消息与原始文本输入文件相同.

I am now trying to decompress and decode my current output file which is opened as an input file and a new output file is to have the decoded message identical to the original text input file.

我对我这部分作业的思考过程是从我制作的编码功能开始向后工作,一次读取8位,然后以某种方式通过更新变量(字符串n)来解码消息,该变量最初是一个空字符串,通过递归霍夫曼树,直到获得代码以输出到输出文件.

My thought process for this part of my assignment is to work backwards from the encoding function I have made and read 8 bits at a time and somehow decode the message by updating a variable (string n) which is an empty string at first, through recursion of the Huffman tree until I get a code to output to output file.

我目前已经启动了该函数,但是遇到了困扰,我正在寻找一些指导以编写当前的解码输出函数.感谢所有帮助.
我完成的encodingOutput函数和encodeOutput函数如下:

I have currently started the function but I am stuck and I am looking for some guidance in writing my current decodeOutput function. All help is appreciated.
My completed encodedOutput function and decodeOutput function is down below:

(对于encodeOutput函数,fileName是输入文件参数,fileName2是输出文件参数)

(For encodedOutput function, fileName is the input file parameter, fileName2 is the output file parameter)

(对于encodeOutput函数,fileName2是输入文件参数,fileName 3是输出文件参数)

(For decodeOutput function, fileName2 is the input file parameter, fileName 3 is output file parameter)

code [256]是这两个函数的参数,并保存原始输入文件中读取的每个唯一字符的霍夫曼代码,例如,在输入文件中读取的字符"H"可能具有以下代码:在将代码[72]传递给函数时,存储在代码[72]的代码数组中的"111".

code[256] is a parameter for both of these functions and holds the Huffman code for each unique character read in the original input file, for example, the character 'H' being read in the input file may have a code of "111" stored in the code array for code[72] at the time it is being passed to the functions.

void encodeOutput(const string & fileName, const string & fileName2, string code[256]) {
    ifstream ifile;//to read file
    ifile.open(fileName, ios::binary);
    if (!ifile) //to check if file is open or not
    {
        die("Can't read again");
    }
    ofstream ofile;
    ofile.open(fileName2, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    int read;
    read = ifile.get();//read one char from file and store it in int
    char buffer = 0, bit_count = 0;
    while (read != -1) {
        for (unsigned b = 0; b < code[read].size(); b++) { // loop through bits (code[read] outputs huffman code)
            buffer <<= 1;
            buffer |= code[read][b] != '0'; 
            bit_count++;
            if (bit_count == 8) {
                ofile << buffer;
                buffer = 0;
                bit_count = 0;
            }
        }
        read = ifile.get();
    }

    if (bit_count != 0)
        ofile << (buffer << (8 - bit_count));

    ifile.close();
    ofile.close();
}

//Work in progress
void decodeOutput(const string & fileName2, const string & fileName3, string code[256]) {
    ifstream ifile;
    ifile.open(fileName2, ios::binary);
    if (!ifile)
    {
        die("Can't read again");
    }
    ofstream ofile;
    ofile.open(fileName3, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    string n = ""; 
    for (int c; (c = ifile.get()) != EOF;) {
        for (unsigned p = 8; p--;) {
            if ((c >> p & 1) == '0') { // if bit is a 0

            }
            else if ((c >> p & 1) == '1') { // if bit is a 1

            }
            else { // Output string n (decoded character) to output file
              ofile << n;
            }
        }
    }
}

推荐答案

如果您拥有用于构造密码本的原始霍夫曼树,则解码会更容易.但是,假设您只有密码本(即 string code [256] ),而没有原始的霍夫曼树.您可以执行以下操作:

The decoding would be easier if you had the original Hoffman tree used to construct the codebook. But suppose you only have the codebook (i.e., the string code[256]) but not the original Hoffman tree. What you can do is the following:

  • 将代码本分为不同长度的代码字组.假设该码本由n个不同长度的码字组成:L 0 <L 1 <...<L n-1 .
  • 从输入文件中读取(但不消耗)k位,其中k从L 0 增大到L n-1 ,直到找到对于某些i,输入k位和长度为k = L i 的码字.
  • 输出与匹配代码字相对应的8位字符,并消耗输入文件中的k位.
  • 重复直到输入文件中的所有位都用完为止.
  • Partition the codebook into groups of codewords with different lengths. Say the codebook consists of codewords with n different lengths: L0 < L1 < ... < Ln-1 .
  • Read (but do not consume yet) k bits from input file, with k increasing From L0 up to Ln-1, until you find a match between the input k bits and a codeword of length k = Li for some i.
  • Output the 8-bit character corresponding to the matching codeword, and consume the k bits from input file.
  • Repeat until all bits from input file are consumed.

如果代码簿的构造正确,并且您总是以递增的长度查找代码字,则永远不会找到找不到匹配的代码字的输入位序列.

If the codebook were constructed correctly, and you always look up the codewords in increasing length, you should never find a sequence of input bits which you cannot find a matching codeword.

有效地,就霍夫曼树等价而言,每次将k个输入位与一组长度为k的代码字进行比较时,您都在检查树级别k的叶子是否包含输入匹配代码字.每次将k增加到下一个较长的代码字组时,您就沿着树走到更高的级别(例如,级别0是根).

Effectively, in terms of the Hoffman tree equivalence, every time you compare k input bits with a group of codewords of length k, you are checking whether a leaf at tree level-k contains an input-matching codeword; every time you increase k to the next longer group of codewords, you are walking down the tree to a higher level (say level-0 is the root).

这篇关于霍夫曼解码压缩文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆