Java的哈夫曼树 [英] Huffman Tree in Java

查看:101
本文介绍了Java的哈夫曼树的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的哈夫曼树代码有问题。在main方法中,我输入了一个符号字符串,我还输入了一个包含符号频率的整数数组。它应该打印出每个Symbol及其Huffman代码,但我认为它错了...

I have a problem with my Huffman tree code. In the main method I input a String of Symbols and I also input an Integer array containing the frequency of the Symbols. It should print out each Symbol and its Huffman code, but I think its wrong...

这是代码:

 package huffman;

import java.util.*;

abstract class HuffmanTree implements Comparable<HuffmanTree> {
    public final int frequency; // the frequency of this tree
    public HuffmanTree(int freq) { frequency = freq; }

    // compares on the frequency
    public int compareTo(HuffmanTree tree) {
        return frequency - tree.frequency;
    }
}

class HuffmanLeaf extends HuffmanTree {
    public final char value; // the character this leaf represents

    public HuffmanLeaf(int freq, char val) {
        super(freq);
        value = val;
    }
}

class HuffmanNode extends HuffmanTree {
    public final HuffmanTree left, right; // subtrees

    public HuffmanNode(HuffmanTree l, HuffmanTree r) {
        super(l.frequency + r.frequency);
        left = l;
        right = r;
    }
}

public class Huffman {
    // input is an array of frequencies, indexed by character code
    public static HuffmanTree buildTree(int[] charFreqs, char[] test2) {
        PriorityQueue<HuffmanTree> trees = new PriorityQueue<HuffmanTree>();
        // initially, we have a forest of leaves
        // one for each non-empty character
        for (int i = 0; i < charFreqs.length; i++)
            if (charFreqs[i] > 0)
                trees.offer(new HuffmanLeaf(charFreqs[i], test2[i]));

        assert trees.size() > 0;
        // loop until there is only one tree left
        while (trees.size() > 1) {
            // two trees with least frequency
            HuffmanTree a = trees.poll();
            HuffmanTree b = trees.poll();

            // put into new node and re-insert into queue
            trees.offer(new HuffmanNode(a, b));
        }
        return trees.poll();
    }

    public static void printCodes(HuffmanTree tree, StringBuffer prefix) {
        assert tree != null;
        if (tree instanceof HuffmanLeaf) {
            HuffmanLeaf leaf = (HuffmanLeaf)tree;

            // print out character, frequency, and code for this leaf (which is just the prefix)
            System.out.println(leaf.value + "\t" + leaf.frequency + "\t" + prefix);

        } else if (tree instanceof HuffmanNode) {
            HuffmanNode node = (HuffmanNode)tree;

            // traverse left
            prefix.append('0');
            printCodes(node.left, prefix);
            prefix.deleteCharAt(prefix.length()-1);

            // traverse right
            prefix.append('1');
            printCodes(node.right, prefix);
            prefix.deleteCharAt(prefix.length()-1);
        }
    }

    public static void main(String[] args) {
        //Symbols:
        String str = "12345678"; 
        char[] test2 = str.toCharArray();
        //Frequency (of the symbols above):
        int[] charFreqs = {36,18,12,9,7,6,5,4};


        // build tree
        HuffmanTree tree = buildTree(charFreqs,test2);

        // print out results
        System.out.println("SYMBOL\tFREQ\tHUFFMAN CODE");
        printCodes(tree, new StringBuffer());
    }
}

我得到的输出是:

SYMBOL  FREQ    HUFFMAN CODE
1           36          0
3           12          100
6           6           1010
5           7           1011
2           18          110
4           9           1110
8           4           11110
7           5           11111

这很奇怪,例如符号7应该是:11110和符号8应该是:11111

Thats weird, for example Symbol 7 should be: 11110 and Symbol 8 should be: 11111

你能帮帮我吗?

推荐答案

位模式的分配与代码的最优性无关。你的任务将工作得很好。没有什么奇怪的。你本可以对2:110,3:100或4:1110,5:1011表示担忧,但这些也很好。

The assignment of the bit patterns does not matter to the optimality of the code. The assignment you have will work just fine. There is nothing weird about it. You could have also expressed concern over 2:110, 3:100, or 4:1110, 5:1011, but those are fine too.

强加的唯一理由代码上的顺序是减少将代码从压缩器传送到解压缩器所需的位数。您可以发送每个符号的代码长度,而不是发送代码,只要代码的构造在两侧都是相同的长度。

The only reason to impose an order on the codes is to reduce the number of bits needed to convey the code from the compressor to the decompressor. Instead of sending the codes, you can send the code lengths for each symbol, so long as the code is constructed identically on both sides from just the lengths.

在这种情况下,方法通常是将数字顺序的代码分配给符号的排序列表。然后你确实会让符号7的代码值低于符号8,如果这是它们的分配顺序。

In that case, the approach is usually to assign the code in numerical order to a sorted list of symbols. Then you would indeed have the symbol 7 with a lower code "value" than the symbol 8, if that's the order in which they are assigned.

对于你的例子,这样的话规范代码是:

For your example, such a canonical code would be:

1: 1 - 0
2: 3 - 100
3: 3 - 101
4: 4 - 1100
5: 4 - 1101
6: 4 - 1110
7: 5 - 11110
8: 5 - 11111

您只需获取长度并在相同长度内对符号进行排序。然后分配以0开头并递增的代码,在你加长长度时将位添加到末尾。

You simply takes the lengths and within the same length, sort the symbols. Then assign codes starting with 0 and incrementing, adding bits to the end as you step up lengths.

注意这是一个不寻常的例子,其中符号顺序也是频率顺序。通常情况并非如此。

Note that this is an unusual example, where the symbol order is also the frequency order. Normally that's not the case.

这篇关于Java的哈夫曼树的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆