霍夫曼树:遍历 [英] Huffman Tree: Traversing

查看:72
本文介绍了霍夫曼树:遍历的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我不确定我将如何攻击我的霍夫曼树的遍历.这棵树是正确的,我只是很难弄清楚如何以一种好的方式遍历它.由于某种原因,我的遍历方法没有给出任何结果...

I'm not sure how I'm going to attack the traversing of my Huffman Tree. The tree is correct, I just have a hard time figuring out how to traverse it in a good way. For some reason, my traversing method gives no result...

更新:清理代码,使其更加面向对象

UPDATE: Cleaned up the code, made it more Object Oriented

节点类:

public class Node
{
    public int frekvens; //Frequency
    public char tegn; //Symbol
    public Node venstre; //Left child
    public Node høyre; //Right child
    public string s; //result string
    public string resultat;
    public Node (char c) // Node constructor containing symbol.
    {
        frekvens = 1;
        tegn = c;
    }

    public Node (int f, Node venstre, Node høyre) // Node Constructor containing frequency and children
        {
            frekvens = f;
            this.venstre = venstre;
            this.høyre = høyre;
        }

    public Node (Node node) // Node constructor containing a node
        {
            frekvens = node.frekvens;
            tegn = node.tegn;
            this.venstre = venstre;
            this.høyre = høyre;
        }

    public void ØkMed1() // Inkrement frequency by one
    {
        frekvens = frekvens + 1;
    }


    public char getVenstreTegn ()
    {
        return venstre.tegn;
    }

    public char getHøyreTegn ()
    {
        return venstre.tegn;
    }

    public int getVenstreFrekvens ()
    {
        return venstre.frekvens;
    }

    public int getHøyreFrekvens ()
    {
        return høyre.frekvens;
    }

    public int getFrekvens()
    {
        return frekvens;
    }


    public bool ErTegn(char c)
    {
        if ( c == tegn)
        {
            return false;
        }
        else
        {
            return true;
        }
    }

    //Pretty sure this does not work as intended
    public string traverser (Node n) //Traverse the tree
    {
        if (n.tegn != '\0') //If the node containes a symbol --> a leaf
        {
            resultat += s;  
        }
        else
        {
            if (n.getVenstreTegn() == '\0') //If left child does not have a symbol
            {
                s += "0";
                traverser(n.venstre);
            }
            if (n.getHøyreTegn() == '\0') //If right child does not have a symbol
            {
                s += "1";
                traverser(n.høyre);
            }
        }
        return resultat;
    }
    public string Resultat() //Used priviously to check if i got the correct huffman tree
    {
        string resultat;
        resultat = "Tegn: " + Convert.ToString(tegn) +"  frekvens: " + Convert.ToString(frekvens) + "\n";
        return resultat;
    }
}

Huffman_Tree 类:

Huffman_Tree Class:

public class Huffman_Tre
{
    string treString;
    List<Node> noder = new List<Node>();
    public Node rot;
    public void bygg (string input)
    {
        bool funnet; //Found
        char karakter; //character

        for (int i = 0; i < input.Length;i++) //Loops through string and sets character
            //with coresponding freqeuncy in the node list
        {   
            karakter = input[i];
            funnet = false; //default
            for (int j = 0; j< noder.Count; j++)
            {
                if (noder[j].ErTegn(karakter) == false) //if the character already exists
                {
                    noder[j].ØkMed1(); //inkrement frequency by one
                    funnet = true; 
                    break;
                }
            }
            if (!funnet) //if the character does not exist 
            {
                noder.Add(new Node(karakter)); //add the character to list
            }
        }
        //Sorting node list acending by frequency
        var sortertListe = noder.OrderBy(c => c.frekvens).ToList();

        noder = sortertListe; 

        do
        {
            noder.Add(new Node((noder[0].frekvens + noder[1].frekvens), noder[0],noder[1]));

            //Remove the leaf nodes
            noder.RemoveAt(0);
            noder.RemoveAt(0); 
        } while(noder.Count >= 2);

    }

    public Node getRot()
    {
        return rot;
    }

    public string visTre()
    {

        foreach (Node node in noder)
        {
            treString += node.Resultat();
        }
        return treString;
    }
    public bool erNull()
    {
        if (noder[0].tegn == '\0')
        {
            return true;
        }
        else
            return false;
    }
}

主程序:

private void btnKomprimer_Click(object sender, System.Windows.RoutedEventArgs e)
    {
        string input; //The string input I want to compress
        input = txtInput.Text; //initialize input to text input
        input = input.ToLower(); 
        txtOutput.Text = "";

        Huffman_Tre tre = new Huffman_Tre();

        tre.bygg(input);

        Node rot = new Node(tre.getRot());

        txtOutput.Text += rot.traverser(rot);
    }
}

推荐答案

由于我还有一点时间,我在玩 C# 6.0 的同时制定了一个哈夫曼树的例子.它没有优化(甚至到目前为止还没有!),但作为一个例子,它运行良好.它将帮助您了解您的挑战"可能出现在哪里.由于我的英语比我的斯堪的纳维亚知识好得多,所以我使用了英文命名,希望你不要介意.

As I had a little bit of time left, I worked out an example of a Huffman tree, while playing with C# 6.0. It's not optimized (not even by far!), but it works fine as an example. And it will help you to look where your 'challenge' may arise. As my English is far better than my Scandinavian knowledge, I used English naming, I hope you don't mind.

首先,让我们从保持频率的类开始.

First, let's start with the class that keeps the frequencies.

public sealed class HuffmanFrequencyTable
{       
    #region Properties
    /// <summary>
    /// Holds the characters and their corresponding frequencies
    /// </summary>
    public Dictionary<char, int> FrequencyTable  { get; set; } = new Dictionary<char, int>();

    #endregion

    #region Methods

    /// <summary>
    /// Clears the internal frequency table
    /// </summary>
    public void Clear()
    {
        FrequencyTable?.Clear();            
    }

    /// <summary>
    /// Accepts and parses a new line (string) which is then 
    /// merged with the existing dictionary or frequency table
    /// </summary>
    /// <param name="line">The line to parse</param>
    public void Accept(string line)
    {
        if (!string.IsNullOrEmpty(line))
        {
            line.GroupBy(ch => ch).
                 ToDictionary(g => g.Key, g => g.Count()).
                 ToList().
                 ForEach(x => FrequencyTable[x.Key] = x.Value);
        }
    }

    /// <summary>
    /// Performs a dump of the frequency table, ordering all characters, lowest frequency first.
    /// </summary>
    /// <returns>The frequency table in the format 'character [frequency]'</returns>
    public override string ToString()
    {            
        return FrequencyTable?.PrintFrequencies();
    }
    #endregion
}

请注意,ToString() 方法使用能够转储"所用词典内容的扩展方法.扩展位于名为 Helpers 的静态类中,如下所示:

Please note that the ToString() method uses an extension method that is able to 'dump' the contents of the Dictionary used. The extensions is located in a static class called Helpers and looks like this:

/// <summary>
/// Extension method that helps to write the contents of a generic Dictionary to a string, ordered by it's values and 
/// printing the key and it's value between brackets.
/// </summary>
/// <typeparam name="TKey">Generic key</typeparam>
/// <typeparam name="TValue">Generic value type</typeparam>
/// <param name="dictionary">The dictionary</param>
/// <exception cref="ArgumentNullException">Throws an argument null exception if the provided dictionary is null</exception>
/// <returns></returns>
public static string PrintFrequencies<TKey, TValue>(this IDictionary<TKey, TValue> dictionary)
{
    if (dictionary == null)
        throw new ArgumentNullException("dictionary");

    var items = from kvp in dictionary
                orderby kvp.Value
                select kvp.Key + " [" + kvp.Value + "]";

    return string.Join(Environment.NewLine, items);
}

现在,有了这个频率表,我们可以开始研究如何构建节点.Huffman 使用二叉树,因此最好生成一个具有左右子节点的 Node 类.我也冒昧地在这里执行遍历算法.该类构建如下:

Now, with this FrequencyTable in place, we can start looking on how to build up the Nodes. Huffman works with a binary tree, so it's best to generate a Node class having a left and right child node. I also took the liberty to perform the traversal algorithm here as well. This class is built up as following:

public sealed class HuffmanNode
{
    #region Properties

    /// <summary>
    /// Holds the left node, if applicable, otherwise null
    /// </summary>
    public HuffmanNode Left { get; set; } = null;

    /// <summary>
    /// Holds the right node, if applicable, otherwise null
    /// </summary>
    public HuffmanNode Right { get; set; } = null;

    /// <summary>
    /// Holds the Character (or null) for this particular node
    /// </summary>
    public char? Character { get; set; } = null;

    /// <summary>
    /// Holds the frequency for this particular node, defaulted to 0
    /// </summary>
    public int Frequency { get; set; } = default(int);

    #endregion

    #region Methods

    /// <summary>
    /// Traverses all nodes recursively returning the binary 
    /// path for the corresponding character that has been found.
    /// </summary>
    /// <param name="character">The character to find</param>
    /// <param name="data">The datapath (containing '1's and '0's)</param>
    /// <returns>The complete binary path for a character within a node</returns>
    public List<bool> Traverse(char? character, List<bool> data)
    {
        //Check the leafs for existing characters
        if (null == Left && null == Right)
        {
            //We're at an endpoint of our 'tree', so return it's data or nothing when the symbol
            //characters do not match
            return (bool)character?.Equals(Character) ? data : null;
        }
        else
        {
            List<bool> left = null;
            List<bool> right = null;

            //TODO: If possible refactor with proper C# 6.0 features
            if (null != Left)
            {
                List<bool> leftPath = new List<bool>(data);                    
                leftPath.Add(false); //Add a '0'
                left = Left.Traverse(character, leftPath); //Recursive traversal for child nodes within this left node.
            }

            if (null != Right)
            {
                List<bool> rightPath = new List<bool>(data);                    
                rightPath.Add(true); //Add a '1'
                right = Right.Traverse(character, rightPath); //Recursive traversal for childnodes within this right node
            }

            return (null != left) ? left : right;
        }
    }        

    #endregion
}

我在 HuffmanTree 类中使用 Node 类.从逻辑上讲,树是由节点构建的.对应的HuffmanTree是这样写的:

I use the Node class within the HuffmanTree class. As, logically, a tree is built up from nodes. The corresponding HuffmanTree is written this way:

public sealed class HuffmanTree
{
    #region Fields
    /// <summary>
    /// Field for keeping the Huffman nodes in. Internally used.
    /// </summary>
    private List<HuffmanNode> nodes = new List<HuffmanNode>();        
    #endregion

    #region Properties

    /// <summary>
    /// Holds the Huffman tree
    /// </summary>
    public HuffmanNode Root   {  get; set; } = null;

    /// <summary>
    /// Holds the frequency table for all parsed characters
    /// </summary>
    public HuffmanFrequencyTable Frequencies { get; private set; } = new HuffmanFrequencyTable()

    /// <summary>
    /// Holds the amount of bits after encoding the tree.
    /// Primary usable for decoding.
    /// </summary>
    public int BitCountForTree  { get;  private set;  } = default(int);

    #endregion

    #region Methods

    /// <summary>
    /// Builds the Huffman tree
    /// </summary>
    /// <param name="source">The source to build the Hufftree from</param>
    /// <exception cref="ArgumentNullException">Thrown when source is null or empty</exception>
    public void BuildTree(string source)
    {
        nodes.Clear(); //As we build a new tree, first make sure it's clean :)

        if (string.IsNullOrEmpty(source))
            throw new ArgumentNullException("source");
        else
        {
            Frequencies.Accept(source);

            foreach (KeyValuePair<char, int> symbol in Frequencies.FrequencyTable)
            {
                nodes.Add(new HuffmanNode() { Character = symbol.Key, Frequency = symbol.Value });
            }

            while (nodes.Count > 1)
            {
                List<HuffmanNode> orderedNodes = nodes.OrderBy(node => node.Frequency).ToList();

                if (orderedNodes.Count >= 2)
                {
                    List<HuffmanNode> takenNodes = orderedNodes.Take(2).ToList();

                    HuffmanNode parent = new HuffmanNode()
                    {
                        Character = null,
                        Frequency = takenNodes[0].Frequency + takenNodes[1].Frequency,
                        Left = takenNodes[0],
                        Right = takenNodes[1]
                    };

                    //Remove the childnodes from the original node list and add the new parent node
                    nodes.Remove(takenNodes[0]);
                    nodes.Remove(takenNodes[1]);
                    nodes.Add(parent);
                }
            }

            Root = nodes.FirstOrDefault();
        }
    }

    /// <summary>
    /// Encodes a given string to the corresponding huffman encoding path
    /// </summary>
    /// <param name="source">The source to encode</param>
    /// <returns>The binary huffman representation of the source</returns>
    public BitArray Encode(string source)
    {
        if (!string.IsNullOrEmpty(source))
        {
            List<bool> encodedSource = new List<bool>();
            //Traverse the tree for each character in the passed source (string) and add the binary path to the encoded source
            encodedSource.AddRange(source.SelectMany(character =>
                                        Root.Traverse(character, new List<bool>())
                                    ).ToList()
                                  );
            //For decoding, we might need the amount of bits to skip trailing bits.
            BitCountForTree = encodedSource.Count;
            return new BitArray(encodedSource.ToArray());
        }
        else return null;
    }

    /// <summary>
    /// Decodes a given binary path to represent it's string value
    /// </summary>
    /// <param name="bits">BitArray for traversing the tree</param>
    /// <returns></returns>
    public string Decode(BitArray bits)
    {
        HuffmanNode current = Root;
        string decodedString = string.Empty;

        foreach (bool bit in bits)
        {
            //Find the correct current node depending on the bit set or not set.
            current = (bit ? current.Right ?? current : current.Left ?? current);               

            if (current.IsLeaf())
            {
                decodedString += current.Character;
                current = Root;
            }
        }

        return decodedString;
    }

    #endregion
}

这段代码的有趣之处在于,我决定使用 BitArrays 来保存树构建时的二进制路径.此处的 public BitArray Encode(string source) 方法包含一个肮脏的黑客.我跟踪用于编码的总位数并将其存储在 BitCountForTree 属性中.执行解码时,我将使用此属性删除可能出现的任何尾随位.有一种更好的方法来执行此操作,但我会保留开放让您了解.

What is interesting in this code, is that I decided to use BitArrays that will hold the binary paths for the tree when it's build up. The public BitArray Encode(string source) method here contains a dirty hack. I keep track of the total amount of bits used for encoding and store this within the BitCountForTree property. When performing a decode, I'll use this property to remove any trailing bits that may arise. There is a way nicer way to perform this, but I'll leave that open for you to find out.

此外,该类使用为 HuffmanNode 编写的扩展方法.不过很简单:

Also, this class makes use of an extension method written for the HuffmanNode. It's a simple one though:

  /// <summary>
  /// Determines whether a given Huffman node is a leaf or not.
  /// A node is considered to be a leaf when it has no childnodes
  /// </summary>
  /// <param name="node">A huffman node</param>
  /// <returns>True if no children are left, false otherwise</returns>
  public static bool IsLeaf(this HuffmanNode node)
  {
      return (null == node.Left && null == node.Right);
  }

这种扩展方法可以方便地确定给定节点是否实际上是叶节点.叶节点是没有子节点的节点,因此是二叉树的末端(或者更好的是该树的一个分支).

This extension method is convenient to determine whether or not a given node is actually a leafnode. A leaf is a node which has no childnodes left and thus the end of a binary tree (or better a branch of that tree).

现在是有趣的部分,我如何让事情在这里工作.我已经构建了一个具有 3 个文本框的 Windows 窗体应用程序.一个用于实际输入,一个用于二进制(编码)输出,最后一个用于显示压缩结果.我还放置了两个简单的按钮,一个执行霍夫曼编码,一个执行霍夫曼解码.

Now the interesting part, how do I make things work here. I have build a Windows Forms application having 3 textboxes. One for the actual input, one for the binary (encoded) output and the last for showing the compressed result. I also placed two simple buttons, one to perform the Huffman encoding and one for the Huffman decoding.

Huffman 编码方法如下(只是在encode 按钮的eventhandler 中):

The Huffman encoding method is written as following (just in the eventhandler of the encode button):

 string input = tbInput.Text;
 Tree.BuildTree(input); //Build the huffman tree

 BitArray encoded = Tree.Encode(input); //Encode the tree

 //First show the generated binary output
 tbBinaryOutput.Text = string.Join(string.Empty, encoded.Cast<bool>().Select(bit => bit ? "1" : "0"));

  //Next, convert the binary output to the new characterized output string.       
  byte[] bytes = new byte[(encoded.Length / 8) + 1];
  encoded.CopyTo(bytes, 0);

  tbOutput.Text = Encoding.Default.GetString(bytes); //Write the compressed output to the textbox.

请注意,编码后的二进制字符串没有任何尾随位.我将把它留给 C# 的编码机制.这样做的缺点是我必须在解码时跟踪它.

Note that the encoded binary string does not have any trailing bits. I'll leave that up to the Encoding mechanisms of C#. The downside of this, is that I have to keep track of it when decoding.

现在解码也不太难了.虽然在这个例子中,我使用了上面放置的编码代码生成的压缩输出.另外,我假设 Huffman 树(以及它的频率表!!!)已经构建好了.通常,频率表存储在压缩文件中,以便可以重建.

The decoding is not too hard now as well. Although, for this example, I am making use of the compressed output generated by the encoding code placed above. Also, I am assuming that the Huffman tree (and it's frequency table!!!) are already built. Normally, the frequency table is stored within the compressed file, so that it can be rebuild.

 //First convert the compressed output to a bit array again again and skip trailing bits.            
 bool[] boolAr = new BitArray(Encoding.Default.GetBytes(tbOutput.Text)).Cast<bool>().Take(Tree.BitCountForTree).ToArray();
 BitArray encoded = new BitArray( boolAr );

 string decoded = Tree.Decode(encoded);
 MessageBox.Show(decoded, "Decoded result: ", MessageBoxButtons.OK, MessageBoxIcon.Information);

请注意我创建的脏hack,因为Encoding.Default.GetBytes(tbOutput.Text) 肯定会生成一个字节数组,它可能包含不需要解码的尾随位.因此,基于重建树,我只取了我实际需要的位数.

Please pay attention to the dirty hack I created, as the Encoding.Default.GetBytes(tbOutput.Text) surely generates a byte array, it may contain trailing bits which need not to be decoded. Hence that I only take the amount of bits that I will actually need, based upon the rebuild tree.

所以在运行时,我的示例提供了以下输出,当使用世界知名句子"The quick brown fox jumps over the lazy program"时:

So when running, my example provides the following output, when using the 'world renown sentence' "The quick brown fox jumps over the lazy programmer":

按下Huff 编码"按钮后:

After pressing the "Huff encode" button:

按下Huff decode"按钮后:

And after pressing the "Huff decode" button:

现在这段代码真的可以使用一些优化了,因为您可能会考虑使用数组而不是字典.还有更多,但这取决于您的考虑.

Now this code can really use some optimizations, as you might consider using Arrays instead of Dictionaries. There are more, but it's up to you for consideration.

这篇关于霍夫曼树:遍历的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆