霍夫曼树编码 [英] Huffman Tree Encoding
问题描述
我之前问过的霍夫曼树有另一个问题!代码如下:
My Huffman tree which I had asked about earlier has another problem! Here is the code:
package huffman;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.PriorityQueue;
import java.util.Scanner;
public class Huffman {
public ArrayList<Frequency> fileReader(String file)
{
ArrayList<Frequency> al = new ArrayList<Frequency>();
Scanner s;
try {
s = new Scanner(new FileReader(file)).useDelimiter("");
while (s.hasNext())
{
boolean found = false;
int i = 0;
String temp = s.next();
while(!found)
{
if(al.size() == i && !found)
{
found = true;
al.add(new Frequency(temp, 1));
}
else if(temp.equals(al.get(i).getString()))
{
int tempNum = al.get(i).getFreq() + 1;
al.get(i).setFreq(tempNum);
found = true;
}
i++;
}
}
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return al;
}
public Frequency buildTree(ArrayList<Frequency> al)
{
Frequency r = al.get(1);
PriorityQueue<Frequency> pq = new PriorityQueue<Frequency>();
for(int i = 0; i < al.size(); i++)
{
pq.add(al.get(i));
}
/*while(pq.size() > 0)
{
System.out.println(pq.remove().getString());
}*/
for(int i = 0; i < al.size() - 1; i++)
{
Frequency p = pq.remove();
Frequency q = pq.remove();
int temp = p.getFreq() + q.getFreq();
r = new Frequency(null, temp);
r.left = p;
r.right = q;
pq.add(r); // put in the correct place in the priority queue
}
pq.remove(); // leave the priority queue empty
return(r); // this is the root of the tree built
}
public void inOrder(Frequency top)
{
if(top == null)
{
return;
}
else
{
inOrder(top.left);
System.out.print(top.getString() +", ");
inOrder(top.right);
return;
}
}
public void printFreq(ArrayList<Frequency> al)
{
for(int i = 0; i < al.size(); i++)
{
System.out.println(al.get(i).getString() + "; " + al.get(i).getFreq());
}
}
}
现在需要做的是我需要创建一个方法来搜索树以找到特定字符的二进制代码(011001 等).做这个的最好方式是什么?我想也许我会在树中进行正常的搜索,就好像它是一棵 AVL 树,如果它较大则向右,如果较小则向左.
What needs to be done now is I need to create a method that will search through the tree to find the binary code (011001 etc) to the specific character. What is the best way to do this? I thought maybe I would do a normal search through the tree as if it were an AVL tree going to the right if its bigger or left if it's smaller.
但是因为节点不使用 ints doubles 等,而只使用包含字符作为字符串或 null 的对象来表示它不是叶子而是根.另一种选择是按顺序运行以找到我正在寻找的叶子,但同时我将如何确定我是向右走了多少次还是离开了这么多次来获得角色.
But because the nodes don't use ints doubles etc. but only using objects that contain characters as strings or null to signify its not a leaf but only a root. The other option would be to do an in-order run through to find the leaf that I'm looking for but at the same time how would I determine if I went right so many times or left so many times to get the character.
package huffman;
public class Frequency implements Comparable {
private String s;
private int n;
public Frequency left;
public Frequency right;
Frequency(String s, int n)
{
this.s = s;
this.n = n;
}
public String getString()
{
return s;
}
public int getFreq()
{
return n;
}
public void setFreq(int n)
{
this.n = n;
}
@Override
public int compareTo(Object arg0) {
Frequency other = (Frequency)arg0;
return n < other.n ? -1 : (n == other.n ? 0 : 1);
}
}
我想要做的是找到二进制代码来实际到达每个字符.因此,如果我尝试对 aabbbcccc
进行编码,我将如何创建一个字符串,其中包含向左为 0 而向右为 1 的二进制代码.
What I'm trying to do is find the binary code to actually get to each character. So if I were trying to encode aabbbcccc
how would I create a string holding the binary code for a going left is 0 and going right is 1.
让我感到困惑的是,你无法确定任何东西在哪里,因为树显然是不平衡的,并且无法确定角色是在你所在位置的右边还是左边.因此,您必须搜索整个树,但如果您找到的节点不是您要查找的节点,则必须回溯到另一个根以到达其他叶子.
What has me confused is because you can't determine where anything is because the tree is obviously unbalanced and there is no determining if a character is right or left of where you are. So you have to search through the whole tree but if you get to a node that isn't what you are looking for, you have backtrack to another root to get to the other leaves.
推荐答案
记住,如果你有 1001,你永远不会有 10010 或 10011.所以你的基本方法看起来像这样(伪代码):
Remember, if you have 1001, you will never have a 10010 or 10011. So your basic method looks like this (in pseudocode):
if(input == thisNode.key) return thisNode.value
if(input.endsWith(1)) return search(thisNode.left)
else return search(thisNode.right)
我没有阅读你的程序来弄清楚如何集成它,但简而言之,这是霍夫曼编码的一个关键元素
I didn't read your program to figure out how to integrate it, but that's a key element of huffman encoding in a nutshell
尝试这样的事情 - 您正在尝试查找令牌.所以如果你想找到10010"的字符串,你会做 search(root,"10010")
Try something like this - you're trying to find token. So if you wanted to find the String for "10010", you'd do search(root,"10010")
String search(Frequency top, String token) {
return search(top,token,0);
}
// depending on your tree, you may have to switch top.left and top.right
String search(Frequency top, String token, int depth) {
if(token.length() == depth) return "NOT FOUND";
if(token.length() == depth - 1) return top.getString();
if(token.charAt(depth) == '0') return search(top.left,token,depth+1);
else return search(top.right,token,depth+1);
}
这篇关于霍夫曼树编码的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!