Weka - Classifier returns the same distribution for any input
Problem description
I'm trying to build a Naive Bayes classifier for classifying text between two classes. Everything works great in the GUI Explorer, but when I try to recreate it in code, I get the same output no matter what input I try to classify.
Within the code, I get the same evaluation metrics I get within the GUI (81% accuracy), but whenever I try to create a new instance and classify it, I get the same distribution over the two classes no matter what input I use.
Below is my code. It's in Scala, but it is pretty straightforward:
//Building the classifier:
val trainingSet = new Instances(new DataSource("/my/dataset.arff").getDataSet)
trainingSet.setClassIndex(3)
val filter = new StringToWordVector
filter.setAttributeIndicesArray( (0 to 2).toArray )
val classifier = new FilteredClassifier
classifier.setFilter(new StringToWordVector(1000000))
classifier.setClassifier(new NaiveBayesMultinomial)
classifier.buildClassifier(trainingSet)
//Evaluation (this prints about 80% accuracy)
val eval = new Evaluation(trainingSet)
eval.evaluateModel(classifier, trainingSet)
println(eval.toSummaryString)
//Attempting to use the classifier:
val atts = new util.ArrayList[Attribute]
atts.add(new Attribute("sentence", true))
atts.add(new Attribute("parts_of_speech", true))
atts.add(new Attribute("dependency_graph", true))
atts.add(new Attribute("the_shizzle_clazz", SentenceType.values().map(_.name()).toSeq.asJava ))
val unlabeledInstances = new Instances("unlabeled", atts, 1)
unlabeledInstances.setClassIndex( 3 )
val instance = new DenseInstance(4)
unlabeledInstances.add(instance)
instance.setDataset(unlabeledInstances)
instance.setValue(0, parsed.sentence)
instance.setValue(1, parsed.posTagsStr)
instance.setValue(2, parsed.depsGraphStr)
val distrib = classifier.distributionForInstance(unlabeledInstances.firstInstance())
distrib.foreach(println)
No matter what input I give, the output of distrib is always:
0.44556173367704455
0.5544382663229555
Any ideas what I'm doing wrong? I would greatly appreciate any help.
Recommended answer
The magic line seems to be:
instance.setClassMissing()
Adding that made it work. :)
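For context, here is a minimal sketch of where that line could go. The object and method names (`UnlabeledInstanceSketch`, `buildUnlabeled`) and the `classLabels` parameter are made up for illustration and are not part of the question's code. The sketch also sets the values before the instance is used anywhere else: the Weka Javadoc says `Instances.add` shallow-copies the instance, so values set on the original after `add` may not show up in `firstInstance()`. That is an assumption about what else might be going on in the question, not something the answer above confirms.

```scala
import java.util
import weka.core.{Attribute, DenseInstance, Instance, Instances}

object UnlabeledInstanceSketch {

  /** Builds one unlabeled instance: three string attributes plus a nominal class.
    * Hypothetical helper; attribute names are copied from the question. */
  def buildUnlabeled(sentence: String,
                     posTags: String,
                     depsGraph: String,
                     classLabels: util.List[String]): Instance = {
    val atts = new util.ArrayList[Attribute]
    atts.add(new Attribute("sentence", true))          // string attribute
    atts.add(new Attribute("parts_of_speech", true))   // string attribute
    atts.add(new Attribute("dependency_graph", true))  // string attribute
    atts.add(new Attribute("the_shizzle_clazz", classLabels)) // nominal class

    // Empty dataset that only serves as the header for the new instance.
    val header = new Instances("unlabeled", atts, 1)
    header.setClassIndex(3)

    val instance = new DenseInstance(4)
    // The dataset must be set before string values can be assigned.
    instance.setDataset(header)
    instance.setValue(0, sentence)
    instance.setValue(1, posTags)
    instance.setValue(2, depsGraph)
    // The line from the answer: mark the class value as unknown.
    instance.setClassMissing()
    instance
  }
}
```

The returned instance can then be passed straight to the trained classifier, for example `classifier.distributionForInstance(instance)`, instead of reading it back out of the dataset with `firstInstance()`.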