Why does Mallet text classification output the same value 1.0 for all test files?


Question

I am learning the Mallet text classification command lines. The output values when estimating different classes are all the same: 1.0. I do not know what I am doing wrong. Can you help?

Mallet version: E:\Mallet\mallet-2.0.8RC3

// There is one txt file about cat breeds (catmaterial.txt) in the cat dir.
// command 1
C:\Users\toshiba>mallet import-dir --input E:\Mallet\testmaterial\cat --output E:\Mallet\testmaterial\cat.mallet --remove-stopwords

//command 1 output
Labels =
   E:\Mallet\testmaterial\cat

// command 2, save the classifier as catClass.classifier
C:\Users\toshiba>mallet train-classifier --input E:\Mallet\testmaterial\cat.mallet --trainer NaiveBayes --output-classifier E:\Mallet\testmaterial\catClass.classifier

//command 2 output
Training portion = 1.0
Unlabeled training sub-portion = 0.0
Validation portion = 0.0
Testing portion = 0.0

-------------------- Trial 0  --------------------

Trial 0 Training NaiveBayesTrainer with 1 instances
Trial 0 Training NaiveBayesTrainer finished
No examples with predicted label !
No examples with true label !
No examples with predicted label !
No examples with true label !
Trial 0 Trainer NaiveBayesTrainer training data accuracy = 1.0
Trial 0 Trainer NaiveBayesTrainer Test Data Confusion Matrix
No examples with predicted label !
Trial 0 Trainer NaiveBayesTrainer test data precision() = 1.0
No examples with true label !
Trial 0 Trainer NaiveBayesTrainer test data recall() = 1.0
No examples with predicted label !
No examples with true label !
Trial 0 Trainer NaiveBayesTrainer test data F1() = 1.0
Trial 0 Trainer NaiveBayesTrainer test data accuracy = NaN

NaiveBayesTrainer
Summary. train accuracy mean = 1.0 stddev = 0.0 stderr = 0.0
Summary. test accuracy mean = NaN stddev = NaN stderr = NaN
Summary. test precision() mean = 1.0 stddev = 0.0 stderr = 0.0
Summary. test recall() mean = 1.0 stddev = 0.0 stderr = 0.0
Summary. test f1() mean = 1.0 stddev = 0.0 stderr = 0.0

// command 3, estimate the classes of three files about cat, deer and dog. The cat file is the same one used to build cat.mallet
C:\Users\toshiba>mallet classify-dir --input E:\Mallet\testmaterial\test_cat_dir --output - --classifier E:\Mallet\testmaterial\catClass.classifier


//command 3 output
file:/E:/Mallet/testmaterial/test_cat_dir/catmaterial.txt               1.0
file:/E:/Mallet/testmaterial/test_cat_dir/deertext.txt          1.0
file:/E:/Mallet/testmaterial/test_cat_dir/dogmaterial.txt               1.0

// Why are all three files scored 1.0?

C:\Users\toshiba>

Can you help? Thank you.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Update:

Thank you for the answer, but it still outputs 1.0 for all files.

My idea was to put some dog files in a dog dir, treat those dog files as instances, train a model, and then test some files in test_dir to see the result.

I tried this according to my understanding of your suggestion, but the output is still 1.0 for every file.

Could you help me with my command lines below?

In E:\Mallet\train_dir\dog, there are 4 dog txt files (dog 2.txt, dog4.txt, dog5.txt, dogmaterial.txt).

In E:\Mallet\test_dir, there are 9 txt files (cat2.txt, catmaterial.txt, deermaterial.txt, dog3.txt, dog6.txt, dog 2.txt, dog4.txt, dog5.txt, dogmaterial.txt).

C:\Users\toshiba>mallet import-dir --input E:\Mallet\train_dir\dog --output E:\Mallet\classifier_dir\3animal.mallet --remove-stopwords
Labels =
   E:\Mallet\train_dir\dog


C:\Users\toshiba>mallet train-classifier --input E:\Mallet\classifier_dir\3animal.mallet --trainer NaiveBayes --output-classifier E:\Mallet\classifier_dir\3animalClass.classifier
Training portion = 1.0
Unlabeled training sub-portion = 0.0
Validation portion = 0.0
Testing portion = 0.0                          
-------------------- Trial 0  --------------------

Trial 0 Training NaiveBayesTrainer with 4 instances
Trial 0 Training NaiveBayesTrainer finished
No examples with predicted label !
No examples with true label !
No examples with predicted label !
No examples with true label !
Trial 0 Trainer NaiveBayesTrainer training data accuracy = 1.0
Trial 0 Trainer NaiveBayesTrainer Test Data Confusion Matrix
No examples with predicted label !
Trial 0 Trainer NaiveBayesTrainer test data precision() = 1.0
No examples with true label !
Trial 0 Trainer NaiveBayesTrainer test data recall() = 1.0
No examples with predicted label !
No examples with true label !
Trial 0 Trainer NaiveBayesTrainer test data F1() = 1.0
Trial 0 Trainer NaiveBayesTrainer test data accuracy = NaN

NaiveBayesTrainer
Summary. train accuracy mean = 1.0 stddev = 0.0 stderr = 0.0
Summary. test accuracy mean = NaN stddev = NaN stderr = NaN
Summary. test precision() mean = 1.0 stddev = 0.0 stderr = 0.0
Summary. test recall() mean = 1.0 stddev = 0.0 stderr = 0.0
Summary. test f1() mean = 1.0 stddev = 0.0 stderr = 0.0


C:\Users\toshiba>mallet classify-dir --input E:\Mallet\test_dir --output - --classifier E:\Mallet\classifier_dir\3animalClass.classifier

file:/E:/Mallet/test_dir/cat2.txt               1.0
file:/E:/Mallet/test_dir/catmaterial.txt                1.0
file:/E:/Mallet/test_dir/deertext.txt           1.0
file:/E:/Mallet/test_dir/dog%202.txt            1.0
file:/E:/Mallet/test_dir/dog3.txt               1.0
file:/E:/Mallet/test_dir/dog4.txt               1.0
file:/E:/Mallet/test_dir/dog5.txt               1.0
file:/E:/Mallet/test_dir/dog6.txt               1.0
file:/E:/Mallet/test_dir/dogmaterial.txt                1.0
C:\Users\toshiba>


Thank you.

Recommended answer

There are two input options. import-dir treats each directory as a class and each file in a directory as one input instance. import-file reads the input file line by line and treats various fields within each line as label and instance data. You are using the files-in-directories input type with a single directory, so you are creating a classifier with one class and one instance. With only one class to choose from, the classifier assigns that class to every test file with probability 1.0. I'm guessing you want the lines-in-file type.
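To see why a one-class model can only ever report 1.0, here is a minimal sketch of a toy multinomial naive Bayes classifier with add-one smoothing. It is not Mallet's implementation; the function names `train` and `posterior` and the tiny token lists are made up for illustration. The point is only the last step: scores are normalized over the known labels, so with a single label the result is always 1.0, whatever the test text says.

```python
import math
from collections import Counter


def train(docs_by_label):
    """Count words per label. docs_by_label maps label -> list of token lists."""
    vocab = {w for docs in docs_by_label.values() for d in docs for w in d}
    total_docs = sum(len(docs) for docs in docs_by_label.values())
    model = {}
    for label, docs in docs_by_label.items():
        counts = Counter(w for d in docs for w in d)
        model[label] = {
            "log_prior": math.log(len(docs) / total_docs),
            "counts": counts,
            "total": sum(counts.values()),
        }
    return model, vocab


def posterior(model, vocab, tokens):
    """Return P(label | tokens), normalized over the labels seen in training."""
    log_scores = {}
    for label, m in model.items():
        score = m["log_prior"]
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((m["counts"][w] + 1) / (m["total"] + len(vocab)))
        log_scores[label] = score
    # Normalize in a numerically stable way (subtract the max before exp).
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(exp.values())
    return {label: v / z for label, v in exp.items()}


# One class only, as in the question: every input gets probability 1.0.
one_class = {"dog": [["dog", "bark", "puppy"], ["dog", "leash"]]}
model, vocab = train(one_class)
print(posterior(model, vocab, ["cat", "whiskers"]))  # {'dog': 1.0}

# Two classes: the probabilities now depend on the text.
two_class = {
    "dog": [["dog", "bark", "puppy"]],
    "cat": [["cat", "purr", "whiskers"]],
}
model, vocab = train(two_class)
print(posterior(model, vocab, ["cat", "purr"]))
```

In the question, import-dir printed a single entry under "Labels =", which matches the one-class case here. Assuming the goal is to tell cat, deer and dog texts apart, the training data needs one directory per class (or one label per class in a lines-in-file input) before the classifier can produce anything other than 1.0.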
