Stanford Classifier cross validation: averaged or aggregate metrics


Question

With Stanford Classifier it is possible to use cross validation by setting the options in the properties file, such as this for 10-fold cross validation:

crossValidationFolds=10
printCrossValidationDecisions=true
shuffleTrainingData=true
shuffleSeed=1

Running this will output, per fold, the various metrics, such as precision, recall, Accuracy/micro-averaged F1 and Macro-averaged F1.
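
For reference, the per-fold numbers can be reproduced from the per-class confusion counts. The following is a minimal Python sketch, not code from Stanford Classifier itself: the helper name `prf` is hypothetical, the counts are copied from the fold quoted in the answer's log output, and it assumes single-label classification, where accuracy and micro-averaged F1 coincide.

```python
def prf(tp, fn, fp):
    """Precision, recall and F1 for one class from its confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts copied from the "Cls 2" and "Cls 1" log lines of the quoted fold.
classes = {
    "2": dict(tp=109, fn=6, fp=7),
    "1": dict(tp=59, fn=7, fp=6),
}
total = 181  # "181 examples in test set"

scores = {name: prf(**counts) for name, counts in classes.items()}

# Assumption: one label per example, so accuracy is the share of examples
# whose predicted class is correct (sum of per-class true positives).
accuracy = sum(c["tp"] for c in classes.values()) / total

# Macro-averaged F1 is the unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)

for name, (p, r, f1) in scores.items():
    print(f"Cls {name}: P {p:.3f} R {r:.3f} F1 {f1:.3f}")
print(f"Accuracy/micro-averaged F1: {accuracy:.5f}")
print(f"Macro-averaged F1: {macro_f1:.5f}")
```

Rounded to the same precision, these values agree with the figures in the quoted log, which suggests the per-fold macro F1 is the plain mean of the per-class F1 scores.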

Is there an option to get an averaged or otherwise aggregated score of all 10 Accuracy/micro-averaged F1 or all 10 Macro-averaged F1 as part of the output?

In Weka, by default the output after 10-fold cross validation includes averaged metrics over all folds. Is such an option also available in Stanford Classifier? Having a final precision, recall or F1 score available and optimizing the parameters against it like in Weka is very useful, and I would like to do this with Stanford Classifier. How?

Answer

When I run with 10 folds, I do see averaged metrics in the output. When I run this command:

java -cp "*" edu.stanford.nlp.classify.ColumnDataClassifier -prop examples/cheese2007.prop -crossValidationFolds 10

I see this in the output (after ### Fold 9); the last two lines report the averages:

[main] INFO edu.stanford.nlp.classify.ColumnDataClassifier - 181 examples in test set
[main] INFO edu.stanford.nlp.classify.ColumnDataClassifier - Cls 2: TP=109 FN=6 FP=7 TN=59; Acc 0.928 P 0.940 R 0.948 F1 0.944
[main] INFO edu.stanford.nlp.classify.ColumnDataClassifier - Cls 1: TP=59 FN=7 FP=6 TN=109; Acc 0.928 P 0.908 R 0.894 F1 0.901
[main] INFO edu.stanford.nlp.classify.ColumnDataClassifier - Accuracy/micro-averaged F1: 0.92818
[main] INFO edu.stanford.nlp.classify.ColumnDataClassifier - Macro-averaged F1: 0.92224 
[main] INFO edu.stanford.nlp.classify.ColumnDataClassifier - Average accuracy/micro-averaged F1: 0.93429
[main] INFO edu.stanford.nlp.classify.ColumnDataClassifier - Average macro-averaged F1: 0.92247
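
To optimize parameters against these scores, it can help to pull them out of the log automatically. Below is a minimal Python sketch that parses log text in the format shown above; it is not part of Stanford Classifier. The function name `summarize` and the regular expressions are assumptions based only on the lines quoted here, and it assumes the captured log contains cross-validation output only (no separate test-set evaluation printing the same line format).

```python
import re
import statistics

# Per-fold lines start with "Accuracy/..." or "Macro-..."; the summary lines
# start with "Average ...", so the two patterns never match the same line.
FOLD_RE = re.compile(
    r" - (Accuracy/micro-averaged F1|Macro-averaged F1): ([0-9.]+)\s*$")
AVG_RE = re.compile(
    r" - Average (accuracy/micro-averaged F1|macro-averaged F1): ([0-9.]+)\s*$")

def summarize(log_text):
    """Collect per-fold scores, their means, and the reported averages."""
    per_fold = {"Accuracy/micro-averaged F1": [], "Macro-averaged F1": []}
    reported = {}
    for line in log_text.splitlines():
        match = FOLD_RE.search(line)
        if match:
            per_fold[match.group(1)].append(float(match.group(2)))
            continue
        match = AVG_RE.search(line)
        if match:
            reported[match.group(1)] = float(match.group(2))
    means = {name: statistics.mean(vals)
             for name, vals in per_fold.items() if vals}
    return per_fold, means, reported

# Sample input: the metric lines of the single fold quoted above.
prefix = "[main] INFO edu.stanford.nlp.classify.ColumnDataClassifier - "
sample = "\n".join([
    prefix + "Accuracy/micro-averaged F1: 0.92818",
    prefix + "Macro-averaged F1: 0.92224 ",
    prefix + "Average accuracy/micro-averaged F1: 0.93429",
    prefix + "Average macro-averaged F1: 0.92247",
])

per_fold, means, reported = summarize(sample)
print(per_fold)
print(reported)
```

With a full 10-fold log, `per_fold` would hold ten values per metric, and `means` can be compared against the `Average ...` lines as a sanity check. The `reported` values are the ones to compare across parameter settings.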
