How to calculate the F1 measure in multi-label classification?
Question
I am working on a sentence category detection problem, where each sentence can belong to multiple categories. For example:
"It has great sushi and even better service."
True Label: [[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1.]]
Pred Label: [[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1.]]
Correct Prediction!
Output: ['FOOD#QUALITY' 'SERVICE#GENERAL']
I have implemented a classifier that can predict multiple categories. I have 587 sentences in total, each of which can belong to multiple categories. I calculated accuracy scores in two ways:
Are all labels of an example predicted correctly (exact match)?
Code:
print("<------------ZERO one ERROR------------>")
print("Total Examples:", truePred + falsePred, "True Pred:", truePred,
      "False Pred:", falsePred, "Accuracy:", truePred / (truePred + falsePred))
Output:
<------------ZERO one ERROR------------>
Total Examples: 587 True Pred: 353 False Pred: 234 Accuracy: 0.60136286201
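This exact-match criterion is what scikit-learn calls subset accuracy: for multilabel indicator input, `accuracy_score` counts a sample as correct only if every label matches. A minimal sketch on toy data (not the original 587 sentences):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Toy multilabel indicator matrices (rows = sentences, columns = categories).
y_true = np.array([[0, 1, 1],
                   [1, 0, 0],
                   [0, 1, 0]])
y_pred = np.array([[0, 1, 1],   # exact match
                   [1, 0, 1],   # one extra label -> whole row counts as wrong
                   [0, 1, 0]])  # exact match

# For multilabel input, accuracy_score computes subset accuracy:
# a row is correct only if every label matches.
print(accuracy_score(y_true, y_pred))  # 2 of 3 rows match -> 0.666...
```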
How many labels are predicted correctly across all examples?
Code:
print("\n<------------Correct and incorrect predictions------------>")
print("Total Labels:", len(total[0]), "Predicted Labels:", corrPred,
      "Accuracy:", corrPred / len(total[0]))
Output:
<------------Correct and incorrect predictions------------>
Total Labels: 743 Predicted Labels: 522 Accuracy: 0.702557200538
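Counting correctly predicted labels out of all ground-truth labels is, in effect, micro-averaged recall. A sketch of that counting on toy indicator matrices (assumed layout: rows = sentences, columns = categories):

```python
import numpy as np

y_true = np.array([[0, 1, 1],
                   [1, 0, 0],
                   [0, 1, 0]])
y_pred = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])

# A label counts as "correctly predicted" when it is 1 in both matrices.
correct = np.logical_and(y_true == 1, y_pred == 1).sum()
total_true = (y_true == 1).sum()
print(correct, "/", total_true, "=", correct / total_true)  # 3 / 4 = 0.75
```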
Problem: These are accuracy scores calculated by comparing the predicted labels with the ground-truth labels. But I also want to calculate the F1 score (using micro averaging), precision, and recall. I have the ground-truth labels and need to match my predictions against them, but I don't know how to tackle this kind of multi-label classification problem. Can I use scikit-learn or any other Python library?
Answer
I made a matrix of predicted labels, predictedlabel, and I already had the correct categories in y_test to compare my results against. So, I tried the following code:
from sklearn.metrics import classification_report
from sklearn.metrics import f1_score
from sklearn.metrics import roc_auc_score

print("Classification report: \n", classification_report(y_test, predictedlabel))
print("F1 micro averaging:", f1_score(y_test, predictedlabel, average='micro'))
print("ROC: ", roc_auc_score(y_test, predictedlabel))
I got the following results:
             precision    recall  f1-score   support

          0       0.74      0.93      0.82        57
          1       0.00      0.00      0.00         3
          2       0.57      0.38      0.46        21
          3       0.75      0.75      0.75        12
          4       0.44      0.68      0.54        22
          5       0.81      0.93      0.87       226
          6       0.57      0.54      0.55        48
          7       0.71      0.38      0.50        13
          8       0.70      0.72      0.71       142
          9       0.33      0.33      0.33        33
         10       0.42      0.52      0.47        21
         11       0.80      0.91      0.85       145

avg / total       0.71      0.78      0.74       743
F1 micro averaging: 0.746153846154
ROC: 0.77407943841
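To see what `average='micro'` does under the hood: micro averaging pools true positives, false positives, and false negatives across all labels before computing precision, recall, and F1. A sketch on toy indicator matrices (so the numbers differ from the report above), checked against scikit-learn:

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[0, 1, 1],
                   [1, 0, 0],
                   [0, 1, 0]])
y_pred = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])

# Pool counts over every (sample, label) cell.
tp = np.logical_and(y_true == 1, y_pred == 1).sum()
fp = np.logical_and(y_true == 0, y_pred == 1).sum()
fn = np.logical_and(y_true == 1, y_pred == 0).sum()

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f1)                                          # manual micro F1
print(f1_score(y_true, y_pred, average='micro'))   # should agree
```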
So, I am calculating my results this way!