如何使用scikit learning计算多类案例的精度,召回率,准确性和f1-得分? [英] How to compute precision, recall, accuracy and f1-score for the multiclass case with scikit learn?

查看:105
本文介绍了如何使用scikit learning计算多类案例的精度,召回率,准确性和f1-得分?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在处理情绪分析问题,数据看起来像这样:

I'm working in a sentiment analysis problem the data looks like this:

label instances
    5    1190
    4     838
    3     239
    1     204
    2     127

所以我的数据不平衡,因为1190 instances5标记.对于分类,我使用scikit的 SVC .问题是我不知道如何以正确的方式平衡我的数据,以便准确地计算多类案例的精度,查全率,准确性和f1得分.因此,我尝试了以下方法:

So my data is unbalanced since 1190 instances are labeled with 5. For the classification Im using scikit's SVC. The problem is I do not know how to balance my data in the right way in order to compute accurately the precision, recall, accuracy and f1-score for the multiclass case. So I tried the following approaches:

第一:

    wclf = SVC(kernel='linear', C= 1, class_weight={1: 10})
    wclf.fit(X, y)
    weighted_prediction = wclf.predict(X_test)

print 'Accuracy:', accuracy_score(y_test, weighted_prediction)
print 'F1 score:', f1_score(y_test, weighted_prediction,average='weighted')
print 'Recall:', recall_score(y_test, weighted_prediction,
                              average='weighted')
print 'Precision:', precision_score(y_test, weighted_prediction,
                                    average='weighted')
print '\n clasification report:\n', classification_report(y_test, weighted_prediction)
print '\n confussion matrix:\n',confusion_matrix(y_test, weighted_prediction)

第二:

auto_wclf = SVC(kernel='linear', C= 1, class_weight='auto')
auto_wclf.fit(X, y)
auto_weighted_prediction = auto_wclf.predict(X_test)

print 'Accuracy:', accuracy_score(y_test, auto_weighted_prediction)

print 'F1 score:', f1_score(y_test, auto_weighted_prediction,
                            average='weighted')

print 'Recall:', recall_score(y_test, auto_weighted_prediction,
                              average='weighted')

print 'Precision:', precision_score(y_test, auto_weighted_prediction,
                                    average='weighted')

print '\n clasification report:\n', classification_report(y_test,auto_weighted_prediction)

print '\n confussion matrix:\n',confusion_matrix(y_test, auto_weighted_prediction)

第三:

clf = SVC(kernel='linear', C= 1)
clf.fit(X, y)
prediction = clf.predict(X_test)


from sklearn.metrics import precision_score, \
    recall_score, confusion_matrix, classification_report, \
    accuracy_score, f1_score

print 'Accuracy:', accuracy_score(y_test, prediction)
print 'F1 score:', f1_score(y_test, prediction)
print 'Recall:', recall_score(y_test, prediction)
print 'Precision:', precision_score(y_test, prediction)
print '\n clasification report:\n', classification_report(y_test,prediction)
print '\n confussion matrix:\n',confusion_matrix(y_test, prediction)


F1 score:/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:676: DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1".
  sample_weight=sample_weight)
/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:1172: DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1".
  sample_weight=sample_weight)
/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:1082: DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1".
  sample_weight=sample_weight)
 0.930416613529

但是,我收到这样的警告:

However, Im getting warnings like this:

/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:1172:
DeprecationWarning: The default `weighted` averaging is deprecated,
and from version 0.18, use of precision, recall or F-score with 
multiclass or multilabel data or pos_label=None will result in an 
exception. Please set an explicit value for `average`, one of (None, 
'micro', 'macro', 'weighted', 'samples'). In cross validation use, for 
instance, scoring="f1_weighted" instead of scoring="f1"

如何正确处理我的不平衡数据,以便以正确的方式计算分类器的指标?

How can I deal correctly with my unbalanced data in order to compute in the right way classifier's metrics?

推荐答案

我认为对于将哪些权重用于什么有很多困惑.我不确定我是否确切地知道让您感到困扰,所以我将涉及不同的话题,请忍受;).

I think there is a lot of confusion about which weights are used for what. I am not sure I know precisely what bothers you so I am going to cover different topics, bear with me ;).

class_weight参数中的权重用于训练分类器. 它们不会用于计算您正在使用的任何指标:使用不同的班级权重,数字会有所不同,仅仅是因为分类器不同而已.

The weights from the class_weight parameter are used to train the classifier. They are not used in the calculation of any of the metrics you are using: with different class weights, the numbers will be different simply because the classifier is different.

基本上,在每个scikit-learn分类器中,类权重都用于告诉您的模型,类的重要性.这意味着在训练过程中,分类器将付出更多的努力来对权重较高的类进行正确分类.
他们如何做到这一点是特定于算法的.如果您想了解有关SVC如何在SVC中工作的详细信息,而您觉得该文档没有意义,请随时提及.

Basically in every scikit-learn classifier, the class weights are used to tell your model how important a class is. That means that during the training, the classifier will make extra efforts to classify properly the classes with high weights.
How they do that is algorithm-specific. If you want details about how it works for SVC and the doc does not make sense to you, feel free to mention it.

有了分类器后,您想知道它的效果如何. 在这里,您可以使用提到的指标:accuracyrecall_scoref1_score ...

Once you have a classifier, you want to know how well it is performing. Here you can use the metrics you mentioned: accuracy, recall_score, f1_score...

通常,当班级分配不平衡时,准确性被认为是较差的选择,因为它会给仅预测最频繁班级的模型提供高分.

Usually when the class distribution is unbalanced, accuracy is considered a poor choice as it gives high scores to models which just predict the most frequent class.

我不会详细说明所有这些指标,但是请注意,除了accuracy之外,它们自然应用于类级别:如您在此分类报告的print中所见,它们是为每个类定义的.他们依赖诸如true positivesfalse negative之类的概念,这些概念要求定义哪个类是阳性.

I will not detail all these metrics but note that, with the exception of accuracy, they are naturally applied at the class level: as you can see in this print of a classification report they are defined for each class. They rely on concepts such as true positives or false negative that require defining which class is the positive one.

             precision    recall  f1-score   support

          0       0.65      1.00      0.79        17
          1       0.57      0.75      0.65        16
          2       0.33      0.06      0.10        17
avg / total       0.52      0.60      0.51        50

警告

F1 score:/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:676: DeprecationWarning: The 
default `weighted` averaging is deprecated, and from version 0.18, 
use of precision, recall or F-score with multiclass or multilabel data  
or pos_label=None will result in an exception. Please set an explicit 
value for `average`, one of (None, 'micro', 'macro', 'weighted', 
'samples'). In cross validation use, for instance, 
scoring="f1_weighted" instead of scoring="f1".

之所以收到此警告,是因为您使用的是f1分数,召回率和精确度,而未定义应如何计算它们! 问题可以改写:从上面的分类报告中,您如何输出f1分数的一个全局数字? 您可以:

You get this warning because you are using the f1-score, recall and precision without defining how they should be computed! The question could be rephrased: from the above classification report, how do you output one global number for the f1-score? You could:

  1. 取每个类的f1-分数的平均值:这是上面的avg / total结果.这也称为平均.
  2. 使用真阳性/假阴性等的全局计数等来计算f1-分数(您将每个类别的真阳性/假阴性的总数相加).又名 micro 平均.
  3. 计算f1分数的加权平均值.在scikit-learn中使用'weighted'将通过类的支持权衡f1分数:类具有的元素越多,则该类的f1分数在计算中就越重要.
  1. Take the average of the f1-score for each class: that's the avg / total result above. It's also called macro averaging.
  2. Compute the f1-score using the global count of true positives / false negatives, etc. (you sum the number of true positives / false negatives for each class). Aka micro averaging.
  3. Compute a weighted average of the f1-score. Using 'weighted' in scikit-learn will weigh the f1-score by the support of the class: the more elements a class has, the more important the f1-score for this class in the computation.

这是scikit-learn中的3个选项,警告提示您必须选择一个.因此,您必须为score方法指定一个average参数.

These are 3 of the options in scikit-learn, the warning is there to say you have to pick one. So you have to specify an average argument for the score method.

选择哪种方法取决于衡量分类器性能的方式:例如,宏平均不会考虑类的不平衡,并且类1的f1得分与f1一样重要第5类的分数.如果您使用加权平均,那么您将在第5类中获得更多的重要性.

Which one you choose is up to how you want to measure the performance of the classifier: for instance macro-averaging does not take class imbalance into account and the f1-score of class 1 will be just as important as the f1-score of class 5. If you use weighted averaging however you'll get more importance for the class 5.

这些度量标准中的整个参数规范目前在scikit-learn中尚不十分清楚,根据文档,它在0.18版中会变得更好.他们正在删除一些不太明显的标准行为,并发出警告,以便开发人员注意到它.

The whole argument specification in these metrics is not super-clear in scikit-learn right now, it will get better in version 0.18 according to the docs. They are removing some non-obvious standard behavior and they are issuing warnings so that developers notice it.

最后我想提一下(如果您知道它,可以略过它)是,分数只有在根据分类器从未见过的数据进行计算时才有意义. 这一点非常重要,因为您获得的用于拟合分类器的数据得分都是完全不相关的.

Last thing I want to mention (feel free to skip it if you're aware of it) is that scores are only meaningful if they are computed on data that the classifier has never seen. This is extremely important as any score you get on data that was used in fitting the classifier is completely irrelevant.

这是使用StratifiedShuffleSplit进行此操作的一种方法,它可以随机分配数据(经过改组后),以保留标签的分布.

Here's a way to do it using StratifiedShuffleSplit, which gives you a random splits of your data (after shuffling) that preserve the label distribution.

from sklearn.datasets import make_classification
from sklearn.cross_validation import StratifiedShuffleSplit
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix

# We use a utility to generate artificial classification data.
X, y = make_classification(n_samples=100, n_informative=10, n_classes=3)
sss = StratifiedShuffleSplit(y, n_iter=1, test_size=0.5, random_state=0)
for train_idx, test_idx in sss:
    X_train, X_test, y_train, y_test = X[train_idx], X[test_idx], y[train_idx], y[test_idx]
    svc.fit(X_train, y_train)
    y_pred = svc.predict(X_test)
    print(f1_score(y_test, y_pred, average="macro"))
    print(precision_score(y_test, y_pred, average="macro"))
    print(recall_score(y_test, y_pred, average="macro"))    

希望这会有所帮助.

这篇关于如何使用scikit learning计算多类案例的精度,召回率,准确性和f1-得分?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆