How to get precision, recall and f-measure from confusion matrix in Python

Question

I'm using Python and have some confusion matrices. I'd like to calculate the precision, recall and f-measure from a confusion matrix in multiclass classification. My result logs don't contain y_true and y_pred, just the confusion matrices.

Could you tell me how to get these scores from a confusion matrix in multiclass classification?

Answer

Let's consider the case of MNIST data classification (10 classes), where for a test set of 10,000 samples we get the following confusion matrix cm (a NumPy array):

array([[ 963,    0,    0,    1,    0,    2,   11,    1,    2,    0],
       [   0, 1119,    3,    2,    1,    0,    4,    1,    4,    1],
       [  12,    3,  972,    9,    6,    0,    6,    9,   13,    2],
       [   0,    0,    8,  975,    0,    2,    2,   10,   10,    3],
       [   0,    2,    3,    0,  953,    0,   11,    2,    3,    8],
       [   8,    1,    0,   21,    2,  818,   17,    2,   15,    8],
       [   9,    3,    1,    1,    4,    2,  938,    0,    0,    0],
       [   2,    7,   19,    2,    2,    0,    0,  975,    2,   19],
       [   8,    5,    4,    8,    6,    4,   14,   11,  906,    8],
       [  11,    7,    1,   12,   16,    1,    1,    6,    5,  949]])
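
If you want to run the snippets below yourself, the matrix above can be entered directly as a NumPy array; this block simply restates the matrix shown and adds the import that the later code assumes:

import numpy as np

cm = np.array([[ 963,    0,    0,    1,    0,    2,   11,    1,    2,    0],
               [   0, 1119,    3,    2,    1,    0,    4,    1,    4,    1],
               [  12,    3,  972,    9,    6,    0,    6,    9,   13,    2],
               [   0,    0,    8,  975,    0,    2,    2,   10,   10,    3],
               [   0,    2,    3,    0,  953,    0,   11,    2,    3,    8],
               [   8,    1,    0,   21,    2,  818,   17,    2,   15,    8],
               [   9,    3,    1,    1,    4,    2,  938,    0,    0,    0],
               [   2,    7,   19,    2,    2,    0,    0,  975,    2,   19],
               [   8,    5,    4,    8,    6,    4,   14,   11,  906,    8],
               [  11,    7,    1,   12,   16,    1,    1,    6,    5,  949]])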

In order to get the precision & recall (per class), we need to compute the TP, FP, and FN per class. We don't need the TN, but we will compute it too, as it will help us with a sanity check.

The True Positives are simply the diagonal elements:

# numpy should have already been imported as np
TP = np.diag(cm)
TP
# array([ 963, 1119,  972,  975,  953,  818,  938,  975,  906,  949])

The False Positives are the sum of the respective column, minus the diagonal element (i.e. the TP element):

FP = np.sum(cm, axis=0) - TP
FP
# array([50, 28, 39, 56, 37, 11, 66, 42, 54, 49])

Similarly, the False Negatives are the sum of the respective row, minus the diagonal (i.e. TP) element:

FN = np.sum(cm, axis=1) - TP
FN
# array([17, 16, 60, 35, 29, 74, 20, 53, 68, 60])

Now, the True Negatives are a little trickier; let's first think about what exactly a True Negative means with respect to, say, class 0: it means all the samples that have been correctly identified as not being 0. So, essentially, what we should do is remove the corresponding row & column from the confusion matrix, and then sum up all the remaining elements:

num_classes = 10
TN = []
for i in range(num_classes):
    temp = np.delete(cm, i, 0)    # delete ith row
    temp = np.delete(temp, i, 1)  # delete ith column
    TN.append(sum(sum(temp)))
TN
# [8970, 8837, 8929, 8934, 8981, 9097, 8976, 8930, 8972, 8942]
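
Equivalently, since every sample counts as exactly one of TP, FP, FN, or TN for each class, the loop above can be replaced by a one-line vectorized computation (this gives a NumPy array rather than a list, with the same values):

TN = cm.sum() - (TP + FP + FN)
TN
# array([8970, 8837, 8929, 8934, 8981, 9097, 8976, 8930, 8972, 8942])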

Let's make a sanity check: for each class, the sum of TP, FP, FN, and TN must be equal to the size of our test set (here 10,000). Let's confirm that this is indeed the case:

n_samples = 10000
for i in range(num_classes):
    print(TP[i] + FP[i] + FN[i] + TN[i] == n_samples)

The result is

True
True
True
True
True
True
True
True
True
True

Having calculated these quantities, it is now straightforward to get the precision & recall per class:

precision = TP/(TP+FP)
recall = TP/(TP+FN)

which for this example are:

precision
# array([ 0.95064166,  0.97558849,  0.96142433,  0.9456838 ,  0.96262626,
#         0.986731  ,  0.93426295,  0.95870206,  0.94375   ,  0.9509018])

recall
# array([ 0.98265306,  0.98590308,  0.94186047,  0.96534653,  0.97046843,
#         0.91704036,  0.97912317,  0.94844358,  0.9301848 ,  0.94053518])
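
The question also asks for the f-measure; with the per-class precision and recall available, the per-class F1 score follows directly from the standard formula F1 = 2 * precision * recall / (precision + recall). A minimal sketch:

f1 = 2 * precision * recall / (precision + recall)  # per-class F1 score

If you need a single summary number per metric, one common choice is the macro average, i.e. the unweighted mean over classes (precision.mean(), recall.mean(), f1.mean()).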

You should now be able to compute these quantities for virtually any size of confusion matrix.
