Calculate ROC curve, classification report and confusion matrix for multilabel classification problem

Problem description

I am trying to understand how to make a confusion matrix and ROC curve for my multilabel classification problem. I am building a neural network. Here are my classes:

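from sklearn.preprocessing import MultiLabelBinarizer

# as_list: one list of label names per sample (not shown in the question)
mlb = MultiLabelBinarizer()
ohe = mlb.fit_transform(as_list)
# loop over each of the possible class labels and show them
for (i, label) in enumerate(mlb.classes_):
    print("{}. {}".format(i + 1, label))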

[INFO] class labels:
1. class1
2. class2
3. class3
4. class4
5. class5
6. class6

My labels after transformation:

ohe
array([[0, 1, 0, 0, 1, 1],
       [0, 1, 1, 1, 1, 0],
       [1, 1, 1, 0, 1, 0],
       [0, 1, 1, 1, 0, 1],
       ...])
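For reference, a minimal sketch of what MultiLabelBinarizer does, using a hypothetical as_list (the real one is not shown in the question) chosen so that it reproduces the first three rows of ohe above. classes_ is sorted, each column of the result corresponds to one entry of classes_, and inverse_transform maps a 0/1 indicator matrix back to label tuples, which is handy once predictions have been thresholded to hard classes.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical per-sample label lists; the question's actual as_list is not shown
as_list = [["class2", "class5", "class6"],
           ["class2", "class3", "class4", "class5"],
           ["class1", "class2", "class3", "class5"]]

mlb = MultiLabelBinarizer()
ohe = mlb.fit_transform(as_list)  # one 0/1 column per class, in mlb.classes_ order

print(mlb.classes_)
print(ohe)

# Map an indicator matrix (e.g. thresholded predictions) back to label tuples
decoded = mlb.inverse_transform(ohe)
print(decoded)
```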

Training data:

array([[[[ 1.93965047e+04,  8.49532852e-01],
         [ 1.93965047e+04,  8.49463479e-01],
         [ 1.93965047e+04,  8.49474722e-01],
         ...,

Model:

model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])
H = model.fit(trainX, trainY, batch_size=BS,
    validation_data=(testX, testY),
    epochs=EPOCHS, verbose=1)

I am able to get percentages, but I am a bit clueless about how to calculate a confusion matrix or ROC curve, or get a classification report. Here are the percentages:

proba = model.predict(testX)
idxs = np.argsort(proba)[::-1][:2]

for i in proba:
    print ('\n')
    for (label, p) in zip(mlb.classes_, i):
        print("{}: {:.2f}%".format(label, p * 100))

class1: 69.41%
class2: 76.41%
class3: 58.02%
class4: 63.97%
class5: 48.91%
class6: 58.28%

class1: 69.37%
class2: 76.42%
class3: 58.01%
class4: 63.92%
class5: 48.88%
class6: 58.26%
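A side note on the idxs line in the snippet above: np.argsort(proba) sorts each row in ascending order, but [::-1] then reverses the order of the rows (samples), not the columns, so [:2] returns the ascending class ordering of the last two samples rather than each sample's two most probable classes. A minimal sketch of a per-row top-2, assuming proba has shape (n_samples, n_classes); the two rows below are the printed percentages above, converted to probabilities for illustration only.

```python
import numpy as np

# Two rows of class probabilities, taken from the printed output above
proba = np.array([[0.6941, 0.7641, 0.5802, 0.6397, 0.4891, 0.5828],
                  [0.6937, 0.7642, 0.5801, 0.6392, 0.4888, 0.5826]])
classes = np.array(["class1", "class2", "class3", "class4", "class5", "class6"])

# Sort each row (axis=1), reverse the columns for descending order, keep the first two
top2 = np.argsort(proba, axis=1)[:, ::-1][:, :2]

for row in top2:
    print(classes[row])
```

Keep in mind that a fixed top-k is a different decision rule from thresholding each label independently, which is what the answer below uses.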

If anyone has some tips on how to do it or an example I would really appreciate it! Thank you in advance!

Recommended answer

From v0.21 onwards, scikit-learn includes a multilabel confusion matrix (multilabel_confusion_matrix); adapting the example from the docs for 5 classes:

import numpy as np
from sklearn.metrics import multilabel_confusion_matrix
y_true = np.array([[1, 0, 1, 0, 0],
                   [0, 1, 0, 1, 1],
                   [1, 1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0, 1],
                   [0, 1, 1, 1, 0],
                   [1, 1, 1, 0, 0]])

multilabel_confusion_matrix(y_true, y_pred)
# result:
array([[[1, 0],
        [0, 2]],

       [[1, 0],
        [0, 2]],

       [[0, 1],
        [1, 1]],

       [[2, 0],
        [0, 1]],

       [[0, 1],
        [2, 0]]])
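Each 2×2 block is the binary confusion matrix of one label, and scikit-learn lays it out as [[TN, FP], [FN, TP]]. A small sketch that unpacks the blocks per class, reusing the same y_true and y_pred; the class names are hypothetical placeholders standing in for something like mlb.classes_.

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

y_true = np.array([[1, 0, 1, 0, 0],
                   [0, 1, 0, 1, 1],
                   [1, 1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0, 1],
                   [0, 1, 1, 1, 0],
                   [1, 1, 1, 0, 0]])
class_names = ["class1", "class2", "class3", "class4", "class5"]  # hypothetical names

mcm = multilabel_confusion_matrix(y_true, y_pred)  # shape (n_labels, 2, 2)
counts = {}
for name, cm in zip(class_names, mcm):
    tn, fp, fn, tp = cm.ravel()  # layout per label: [[TN, FP], [FN, TP]]
    counts[name] = (tn, fp, fn, tp)
    print(f"{name}: TN={tn} FP={fp} FN={fn} TP={tp}")
```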

The usual classification_report also works fine:

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))
# result
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         2
           1       1.00      1.00      1.00         2
           2       0.50      0.50      0.50         2
           3       1.00      1.00      1.00         1
           4       0.00      0.00      0.00         2

   micro avg       0.75      0.67      0.71         9
   macro avg       0.70      0.70      0.70         9
weighted avg       0.67      0.67      0.67         9
 samples avg       0.72      0.64      0.67         9
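classification_report also accepts target_names, so the rows can show label names (for example mlb.classes_) instead of column indices, and output_dict=True returns the same numbers as a nested dict, which is convenient for logging. A sketch on the same data; the class names are hypothetical placeholders.

```python
import numpy as np
from sklearn.metrics import classification_report

y_true = np.array([[1, 0, 1, 0, 0],
                   [0, 1, 0, 1, 1],
                   [1, 1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0, 1],
                   [0, 1, 1, 1, 0],
                   [1, 1, 1, 0, 0]])
class_names = ["class1", "class2", "class3", "class4", "class5"]  # hypothetical names

# Human-readable report with label names instead of column indices
print(classification_report(y_true, y_pred, target_names=class_names))

# Same numbers as a nested dict, keyed by label name and by averaging method
report = classification_report(y_true, y_pred, target_names=class_names,
                               output_dict=True)
print(report["class3"]["precision"], report["micro avg"]["f1-score"])
```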

Regarding ROC, you can take some ideas from the "Plot ROC curves for the multilabel problem" example in the scikit-learn docs (though I am not quite sure the concept itself is very useful here).

Confusion matrix and classification report require hard class predictions (as in the example); ROC requires the predictions as probabilities.
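A minimal per-label ROC sketch, assuming you pass predicted probabilities (in the question, the output of model.predict(testX)) rather than hard classes. It treats each column as an independent binary problem with roc_curve and auc, computes a micro-average by pooling all (label, score) pairs, and cross-checks with roc_auc_score. The probabilities are illustrative values paired with the y_true from the confusion-matrix example; with only three samples the curves are very coarse, so this only shows the API calls. Each fpr/tpr pair can be passed to matplotlib's plt.plot to draw the curve.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc, roc_auc_score

y_true = np.array([[1, 0, 1, 0, 0],
                   [0, 1, 0, 1, 1],
                   [1, 1, 1, 0, 1]])
# Predicted probabilities, one column per label (e.g. from model.predict(testX))
y_prob = np.array([[0.9, 0.05, 0.12, 0.23, 0.78],
                   [0.11, 0.81, 0.51, 0.63, 0.34],
                   [0.68, 0.89, 0.76, 0.43, 0.27]])

# One ROC curve per label: each column is a separate binary problem
per_class_auc = []
for i in range(y_true.shape[1]):
    fpr, tpr, _ = roc_curve(y_true[:, i], y_prob[:, i])
    per_class_auc.append(auc(fpr, tpr))

# Micro-average: flatten every (label, score) pair into one binary problem
fpr_micro, tpr_micro, _ = roc_curve(y_true.ravel(), y_prob.ravel())
micro_auc = auc(fpr_micro, tpr_micro)

# roc_auc_score handles multilabel indicator targets directly
macro_auc = roc_auc_score(y_true, y_prob, average="macro")

print("per-class AUC:", np.round(per_class_auc, 2))
print("micro AUC:", round(micro_auc, 3))
print("macro AUC:", macro_auc)
```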

To convert your probabilistic predictions to hard classes, you need a threshold. Usually (and implicitly) this threshold is taken to be 0.5, i.e. predict 1 if y_pred > 0.5, else predict 0. This is not always the right choice, though; it depends on the particular problem. Once you have set such a threshold, you can easily convert your probabilistic predictions to hard classes with a list comprehension; here is a simple example:

import numpy as np

y_prob = np.array([[0.9, 0.05, 0.12, 0.23, 0.78],
                   [0.11, 0.81, 0.51, 0.63, 0.34],
                   [0.68, 0.89, 0.76, 0.43, 0.27]])

thresh = 0.5

y_pred = np.array([[1 if i > thresh else 0 for i in j] for j in y_prob])

y_pred
# result:
array([[1, 0, 0, 0, 1],
       [0, 1, 1, 1, 0],
       [1, 1, 1, 0, 0]])
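The list comprehension works, but NumPy can do the same comparison in one vectorized step. Broadcasting a 1-D array of thresholds against the columns also lets you use a different threshold per label if your problem calls for it; the per-label values below are arbitrary placeholders, not recommendations.

```python
import numpy as np

y_prob = np.array([[0.9, 0.05, 0.12, 0.23, 0.78],
                   [0.11, 0.81, 0.51, 0.63, 0.34],
                   [0.68, 0.89, 0.76, 0.43, 0.27]])

# Single global threshold: same result as the list comprehension
y_pred = (y_prob > 0.5).astype(int)

# Per-label thresholds (placeholder values); shape (n_labels,) broadcasts across rows
thresholds = np.array([0.5, 0.5, 0.6, 0.4, 0.3])
y_pred_per_class = (y_prob > thresholds).astype(int)

print(y_pred)
print(y_pred_per_class)
```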
