Plotting ROC Curve with Multiple Classes

Problem description

I am following the documentation for plotting ROC curves for multiple classes at this link: http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html

I am confused about this line in particular:

y_score = classifier.fit(X_train, y_train).decision_function(X_test)

I've seen that in other examples, y_score holds probabilities, and they are all positive values, as we would expect. However, the y_score (each column for classes A-C) in this example has mostly negative values. Interestingly, they still add up to -1:

In: y_score[0:5,:]
Out: array([[-0.76305896, -0.36472635,  0.1239796 ],
            [-0.20238399, -0.63148982, -0.16616656],
            [ 0.11808492, -0.80262259, -0.32062486],
            [-0.90750303, -0.1239792 ,  0.02184016],
            [-0.01108555, -0.27918155, -0.71882525]])

How am I supposed to interpret this? And how can I tell just from the y_score which class is the model's prediction for each input?

Edit: all the relevant code:

import numpy as np
import matplotlib.pyplot as plt
from itertools import cycle

from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
from numpy import interp  # scipy.interp is a deprecated alias of numpy.interp and is absent from newer SciPy

# Import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Binarize the output
y = label_binarize(y, classes=[0, 1, 2])
n_classes = y.shape[1]

# Add noisy features to make the problem harder
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]

# shuffle and split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5,
                                                    random_state=0)

# Learn to predict each class against the other
classifier = OneVsRestClassifier(svm.SVC(kernel='linear', 
                                 probability=True,
                                 random_state=random_state))
y_score = classifier.fit(X_train, y_train).decision_function(X_test)

Solution

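decision_function returns a signed score for each class: how far the sample lies from that class's one-vs-rest decision boundary, and on which side. A positive value puts the sample on the positive side for that class and a negative value on the other side, so these scores are not probabilities and need not be positive. If you want probabilities, use the predict_proba method. If you want to find out which class the estimator assigns to the sample, use predict.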
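To address the second part of the question (reading a class off y_score alone), here is a minimal sketch using the five rows quoted in the question. The names predicted_class and membership are illustrative, not from the original code. It assumes, as far as I can tell from how OneVsRestClassifier handles label-binarized targets, that predict marks a class as positive when its decision score is above zero.

```python
import numpy as np

# First five rows of y_score, copied from the question.
y_score = np.array([
    [-0.76305896, -0.36472635,  0.1239796],
    [-0.20238399, -0.63148982, -0.16616656],
    [ 0.11808492, -0.80262259, -0.32062486],
    [-0.90750303, -0.1239792,   0.02184016],
    [-0.01108555, -0.27918155, -0.71882525],
])

# Single best class per sample: the column with the highest score.
predicted_class = np.argmax(y_score, axis=1)

# One-vs-rest membership: a class counts as "on" when its score is positive.
# A row can be all zeros when every score is negative.
membership = (y_score > 0).astype(int)

print(predicted_class)
print(membership)
```

If a single label per sample is wanted, the argmax is the natural choice; the thresholded matrix is what a multilabel-style prediction looks like, which is why some rows contain no positive class at all.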

import numpy as np
from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier

# Import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Binarize the output
y = label_binarize(y, classes=[0, 1, 2])
n_classes = y.shape[1]

# Add noisy features to make the problem harder
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]

# shuffle and split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5,
                                                    random_state=0)

# Learn to predict each class against the other
classifier = OneVsRestClassifier(svm.SVC(kernel='linear', 
                                 probability=True,
                                 random_state=random_state))

# train the classifier
classifier.fit(X_train, y_train)

# generate y_score
y_score = classifier.decision_function(X_test)

# generate probabilities
y_prob = classifier.predict_proba(X_test)

# generate predictions
y_pred = classifier.predict(X_test)

Result:

>>> y_score[0:5,:]
array([[-0.76305896, -0.36472635,  0.1239796 ],
       [-0.20238399, -0.63148982, -0.16616656],
       [ 0.11808492, -0.80262259, -0.32062486],
       [-0.90750303, -0.1239792 ,  0.02184016],
       [-0.01108555, -0.27918155, -0.71882525]])
>>> y_prob[0:5,:]
array([[0.06019732, 0.24174159, 0.8293423 ],
       [0.35610687, 0.30121076, 0.46392587],
       [0.65735935, 0.34605074, 0.25675446],
       [0.03458982, 0.19539083, 0.72575167],
       [0.53656981, 0.22445759, 0.03221816]])
>>> y_pred[0:5,:]
array([[0, 0, 1],
       [0, 0, 0],
       [1, 0, 0],
       [0, 0, 1],
       [0, 0, 0]])
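
To get back to the ROC curves themselves: roc_curve only needs a score that ranks the samples, so the decision_function output can be passed in directly, one column per class. The following is a minimal sketch along the lines of the linked scikit-learn example, assuming the same noisy iris setup. The dictionary names fpr, tpr and roc_auc are illustrative choices, and probability=True is left out because predict_proba is not used here.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize

# Load iris and binarize the labels into one indicator column per class.
iris = datasets.load_iris()
X = iris.data
y = label_binarize(iris.target, classes=[0, 1, 2])
n_classes = y.shape[1]

# Add noisy features to make the problem harder.
rng = np.random.RandomState(0)
n_samples, n_features = X.shape
X = np.c_[X, rng.randn(n_samples, 200 * n_features)]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# One binary linear SVM per class; decision_function gives one score column per class.
classifier = OneVsRestClassifier(svm.SVC(kernel="linear", random_state=0))
y_score = classifier.fit(X_train, y_train).decision_function(X_test)

# One ROC curve per class: true indicator column vs. that class's score column.
fpr, tpr, roc_auc = {}, {}, {}
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

for i in range(n_classes):
    print(f"class {i}: AUC = {roc_auc[i]:.3f}")
```

Each curve can then be drawn with matplotlib by plotting fpr[i] against tpr[i]. Negative scores are not a problem, since only their ordering matters for the curve.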
