Scikit-learn: roc_auc_score


Question

I am using the roc_auc_score function from scikit-learn to evaluate my model's performance. However, I get different values depending on whether I use predict() or predict_proba():

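from sklearn.metrics import auc, roc_auc_score, roc_curve

# forest is assumed to be an already fitted classifier; x_test and y_test are the held-out data
p_pred = forest.predict_proba(x_test)      # class probabilities, one column per class
y_test_predicted = forest.predict(x_test)  # hard class labels
fpr, tpr, _ = roc_curve(y_test, p_pred[:, 1])
roc_auc = auc(fpr, tpr)

roc_auc_score(y_test, y_test_predicted)  # = 0.68
roc_auc_score(y_test, p_pred[:, 1])      # = 0.93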

Could you advise on this?

Thanks in advance.

Recommended answer

First, look at the difference between predict and predict_proba. The former predicts the class for each row of features, whereas the latter predicts the probability of each class.

You are seeing the effect of the rounding that is implicit in the binary format of y_test_predicted. y_test_predicted consists of 1's and 0's, whereas p_pred consists of floating-point values between 0 and 1. The roc_auc_score routine varies the threshold and computes the true positive rate and false positive rate at each value. Binary labels carry only two distinct values to threshold, so the curve built from them has a single interior point and the score looks quite different.

Consider the following case:

y_test           = [ 1, 0, 0, 1, 0, 1, 1]
p_pred           = [.6,.4,.6,.9,.2,.7,.4]
y_test_predicted = [ 1, 0, 1, 1, 0, 1, 0]

Note that the ROC curve is generated by considering all cutoff thresholds. Now consider a threshold of 0.65, where a score at or above the threshold counts as a positive prediction.

The p_pred case gives:

TPR = 0.5 (2 of 4 positives), FPR = 0 (0 of 3 negatives),

and the y_test_predicted case gives:

TPR = 0.75 (3 of 4 positives), FPR ≈ 0.33 (1 of 3 negatives).
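The two points can be checked by hand. Below is a minimal pure-Python sketch, assuming a prediction counts as positive when its score is at or above the threshold; the helper name rates_at_threshold is hypothetical and not part of scikit-learn. Note that the sample contains three negatives, so a single false positive gives an FPR of 1/3.

```python
# Sample data from the example above.
y_test = [1, 0, 0, 1, 0, 1, 1]
p_pred = [.6, .4, .6, .9, .2, .7, .4]
y_test_predicted = [1, 0, 1, 1, 0, 1, 0]


def rates_at_threshold(y_true, scores, threshold):
    """Return (TPR, FPR), treating scores >= threshold as positive predictions.

    Hypothetical helper for illustration only; not a scikit-learn function.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    true_pos = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    false_pos = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return true_pos / n_pos, false_pos / n_neg


tpr_proba, fpr_proba = rates_at_threshold(y_test, p_pred, 0.65)
tpr_hard, fpr_hard = rates_at_threshold(y_test, y_test_predicted, 0.65)

print(f"p_pred:           TPR={tpr_proba:.2f}, FPR={fpr_proba:.2f}")
print(f"y_test_predicted: TPR={tpr_hard:.2f}, FPR={fpr_hard:.2f}")
```

For the hard labels, any threshold strictly between 0 and 1 produces this same point, which is why that curve has only one interior point.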

You can probably see that if these two points differ, the areas under the two curves will be quite different too.

To really understand the difference, though, I suggest looking at the ROC curves themselves.
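One way to look at the curves without plotting is to list their points. The following is a hand-rolled sketch, not scikit-learn's implementation: it sweeps every distinct score as a cutoff, collects the (FPR, TPR) points, and integrates with the trapezoidal rule. The helper names roc_points and trapezoid_area are hypothetical; in practice roc_curve and roc_auc_score do this work, and they should agree with these areas on the sample data.

```python
# Sample data from the example above.
y_test = [1, 0, 0, 1, 0, 1, 1]
p_pred = [.6, .4, .6, .9, .2, .7, .4]
y_test_predicted = [1, 0, 1, 1, 0, 1, 0]


def roc_points(y_true, scores):
    """Collect (FPR, TPR) points by using each distinct score as a cutoff."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # cutoff above every score: nothing predicted positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


curve_proba = roc_points(y_test, p_pred)
curve_hard = roc_points(y_test, y_test_predicted)
auc_proba = trapezoid_area(curve_proba)
auc_hard = trapezoid_area(curve_hard)

print("p_pred curve:          ", curve_proba, "AUC =", round(auc_proba, 4))
print("y_test_predicted curve:", curve_hard, "AUC =", round(auc_hard, 4))
```

On this sample the probability curve has six points and an area of about 0.83, while the hard-label curve has only three points and an area of about 0.71. With a single interior point, the hard-label area reduces to (TPR + 1 - FPR) / 2, so it summarizes one operating point rather than the model's full ranking.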

Hope this helps!
