Scikit-learn: roc_auc_score


Problem description

I am using the roc_auc_score function from scikit-learn to evaluate my model's performance. However, I get different values depending on whether I use predict() or predict_proba().

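from sklearn.metrics import roc_curve, auc, roc_auc_score

# forest is assumed to be an already fitted classifier; x_test, y_test are the test data
p_pred = forest.predict_proba(x_test)      # class probabilities, one column per class
y_test_predicted = forest.predict(x_test)  # hard class labels
fpr, tpr, _ = roc_curve(y_test, p_pred[:, 1])
roc_auc = auc(fpr, tpr)

roc_auc_score(y_test, y_test_predicted)  # = 0.68
roc_auc_score(y_test, p_pred[:, 1])      # = 0.93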

Could you advise, please?

Thanks in advance.

Recommended answer

First, look at the difference between predict and predict_proba. The former returns the predicted class for each sample, whereas the latter returns the predicted probability of each class.

You are seeing the effect of the rounding that is implicit in the binary format of y_test_predicted. y_test_predicted consists of 1s and 0s, whereas p_pred consists of floating-point values between 0 and 1. The roc_auc_score routine varies the threshold and computes the true positive rate and false positive rate at each value. With hard labels, every threshold between 0 and 1 yields the same point, so the curve has a single interior point and the score looks quite different.

Consider the following case:

y_test           = [ 1, 0, 0, 1, 0, 1, 1]
p_pred           = [.6,.4,.6,.9,.2,.7,.4]
y_test_predicted = [ 1, 0, 1, 1, 0, 1, 0]

Note that the ROC curve is generated by considering all cutoff thresholds. Now consider a threshold of 0.65.

The p_pred case gives:

TPR=0.5, FPR=0

and the y_test_predicted case gives:

TPR=0.75, FPR≈0.33 (1 false positive out of 3 negatives)
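To check the two points, here is a minimal sketch in plain Python. The helper name tpr_fpr_at is made up for this illustration, and it assumes a sample counts as positive when its score is at or above the threshold.

```python
y_test           = [1, 0, 0, 1, 0, 1, 1]
p_pred           = [.6, .4, .6, .9, .2, .7, .4]
y_test_predicted = [1, 0, 1, 1, 0, 1, 0]

def tpr_fpr_at(y_true, scores, threshold):
    """Hypothetical helper: (TPR, FPR) when score >= threshold counts as positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    n_pos = sum(y_true)             # number of actual positives
    n_neg = len(y_true) - n_pos     # number of actual negatives
    return tp / n_pos, fp / n_neg

soft_point = tpr_fpr_at(y_test, p_pred, 0.65)            # probabilities
hard_point = tpr_fpr_at(y_test, y_test_predicted, 0.65)  # hard labels
print("p_pred:          ", soft_point)
print("y_test_predicted:", hard_point)
```

y_test contains three negatives and one of them is labelled 1, so the hard-label point comes out at TPR 0.75 and FPR 1/3, while the probability point is TPR 0.5 and FPR 0.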

You can probably see that if these two points differ, the areas under the two curves will differ as well.
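The two areas can be computed without drawing anything, because the area under a ROC curve equals the fraction of (positive, negative) pairs in which the positive sample gets the higher score, with ties counted as half. The sketch below applies that definition directly to the toy data; pairwise_auc is a hypothetical helper written for this example, not scikit-learn's implementation.

```python
y_test           = [1, 0, 0, 1, 0, 1, 1]
p_pred           = [.6, .4, .6, .9, .2, .7, .4]
y_test_predicted = [1, 0, 1, 1, 0, 1, 0]

def pairwise_auc(y_true, scores):
    """Hypothetical helper: AUC as the share of correctly ordered pos/neg pairs."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

auc_soft = pairwise_auc(y_test, p_pred)
auc_hard = pairwise_auc(y_test, y_test_predicted)
print(round(auc_soft, 3), round(auc_hard, 3))
```

On this toy data the probabilities score about 0.83 and the hard labels about 0.71. Hard labels turn 5 of the 12 pairs into ties, against 2 with probabilities, and that is where the ranking information is lost.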

But to really understand the difference, I suggest looking at the ROC curves themselves.
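As a starting point for that, the sketch below lists the points of both curves for the toy data using scikit-learn's roc_curve, auc and roc_auc_score. Passing drop_intermediate=False is a choice made here so that every threshold shows up, and the variable names are illustrative only.

```python
from sklearn.metrics import roc_curve, auc, roc_auc_score

y_test           = [1, 0, 0, 1, 0, 1, 1]
p_pred           = [.6, .4, .6, .9, .2, .7, .4]
y_test_predicted = [1, 0, 1, 1, 0, 1, 0]

# One (FPR, TPR) point per distinct score, plus the (0, 0) starting point.
fpr_soft, tpr_soft, _ = roc_curve(y_test, p_pred, drop_intermediate=False)
fpr_hard, tpr_hard, _ = roc_curve(y_test, y_test_predicted, drop_intermediate=False)

auc_soft = auc(fpr_soft, tpr_soft)
auc_hard = auc(fpr_hard, tpr_hard)

for name, fpr, tpr, area in [("p_pred", fpr_soft, tpr_soft, auc_soft),
                             ("y_test_predicted", fpr_hard, tpr_hard, auc_hard)]:
    points = [(round(float(f), 2), round(float(t), 2)) for f, t in zip(fpr, tpr)]
    print(name, points, "AUC =", round(float(area), 3))

# roc_auc_score computes the same areas in one call.
score_soft = roc_auc_score(y_test, p_pred)
score_hard = roc_auc_score(y_test, y_test_predicted)
```

The hard-label curve has only one point between (0, 0) and (1, 1), so its area is made of two straight segments, while the probability curve has a point for each distinct score.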

Hope this helps!
