What is the difference between cross_val_score with scoring='roc_auc' and roc_auc_score?


Question

I am confused about the difference between the cross_val_score scoring metric 'roc_auc' and the roc_auc_score that I can just import and call directly.

The documentation (http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter) indicates that specifying scoring='roc_auc' will use sklearn.metrics.roc_auc_score. However, when I run GridSearchCV or cross_val_score with scoring='roc_auc', I get very different numbers than when I call roc_auc_score directly.

Here is my code to help demonstrate what I see:

# score the model using cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rf = RandomForestClassifier(n_estimators=150,
                            min_samples_leaf=4,
                            min_samples_split=3,
                            n_jobs=-1)

scores = cross_val_score(rf, X, y, cv=3, scoring='roc_auc')

print(scores)
# array([ 0.9649023 ,  0.96242235,  0.9503313 ])

# do a train_test_split, fit the model, and score with roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
rf.fit(X_train, y_train)

print(roc_auc_score(y_test, rf.predict(X_test)))
# 0.84634039111363313 -- quite a bit different from the scores above!

I feel like I am missing something very simple here -- most likely a mistake in how I am implementing/interpreting one of the scoring metrics.

Can anyone shed any light on the reason for the discrepancy between the two scoring metrics?

Answer

This is because you supplied the predicted class labels, rather than probabilities, to roc_auc_score. That function expects a continuous score, not the classified label. Try this instead:

print(roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]))

This should give a result similar to the earlier output from cross_val_score. Refer to this post for more info.
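
To confirm that the built-in 'roc_auc' scorer feeds probability scores, not hard labels, into roc_auc_score, you can call the scorer object directly. A minimal sketch, assuming the fitted rf and the X_test/y_test split from the question:

# the 'roc_auc' scorer extracts probability scores from the estimator
# itself, so both lines below should print the same AUC
from sklearn.metrics import get_scorer, roc_auc_score

scorer = get_scorer('roc_auc')
print(scorer(rf, X_test, y_test))
print(roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]))

Any remaining gap relative to the cross_val_score numbers then comes from evaluating on a single random split rather than averaging over three cross-validation folds.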
