Why when I use GridSearchCV with roc_auc scoring, the score is different for grid_search.score(X, y) and roc_auc_score(y, y_predict)?
Question
I am using stratified 10-fold cross-validation to find the model that predicts y (a binary outcome) from X (X has 34 labels) with the highest AUC. I set up the GridSearchCV:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

log_reg = LogisticRegression()
parameter_grid = {'penalty': ['l1', 'l2'], 'C': np.arange(0.1, 3, 0.1)}
cross_validation = StratifiedKFold(n_splits=10, shuffle=True, random_state=100)
grid_search = GridSearchCV(log_reg, param_grid=parameter_grid, scoring='roc_auc',
                           cv=cross_validation)
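(A side note, assuming a recent scikit-learn version rather than the one used in the original question: the default `lbfgs` solver does not support `penalty='l1'`, so a solver that handles both penalties, such as `liblinear`, has to be requested explicitly for this grid to run.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# 'liblinear' supports both 'l1' and 'l2' penalties; the default 'lbfgs' only supports 'l2'.
log_reg = LogisticRegression(solver='liblinear')

# Quick smoke test on toy data (illustrative only, not the asker's dataset).
rng = np.random.RandomState(0)
X_demo = rng.randn(60, 3)
y_demo = (X_demo[:, 0] > 0).astype(int)
log_reg.set_params(penalty='l1', C=1.0).fit(X_demo, y_demo)
```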
Performed the cross-validation:
grid_search.fit(X, y)
y_pr=grid_search.predict(X)
I do not understand the following: why do grid_search.score(X, y) and roc_auc_score(y, y_pr) give different results (the former is 0.74 and the latter is 0.63)? Why don't these commands do the same thing in my case?
Answer
This is due to the different initialization of roc_auc when it is used in GridSearchCV.
Look at the source code:
roc_auc_scorer = make_scorer(roc_auc_score, greater_is_better=True,
                             needs_threshold=True)
Observe the third parameter, needs_threshold. When it is true, the scorer requires continuous values for y_pred, such as probabilities or confidence scores, which GridSearchCV will obtain from log_reg.decision_function().
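The gap this causes is easy to reproduce. A minimal sketch on assumed synthetic data (not the asker's dataset), showing that AUC computed from continuous decision scores differs from AUC computed from hard 0/1 predictions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Toy binary classification problem, purely for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = LogisticRegression().fit(X, y)

auc_scores = roc_auc_score(y, clf.decision_function(X))  # continuous scores
auc_labels = roc_auc_score(y, clf.predict(X))            # hard 0/1 labels

print(auc_scores, auc_labels)  # the label-based AUC is the smaller of the two
```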
When you explicitly call roc_auc_score with y_pr, you are using .predict(), which outputs the predicted class labels of the data, not probabilities. That should account for the difference.
Try:
y_pr=grid_search.decision_function(X)
roc_auc_score(y, y_pr)
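If decision_function is unavailable (some classifiers only expose probabilities), the positive-class column of predict_proba is an equivalent continuous score for binary problems: AUC is rank-based and the sigmoid is monotone, so both give the same value. A self-contained sketch on assumed synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Toy binary problem for illustration only.
X, y = make_classification(n_samples=300, random_state=1)
clf = LogisticRegression().fit(X, y)

# predict_proba[:, 1] = sigmoid(decision_function), a strictly monotone
# transform, so the ranking (and hence the AUC) is identical.
auc_df = roc_auc_score(y, clf.decision_function(X))
auc_pp = roc_auc_score(y, clf.predict_proba(X)[:, 1])
print(auc_df, auc_pp)
```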
If the results are still not the same, please update the question with the complete code and some sample data.