Why is the score different for grid_search.score(X, y) and roc_auc_score(y, y_predict) when I use GridSearchCV with roc_auc scoring?
Question
I am using stratified 10-fold cross-validation to find the model that predicts y (a binary outcome) from X (X has 34 features) with the highest AUC. I set up the GridSearchCV:
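import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# 'liblinear' supports both the l1 and l2 penalties
log_reg = LogisticRegression(solver='liblinear')
parameter_grid = {'penalty': ['l1', 'l2'], 'C': np.arange(0.1, 3, 0.1)}
cross_validation = StratifiedKFold(n_splits=10, shuffle=True, random_state=100)
grid_search = GridSearchCV(log_reg, param_grid=parameter_grid, scoring='roc_auc',
                           cv=cross_validation)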
And then do the cross-validation:
grid_search.fit(X, y)
y_pr = grid_search.predict(X)
I do not understand the following: why do grid_search.score(X, y) and roc_auc_score(y, y_pr) give different results (the former is 0.74 and the latter is 0.63)? Why don't these commands do the same thing in my case?
Recommended answer
This is due to the way the roc_auc scorer is initialized when it is used in GridSearchCV.
Check the source code here:
roc_auc_scorer = make_scorer(roc_auc_score, greater_is_better=True,
                             needs_threshold=True)
Observe the third parameter, needs_threshold. When it is True, the scorer requires continuous values for y_pred, such as probabilities or confidence scores, which in the grid search will be calculated from log_reg.decision_function().
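As a rough illustration of that behaviour, the sketch below calls the built-in 'roc_auc' scorer on a fitted LogisticRegression and compares it with roc_auc_score computed from decision_function() output. The synthetic data from make_classification and the names X_demo, y_demo and clf are assumptions made for the example; they are not the data or variables from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer, roc_auc_score

# Synthetic stand-in data (an assumption; the question's X and y are not available).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=34, flip_y=0.2, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# A scorer is called as scorer(estimator, X, y); the 'roc_auc' scorer asks the
# estimator for continuous scores rather than class labels.
scorer = get_scorer("roc_auc")
auc_scorer = scorer(clf, X_demo, y_demo)

# The same quantity computed by hand from the continuous outputs.
auc_decision = roc_auc_score(y_demo, clf.decision_function(X_demo))
auc_proba = roc_auc_score(y_demo, clf.predict_proba(X_demo)[:, 1])

print(auc_scorer, auc_decision, auc_proba)
```

For logistic regression the probability is a monotonic transform of the decision value, so the ranking of samples, and with it the AUC, should come out the same whichever of the two continuous outputs is used.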
When you explicitly call roc_auc_score with y_pr, you are using the output of .predict(), which returns the predicted class labels of the data and not probabilities or scores. With hard labels the ROC curve has only a single operating point, so the resulting AUC generally differs from the one computed from continuous scores. That should account for the difference.
Try:
y_pr = grid_search.decision_function(X)  # continuous scores, not class labels
roc_auc_score(y, y_pr)
If the results are still not the same, please update the question with the complete code and some sample data.
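The comparison can also be checked end to end. The following is a minimal sketch on synthetic data, with a much smaller parameter grid than the one in the question so that it runs quickly. The data and the names X_demo, y_demo and search are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in data (an assumption; not the data from the question).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=34, flip_y=0.2, random_state=0
)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=100)
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 3.0]},  # reduced grid, for speed only
    scoring="roc_auc",
    cv=cv,
)
search.fit(X_demo, y_demo)

# score() applies the 'roc_auc' scorer to the refitted best estimator.
score_from_search = search.score(X_demo, y_demo)

# AUC from continuous scores: expected to match score().
auc_from_scores = roc_auc_score(y_demo, search.decision_function(X_demo))

# AUC from hard class labels: usually a different number.
auc_from_labels = roc_auc_score(y_demo, search.predict(X_demo))

print("search.score:       ", score_from_search)
print("AUC from scores:    ", auc_from_scores)
print("AUC from labels:    ", auc_from_labels)
print("mean CV AUC (best): ", search.best_score_)
```

Note that search.score(X_demo, y_demo) evaluates the refitted model on the same data it was trained on, so it is not the cross-validated estimate. The mean AUC over the held-out folds for the selected parameters is available as search.best_score_.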