Using cross validation and AUC-ROC for a logistic regression model in sklearn

Question

I'm using the sklearn package to build a logistic regression model and then evaluate it. Specifically, I want to do so using cross validation, but can't figure out the right way to do so with the cross_val_score function.

According to the documentation and some examples I saw (https://stackoverflow.com/questions/39163354/evaluating-logistic-regression-with-cross-validation), I need to pass the function the model, the features, the outcome, and a scoring method. However, AUC is not computed from class predictions. It needs probabilities, so that different threshold values can be tried and the ROC curve calculated from them. So what's the right approach here? The function accepts 'roc_auc' as a scoring method, so I assume it is compatible; I'm just not sure about the right way to use it. Sample code snippet below.

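from sklearn.linear_model import LogisticRegression
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection instead
from sklearn.model_selection import cross_val_score

features = ['a', 'b', 'c']
outcome = 'd'  # a single column name, so y is a 1-D Series rather than a one-column DataFrame
X = df[features]  # df is the asker's existing DataFrame
y = df[outcome]
# Returns one ROC AUC value per fold (10 values here)
crossval_scores = cross_val_score(LogisticRegression(), X, y, scoring='roc_auc', cv=10)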

Basically, I don't understand why I need to pass y to my cross_val_score function here, instead of probabilities calculated using X in a logistic regression model. Does it just do that part on its own?

Recommended answer

All supervised learning methods (including logistic regression) need the true y values to fit a model.

After fitting a model, we generally want to:

  • make predictions,
  • score those predictions (usually on "held-out" data, for example by using cross validation)

cross_val_score gives you cross-validated scores of a model's predictions. But to score the predictions it first needs to make the predictions, and to make the predictions it first needs to fit the model, which requires both X and (true) y.
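
As a rough illustration of that fit, predict, score sequence, here is a minimal sketch of what cross_val_score does per fold when scoring='roc_auc'. The synthetic data from make_classification is an assumption standing in for the asker's df, and the explicit StratifiedKFold splitter is chosen so the manual loop and the library call use the same folds. It is not a copy of the library's internals, only an equivalent loop for this case.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Assumption: synthetic binary data standing in for df[features] and df[outcome].
X, y = make_classification(n_samples=500, n_features=5, random_state=0)

model = LogisticRegression()
cv = StratifiedKFold(n_splits=5)  # same splitter passed to both approaches

manual_scores = []
for train_idx, test_idx in cv.split(X, y):
    fold_model = clone(model)                            # fresh, unfitted copy per fold
    fold_model.fit(X[train_idx], y[train_idx])           # fitting needs X and the true y
    proba = fold_model.predict_proba(X[test_idx])[:, 1]  # P(class 1) on held-out rows
    manual_scores.append(roc_auc_score(y[test_idx], proba))  # scoring needs the true y too

manual_scores = np.array(manual_scores)
library_scores = cross_val_score(model, X, y, scoring="roc_auc", cv=cv)

print(manual_scores)
print(library_scores)
```

The true y is used twice in each fold: once to fit on the training rows and once to score the probabilities on the held-out rows. That is why cross_val_score takes y rather than precomputed probabilities.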

cross_val_score, as you note, accepts different scoring metrics. If you chose 'f1', for example, the model predictions generated during cross_val_score would be class predictions (from the model's predict() method). If you chose 'roc_auc' as your metric, the predictions used to score the model would be continuous scores instead of class labels: probabilities from the model's predict_proba() method or, where the estimator provides one, values from decision_function(). Because AUC depends only on how the samples are ranked, either kind of score gives the same result for logistic regression.
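
To make the difference between the two kinds of prediction concrete, the sketch below fits one model on a single train/test split and feeds class labels to f1_score and probabilities to roc_auc_score. The synthetic data and the 70/30 split are assumptions for illustration only; with cross_val_score the same choice is made for you by the scoring string.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic binary data and a 70/30 split, for illustration only.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = LogisticRegression().fit(X_train, y_train)

labels = model.predict(X_test)              # hard 0/1 class predictions
proba = model.predict_proba(X_test)[:, 1]   # probability of the positive class

f1 = f1_score(y_test, labels)        # threshold-dependent metric: takes labels
auc = roc_auc_score(y_test, proba)   # ranking metric: takes continuous scores

print("F1 from predict():", f1)
print("ROC AUC from predict_proba():", auc)
```

Passing the hard labels to roc_auc_score would also run, but it would describe a ROC curve with a single threshold, which is generally not the number you want.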
