roc_auc score method with LeaveOneOut in scikit-learn


Question

In scikit-learn, GridSearchCV() supports 'roc_auc' as a scoring function. It works well with n-fold cross-validation, but if I use LeaveOneOut, it does not work and generates an error message:

ValueError: Only one class present in Y. ROC AUC score is not defined in that case.

Although it seems natural that an AUC cannot be drawn from only one sample, other languages such as R support roc_auc with LeaveOneOut.

How can I calculate this with Python and scikit-learn? If it is impossible, will using large-fold cross-validation give a similar result?
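The error can be reproduced without GridSearchCV at all: with LeaveOneOut each test fold contains exactly one observation, so the fold's y_true holds only one class and roc_auc_score cannot be evaluated on it. A minimal illustration:

```python
# With a single-sample fold, y_true contains only one class,
# so scikit-learn refuses to compute the ROC AUC for that fold.
from sklearn.metrics import roc_auc_score

try:
    roc_auc_score([1], [0.9])  # what GridSearchCV effectively does per LOO fold
except ValueError as e:
    print(e)
```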

Answer

As pointed out by David Maust, the problem with leave-one-out cross-validation is that GridSearchCV calculates the score over each fold and then reports the average.

In order to obtain a meaningful ROC AUC with LeaveOneOut, you need to calculate probability estimates for each fold (each consisting of just one observation), then calculate the ROC AUC on the set of all these probability estimates.

This can be done as follows:

from sklearn.model_selection import ParameterGrid
from sklearn.metrics import roc_auc_score

def LeaveOneOut_predict_proba(clf, X, y, i):
    # fit on every row except i, then return the positive-class
    # probability estimate for the single held-out row i
    clf.fit(X.drop(i), y.drop(i))
    return clf.predict_proba(X.loc[[i]])[0, 1]

# set clf, param_grid, X, y  (X a pandas DataFrame, y a pandas Series)

for params in ParameterGrid(param_grid):
    print(params)
    clf.set_params(**params)
    # one held-out probability estimate per observation
    y_proba = [LeaveOneOut_predict_proba(clf, X, y, i) for i in X.index]
    # ROC AUC computed on the pooled leave-one-out estimates
    print(roc_auc_score(y, y_proba))

Sample output:

{'n_neighbors': 5, 'p': 1, 'weights': 'uniform'}
0.6057986111111112
{'n_neighbors': 5, 'p': 1, 'weights': 'distance'}
0.620625
{'n_neighbors': 5, 'p': 2, 'weights': 'uniform'}
0.5862499999999999
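The same pooled computation can also be expressed with scikit-learn's cross_val_predict, which collects one out-of-fold prediction per sample and parallelizes via n_jobs. The classifier and synthetic data below are illustrative assumptions (the parameter names in the sample output suggest a KNeighborsClassifier), not part of the original answer:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# synthetic data purely for illustration
features, labels = make_classification(n_samples=50, n_features=5, random_state=0)
X, y = pd.DataFrame(features), pd.Series(labels)

clf = KNeighborsClassifier(n_neighbors=5)
# one held-out probability per sample; column 1 is the positive class
y_proba = cross_val_predict(clf, X, y, cv=LeaveOneOut(),
                            method='predict_proba')[:, 1]
print(roc_auc_score(y, y_proba))
```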

Since this does not use the infrastructure of GridSearchCV, you will need to implement picking the maximal score and, if necessary, parallelization yourself.
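A hedged sketch of both pieces, using joblib for the parallel fits and a plain max() over the grid; the data, classifier, and grid here are assumptions for illustration only:

```python
import pandas as pd
from joblib import Parallel, delayed
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ParameterGrid
from sklearn.neighbors import KNeighborsClassifier

def LeaveOneOut_predict_proba(clf, X, y, i):
    # helper from the answer above
    clf.fit(X.drop(i), y.drop(i))
    return clf.predict_proba(X.loc[[i]])[0, 1]

def loo_auc(clf, X, y, n_jobs=-1):
    # one fit per left-out row, run in parallel; probabilities pooled at the end
    y_proba = Parallel(n_jobs=n_jobs)(
        delayed(LeaveOneOut_predict_proba)(clone(clf), X, y, i) for i in X.index
    )
    return roc_auc_score(y, y_proba)

# synthetic data and grid purely for illustration
features, labels = make_classification(n_samples=40, n_features=4, random_state=0)
X, y = pd.DataFrame(features), pd.Series(labels)
param_grid = {'n_neighbors': [3, 5]}

# pick the parameter combination with the maximal pooled LOO ROC AUC
best_params = max(ParameterGrid(param_grid),
                  key=lambda p: loo_auc(KNeighborsClassifier(**p), X, y))
print(best_params)
```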

