Scikit-learn: scoring in GridSearchCV

Views: 121 Published: 2020/5/4 9:18:33 Tags: machine-learning cross-validation optimization scikit-learn


Problem description

GridSearchCV in scikit-learn appears to collect the score of each (inner) cross-validation fold and then average those scores across folds. What is the rationale behind this? At first glance, it seems more flexible to collect the predictions from all cross-validation folds first and then apply the chosen scoring metric once to the pooled predictions.
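To make the two strategies concrete, here is a minimal sketch that contrasts them outside of GridSearchCV. The synthetic data set, the LogisticRegression estimator, and the 5-fold splitter are assumptions chosen only for illustration; they are not part of the original question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_predict,
    cross_val_score,
)

# Hypothetical imbalanced toy data (roughly 85% / 15%).
X, y = make_classification(n_samples=200, weights=[0.85, 0.15], random_state=0)
clf = LogisticRegression(max_iter=1000)

# A fixed random_state gives both calls below the same splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Strategy 1: score each fold separately, then average the fold scores.
fold_scores = cross_val_score(clf, X, y, cv=cv, scoring="balanced_accuracy")
mean_of_folds = fold_scores.mean()

# Strategy 2: pool the out-of-fold predictions, then score once.
pooled_pred = cross_val_predict(clf, X, y, cv=cv)
pooled_score = balanced_accuracy_score(y, pooled_pred)

print(f"mean of fold scores: {mean_of_folds:.3f}")
print(f"score on pooled predictions: {pooled_score:.3f}")
```

The two numbers generally differ for metrics that do not decompose over samples, such as balanced accuracy. The scikit-learn documentation for cross_val_predict notes that pooled scoring can differ from cross_val_score unless all test sets have equal size and the metric decomposes over samples.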

I ran into this because I use GridSearchCV on an imbalanced data set with cv=LeaveOneOut() and scoring='balanced_accuracy' (scikit-learn v0.20.dev0). Applying a scoring metric such as balanced accuracy (or recall) to each left-out sample individually does not make sense. Instead, I would want to collect all predictions first and then apply the scoring metric once to all of them. Or is there an error in my reasoning?
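A small worked example may help show why per-sample scoring loses information here. The labels and predictions below are made up for illustration, and the sketch assumes that a score computed on a single left-out sample can only express whether that one prediction was correct (1) or not (0).

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score

# Hypothetical leave-one-out result: 8 majority samples, 2 minority samples,
# and a degenerate model that always predicts the majority class.
y_true = np.array([0] * 8 + [1] * 2)
y_pred = np.zeros_like(y_true)

# Per-sample view: each left-out sample is either right or wrong, so the
# average over "folds" is just plain accuracy.
per_sample_correct = (y_true == y_pred).astype(float)
mean_of_single_sample_scores = per_sample_correct.mean()

# Pooled view: balanced accuracy is the mean of the per-class recalls,
# here (8/8 + 0/2) / 2.
pooled_balanced_accuracy = balanced_accuracy_score(y_true, y_pred)

print(mean_of_single_sample_scores)  # 0.8
print(pooled_balanced_accuracy)      # 0.5
```

Under that assumption, averaging single-sample scores rewards the majority-class predictor with 0.8, while the pooled balanced accuracy of 0.5 exposes that the minority class is never recognized.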

Update: I solved this by creating a custom grid search class based on GridSearchCV. The difference is that predictions are first collected from all inner folds, and the scoring metric is then applied once to the pooled predictions.
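The custom class itself is not shown in the post. As a rough sketch of the same idea, and not the author's implementation, the hypothetical helper pooled_grid_search below loops over a parameter grid, collects out-of-fold predictions with cross_val_predict, and scores them once per candidate. The estimator, grid, and data are assumptions for illustration.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import LeaveOneOut, ParameterGrid, cross_val_predict


def pooled_grid_search(estimator, param_grid, X, y, cv, metric):
    """Hypothetical helper: score each candidate once on pooled predictions."""
    results = []
    for params in ParameterGrid(param_grid):
        model = clone(estimator).set_params(**params)
        # Out-of-fold prediction for every sample, gathered across all folds.
        y_pred = cross_val_predict(model, X, y, cv=cv)
        results.append((metric(y, y_pred), params))
    # Higher is assumed to be better for the chosen metric.
    best_score, best_params = max(results, key=lambda r: r[0])
    # Refit the winning configuration on the full data set.
    best_model = clone(estimator).set_params(**best_params).fit(X, y)
    return best_model, best_params, best_score, results


# Hypothetical imbalanced toy data (roughly 80% / 20%).
X, y = make_classification(n_samples=60, weights=[0.8, 0.2], random_state=0)

best_model, best_params, best_score, results = pooled_grid_search(
    LogisticRegression(max_iter=1000),
    {"C": [0.01, 1.0, 100.0]},
    X,
    y,
    cv=LeaveOneOut(),
    metric=balanced_accuracy_score,
)
print(best_params, round(best_score, 3))
```

This sketch omits what GridSearchCV provides beyond the search loop, such as parallelism, cv_results_, and multi-metric scoring, so it is meant only to show where the pooling step sits.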
