Get individual models and customized score in GridSearchCV and RandomizedSearchCV


Question

GridSearchCV and RandomizedSearchCV have best_estimator_, which:

  • Returns only the best estimator/model
  • Finds the best estimator via one of the simple scoring methods: accuracy, recall, precision, etc.
  • Evaluates based on training sets only

I would like to address those limitations with:

  • My own definition of the scoring method
  • Further evaluation on the test set, rather than on the training set as done by GridSearchCV. Ultimately it's the test-set performance that counts; the training set tends to give almost perfect accuracy on my grid search.

I was thinking of achieving it by:

  • Getting the individual estimators/models from GridSearchCV and RandomizedSearchCV
  • For every estimator/model, predicting on the test set and evaluating with my customized score

My question is:

  • Is there a way to get all individual models from GridSearchCV?
  • If not, how would you achieve the same thing? Initially I wanted to exploit the existing GridSearchCV because it automatically handles multiple parameter grids, CV, and multi-threading. Any other recommendation for achieving a similar result is welcome.

Solution

You can already use custom scoring methods in the XYZSearchCVs: see the scoring parameter and the documentation's links to the User Guide for how to write a custom scorer.
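For instance, a minimal sketch of plugging a custom scorer into GridSearchCV, assuming a binary classification task; the metric cost_score below is a hypothetical example, not something from the question:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

def cost_score(y_true, y_pred):
    # Hypothetical business metric: penalize false negatives 5x more than
    # false positives, negated so that higher is better.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return -(5 * fn + fp)

X, y = make_classification(n_samples=500, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    scoring=make_scorer(cost_score),  # plug the custom metric into the search
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```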

You can use a fixed train/validation split to evaluate the hyperparameters (see the cv parameter), but this will be less robust than a k-fold cross-validation. The test set should be reserved for scoring only the final model; if you use it to select hyperparameters, then the scores you receive will not be unbiased estimates of future performance.
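If you do want a single fixed split anyway, a sketch using PredefinedSplit, reusing the names from the snippet above:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, PredefinedSplit

# -1 marks rows that always stay in the training part; fold 0 is the
# single validation set (here, the last 20% of the data).
test_fold = np.full(len(X), -1)
test_fold[int(0.8 * len(X)):] = 0

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100]},
    scoring=make_scorer(cost_score),
    cv=PredefinedSplit(test_fold),  # one fixed split instead of k folds
)
search.fit(X, y)
```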

There is no easy way to retrieve all the models built by GridSearchCV. (It would generally be a lot of models, and saving them all would generally be a waste of memory.)
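If you nevertheless want to score every candidate on a held-out set (with the caveat above about biased estimates), one workaround is to re-fit each parameter combination from cv_results_ yourself; a sketch, again reusing names from the earlier snippets:

```python
from sklearn.base import clone
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# cv_results_["params"] lists every parameter combination the search tried;
# clone() gives a fresh, unfitted copy of the base estimator for each one.
for params in search.cv_results_["params"]:
    model = clone(search.estimator).set_params(**params)
    model.fit(X_train, y_train)
    print(params, cost_score(y_test, model.predict(X_test)))
```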

The parallelization and parameter grid parts of GridSearchCV are surprisingly simple; if you need to, you can copy out the relevant parts of the source code to produce your own approach.
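A sketch of such a hand-rolled loop, built from the same pieces GridSearchCV uses internally (ParameterGrid for enumeration, joblib for parallelism); it reuses names from the snippets above:

```python
from joblib import Parallel, delayed
from sklearn.base import clone
from sklearn.model_selection import ParameterGrid, cross_val_score

base = RandomForestClassifier(random_state=0)
grid = ParameterGrid({"n_estimators": [50, 100], "max_depth": [3, None]})

def evaluate(params):
    # Score one candidate by cross-validation on the training data; every
    # (params, score) pair stays available for your own inspection.
    model = clone(base).set_params(**params)
    scores = cross_val_score(model, X_train, y_train, cv=5,
                             scoring=make_scorer(cost_score))
    return params, scores.mean()

results = Parallel(n_jobs=-1)(delayed(evaluate)(p) for p in grid)
best_params, best_score = max(results, key=lambda r: r[1])
print(best_params, best_score)
```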


"Training set tends to give almost perfect accuracy on my Grid Search."

That's a bit surprising, since the CV part of the searches means the models are being scored on unseen data. If you get very high best_score_ but low performance on the test set, then I would suspect your training set is not actually a representative sample, and that'll require a much more nuanced understanding of the situation.

