How to get both MSE and R2 from a sklearn GridSearchCV?

Question

I can use a GridSearchCV on a pipeline and set scoring to either MSE or R2. I can then read gridsearchcv.best_score_ to recover the metric I specified. How do I also get the other score for the solution found by GridSearchCV?

If I run GridSearchCV again with the other scoring parameter, it might not find the same solution, so the score it reports might not belong to the same model as the first value.

Maybe I can extract the best parameters, supply them to a new pipeline, and then run cross_val_score on that pipeline? Is there a better way? Thanks.

Recommended answer

Unfortunately, this is not straightforward right now with GridSearchCV or any other built-in sklearn method/object.

At the time of writing, there is talk of supporting multiple scorer outputs, but this feature will probably not come soon.

So you will have to do it yourself. There are several ways:

1) You can take a look at the code of cross_val_score and perform the cross-validation loop yourself, calling each scorer of interest once a fold is done.

2) [not recommended] You can also build your own scorer out of the scorers you are interested in and have it output the scores as an array. You will then run into the problem explained here: sklearn - Cross validation with multiple scores

3) Since you can code your own scorers, you could make a scorer that returns one of your scores (the one by which you want GridSearchCV to make decisions) and stores all the other scores you are interested in somewhere separate, such as a static/global variable or even a file.
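Option 1 can be sketched roughly as follows. This is only an illustration, not the actual cross_val_score implementation: the Ridge estimator, the random data, and the names fold_r2 and fold_mse are assumptions chosen for the example.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold

# Illustrative data only (seeded so the run is repeatable)
rng = np.random.RandomState(0)
X = rng.randn(20, 10)
y = rng.randn(20)

estimator = Ridge()
fold_r2, fold_mse = [], []

# Hand-written cross-validation loop: fit on the training part of each
# fold, then evaluate every metric of interest on the held-out part.
for train_idx, test_idx in KFold(n_splits=5).split(X):
    model = clone(estimator).fit(X[train_idx], y[train_idx])
    predictions = model.predict(X[test_idx])
    fold_r2.append(r2_score(y[test_idx], predictions))
    fold_mse.append(mean_squared_error(y[test_idx], predictions))
```

Because both metrics are computed from the same predictions, fold_r2[i] and fold_mse[i] always describe the same fitted model.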

Number 3 seems the least tedious and most promising:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
# sklearn.cross_validation was removed in scikit-learn 0.20;
# cross_val_score now lives in sklearn.model_selection
from sklearn.model_selection import cross_val_score

# Side channel: one MSE is appended each time the scorer is called
secret_mses = []

def r2_secret_mse(estimator, X_test, y_test):
    """Return R2 and record the MSE of the same predictions."""
    predictions = estimator.predict(X_test)
    secret_mses.append(mean_squared_error(y_test, predictions))
    return r2_score(y_test, predictions)

X = np.random.randn(20, 10)
y = np.random.randn(20)

r2_scores = cross_val_score(Ridge(), X, y, scoring=r2_secret_mse, cv=5)

You will find the R2 scores in r2_scores and the corresponding MSEs in secret_mses.

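Note that this can become messy if you go parallel (for example with n_jobs > 1): folds may then be scored in separate worker processes, so values appended to a global list may never reach the parent process. In that case you would need to write the scores to a specific place in a memmap, for example.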
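On scikit-learn 0.19 or later, this bookkeeping may be unnecessary, because GridSearchCV accepts several metrics at once and reports all of them in cv_results_. A minimal sketch, assuming a Ridge estimator, an illustrative alpha grid, and the metric labels "r2" and "neg_mse" (none of which come from the original question):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Illustrative data only (seeded so the run is repeatable)
rng = np.random.RandomState(0)
X = rng.randn(20, 10)
y = rng.randn(20)

search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.1, 1.0, 10.0]},  # hypothetical grid
    # Dict keys are our own labels; values are built-in scorer names
    scoring={"r2": "r2", "neg_mse": "neg_mean_squared_error"},
    refit="r2",  # the metric used to pick the best candidate
    cv=5,
)
search.fit(X, y)

# Both metrics for the same winning parameter setting
best = search.best_index_
best_r2 = search.cv_results_["mean_test_r2"][best]
# The built-in scorer returns negated MSE, so flip the sign back
best_mse = -search.cv_results_["mean_test_neg_mse"][best]
```

With multiple metrics, refit names the one that drives model selection, so best_r2 and best_mse both refer to the candidate in search.best_params_.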
