How is scikit-learn GridSearchCV best_score_ calculated?

Question

I've been trying to figure out how the best_score_ attribute of GridSearchCV is calculated (or, in other words, what it means). The documentation says:

Score of best_estimator on the left out data.

So I tried to translate that into something I understand: I computed the r2_score between the actual y values and the out-of-fold predictions collected from every fold of the k-fold split. That gave a different result from best_score_. This is the code I used:

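import numpy as np
from sklearn.metrics import r2_score

# clf is the fitted GridSearchCV; kfold yields (train_ind, test_ind) pairs
test_pred = np.full(y.shape, np.nan)
for train_ind, test_ind in kfold:
    clf.best_estimator_.fit(X[train_ind, :], y[train_ind])
    test_pred[test_ind] = clf.best_estimator_.predict(X[test_ind, :])
# A single R^2 over the pooled out-of-fold predictions
r2_test = r2_score(y, test_pred)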

I've searched everywhere for a more meaningful explanation of best_score_ and couldn't find anything. Would anyone care to explain?

Thanks

Recommended answer

It's the mean cross-validation score of the best estimator. Let's make some data and fix the cross-validation's division of the data into folds.

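>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
>>> y = np.linspace(-5, 5, 200)
>>> X = (y + np.random.randn(200)).reshape(-1, 1)
>>> threefold = list(KFold(n_splits=3).split(X))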

Now run cross_val_score and GridSearchCV, both with these fixed folds.

>>> cross_val_score(LinearRegression(), X, y, cv=threefold)
array([-0.86060164,  0.2035956 , -0.81309259])
>>> gs = GridSearchCV(LinearRegression(), {}, cv=threefold, verbose=3).fit(X, y) 
Fitting 3 folds for each of 1 candidates, totalling 3 fits
[CV]  ................................................................
[CV] ...................................... , score=-0.860602 -   0.0s
[Parallel(n_jobs=1)]: Done   1 jobs       | elapsed:    0.0s
[CV]  ................................................................
[CV] ....................................... , score=0.203596 -   0.0s
[CV]  ................................................................
[CV] ...................................... , score=-0.813093 -   0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    0.0s finished

Note the score=-0.860602, score=0.203596 and score=-0.813093 in the GridSearchCV output; these are exactly the values returned by cross_val_score.
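To close the loop, the comparison can be checked directly: best_score_ should equal the plain mean of the per-fold scores. The following is a minimal sketch, assuming a recent scikit-learn (the model_selection API); the fixed seed and the one-candidate grid are my own choices for reproducibility and are not part of the original answer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Assumption: a fixed seed so the run is repeatable (the answer used unseeded data).
rng = np.random.default_rng(0)
y = np.linspace(-5, 5, 200)
X = (y + rng.standard_normal(200)).reshape(-1, 1)

# Materialize the folds once so both calls see identical splits.
folds = list(KFold(n_splits=3).split(X))

# Per-fold R^2 scores (R^2 is the default scorer for regressors).
fold_scores = cross_val_score(LinearRegression(), X, y, cv=folds)

# Assumption: a grid with a single candidate, so it is trivially the best one.
gs = GridSearchCV(LinearRegression(), {"fit_intercept": [True]}, cv=folds).fit(X, y)

print("per-fold scores:", fold_scores)
print("mean of folds:  ", fold_scores.mean())
print("best_score_:    ", gs.best_score_)
```

The last two printed values should agree up to floating-point error.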

Note that the "mean" is really a macro-average over the folds. In older scikit-learn releases, the iid parameter of GridSearchCV could be used to get a micro-average over the samples instead; that parameter was deprecated in version 0.22 and removed in 0.24.
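This distinction is also why the code in the question gives a different number: it computes one R^2 over the pooled out-of-fold predictions, which is not the same as averaging the per-fold R^2 values. Here is a rough sketch of the three quantities, assuming a recent scikit-learn; the seed and the variable names (macro, weighted, pooled) are illustrative, and the size-weighted mean is only my approximation of what the old iid option reported.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

# Assumption: a fixed seed for repeatability.
rng = np.random.default_rng(0)
y = np.linspace(-5, 5, 200)
X = (y + rng.standard_normal(200)).reshape(-1, 1)
folds = list(KFold(n_splits=3).split(X))

fold_scores = cross_val_score(LinearRegression(), X, y, cv=folds)

# Macro-average: every fold counts equally (this is what best_score_ reports).
macro = fold_scores.mean()

# Size-weighted mean: each fold weighted by its number of test samples.
sizes = [len(test_ind) for _, test_ind in folds]
weighted = np.average(fold_scores, weights=sizes)

# Pooled R^2: a single score over all out-of-fold predictions,
# which is what the loop in the question computes.
pooled = r2_score(y, cross_val_predict(LinearRegression(), X, y, cv=folds))

print(macro, weighted, pooled)
```

Because y is sorted and the folds are contiguous, each test fold covers a narrow range of y, so the per-fold R^2 values are low while the pooled R^2, measured against the variance of the whole of y, comes out much higher.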
