sklearn GridSearchCV with Pipeline
Question
I'm new to sklearn's Pipeline and GridSearchCV features. I am trying to build a pipeline that first runs RandomizedPCA on my training data and then fits a ridge regression model. Here is my code:
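import numpy as np
from sklearn.decomposition import RandomizedPCA
from sklearn.grid_search import GridSearchCV
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

pca = RandomizedPCA(1000, whiten=True)
rgn = Ridge()
pca_ridge = Pipeline([('pca', pca),
                      ('ridge', rgn)])
# Three alphas from 1e-5 to 1e2, matching the values in the output below
parameters = {'ridge__alpha': 10 ** np.linspace(-5, 2, 3)}
grid_search = GridSearchCV(pca_ridge, parameters, cv=2, n_jobs=1, scoring='mean_squared_error')
grid_search.fit(train_x, train_y[:, 1:])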
I know about the RidgeCV function, but I want to try out Pipeline and GridSearchCV.
I want the grid search to report RMSE, but this doesn't seem to be supported in sklearn, so I'm making do with MSE. However, the scores it reports are negative:
In [41]: grid_search.grid_scores_
Out[41]:
[mean: -0.02665, std: 0.00007, params: {'ridge__alpha': 1.0000000000000001e-05},
mean: -0.02658, std: 0.00009, params: {'ridge__alpha': 0.031622776601683791},
mean: -0.02626, std: 0.00008, params: {'ridge__alpha': 100.0}]
Obviously this isn't possible for a mean squared error. What am I doing wrong here?
Recommended answer
Those scores are negative MSE scores: negate them and you get the MSE. GridSearchCV, by convention, always tries to maximize its score, so loss functions like MSE, where lower is better, have to be negated before they can be used as a score.
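To make the sign convention concrete, here is a minimal sketch of recovering MSE and RMSE from the grid search results. It is not the asker's setup: it assumes a recent scikit-learn, where the scorer is named 'neg_mean_squared_error', results live in cv_results_ rather than grid_scores_, and PCA with svd_solver='randomized' stands in for the removed RandomizedPCA class. The data is synthetic (make_regression) and the component count is an arbitrary placeholder.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for train_x / train_y (hypothetical data).
X, y = make_regression(n_samples=200, n_features=50, noise=10.0, random_state=0)

pca_ridge = Pipeline([
    # Assumption: PCA with the randomized solver replaces RandomizedPCA.
    ("pca", PCA(n_components=10, whiten=True,
                svd_solver="randomized", random_state=0)),
    ("ridge", Ridge()),
])
parameters = {"ridge__alpha": 10 ** np.linspace(-5, 2, 3)}

# The scorer returns -MSE so that "higher is better" holds for every scorer.
grid_search = GridSearchCV(pca_ridge, parameters, cv=2,
                           scoring="neg_mean_squared_error")
grid_search.fit(X, y)

neg_mse = grid_search.cv_results_["mean_test_score"]  # one value per alpha, all <= 0
mse = -neg_mse                                        # flip the sign to get MSE
rmse = np.sqrt(mse)                                   # root of the fold-averaged MSE
best_alpha = grid_search.best_params_["ridge__alpha"]

for alpha, m, r in zip(parameters["ridge__alpha"], mse, rmse):
    print(f"alpha={alpha:g}  MSE={m:.3f}  RMSE={r:.3f}")
```

Note that rmse here is the square root of the MSE averaged over folds, which is not quite the same as averaging per-fold RMSE values. scikit-learn 0.22 and later also accept scoring='neg_root_mean_squared_error', which reports the negated per-fold RMSE directly.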