How to normalize the validation set separately from the training set in GridSearchCV?


Question

How can each fold be scaled separately in GridSearchCV?

When training an ML model, we should fit the scaler on the training data only and then apply the fitted scaler to the test data. With a 5-fold grid search CV, however, we usually pass in training data that has already been scaled, and that data is then split into folds. How can each 4-to-1 split be scaled separately, so that the scaler is fit on the four training folds and only applied to the held-out fold?

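from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler

scl = MinMaxScaler()
X_train = scl.fit_transform(X_train)  # keep the scaled result
X_test = scl.transform(X_test)

# The training data was scaled all together,
# not train and validation separately
cv = GridSearchCV(MODEL, GRID, scoring='f1', cv=5)
cv.fit(X_train, Y_train)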

Please let me know if you have a suggestion for how to achieve this.

Answer

This is what a Pipeline is for.

Convert your current model to a pipelined model like this:

from sklearn.pipeline import Pipeline
new_model = Pipeline([('scaler', MinMaxScaler()), ('model', cur_model)])
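As a minimal sketch of how such a pipeline might be passed to GridSearchCV: the synthetic data, the LogisticRegression estimator, and the values tried for C are assumptions chosen for illustration and are not part of the original question. One detail to watch is that parameters in the grid are addressed through the step name, as `<step name>__<parameter>`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic binary-classification data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The scaler is a pipeline step, so it is re-fit inside every CV split.
pipe = Pipeline([
    ("scaler", MinMaxScaler()),
    ("model", LogisticRegression(max_iter=1000)),  # stand-in for cur_model
])

# Parameters of a pipeline step are named "<step name>__<parameter>".
param_grid = {"model__C": [0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, param_grid, scoring="f1", cv=5)
search.fit(X_train, y_train)  # pass the raw, unscaled training data

# The refit best pipeline scales X_test with the scaler fitted on X_train.
test_f1 = search.score(X_test, y_test)
print(search.best_params_, test_f1)
```

Because the whole pipeline is refit on the full training set after the search, `search.predict` and `search.score` can be called on unscaled test data directly.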

Do not scale your training set beforehand. Every time fit is called, the Pipeline fits the MinMaxScaler on the training data only, transforms that data, and fits the model on the result. At prediction time it calls transform on the test data using the already-fitted MinMaxScaler. Inside GridSearchCV this happens once per split, so each validation fold is scaled with statistics computed from its own training folds.
