XGBoost: cannot pass validation data for eval_set in a pipeline

Views: 100. Published: 2021/5/31 18:36:01. Tags: python-3.x, machine-learning, scikit-learn, xgboost


Problem description

I want to run GridSearchCV for an XGBoost model inside a pipeline. I have a data preprocessor (defined earlier in my code) and some grid parameters:

XGBmodel = XGBRegressor(random_state=0)
pipe = Pipeline(steps=[
    ('preprocess', preprocessor),
    ('XGBmodel', XGBmodel)
])

I want to pass these fit parameters:

fit_params = {"XGBmodel__eval_set": [(X_valid, y_valid)], 
              "XGBmodel__early_stopping_rounds": 10, 
              "XGBmodel__verbose": False}

I am trying to fit the model:

# Fit parameters go to fit(); the fit_params constructor argument
# was removed in scikit-learn 0.21.
searchCV = GridSearchCV(pipe, cv=5, param_grid=param_grid)
searchCV.fit(X_train, y_train, **fit_params)

but I get an error on the line with eval_set: DataFrame.dtypes for data must be int, float or bool

I guess this happens because the validation data does not go through the preprocessing step: the pipeline transforms X_train, but eval_set is handed to XGBoost as-is. However, every example I find online does it this way, and it seems it should work. I also tried to apply the preprocessor to the validation data separately, but the preprocessor cannot transform the validation data until it has been fitted on the training data.

Full code

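from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from xgboost import XGBRegressor

# num_cols, cat_cols, X_full_train, X_full_valid, y_train and y_valid are defined earlier
columns = num_cols + cat_cols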
X_train = X_full_train[columns].copy()
X_valid = X_full_valid[columns].copy()

num_preprocessor = SimpleImputer(strategy = 'mean')
cat_preprocessor = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy = 'most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

preprocessor = ColumnTransformer(transformers=[
    ('num', num_preprocessor, num_cols),
    ('cat', cat_preprocessor, cat_cols)
])

XGBmodel = XGBRegressor(random_state=0)
pipe = Pipeline(steps=[
    ('preprocess', preprocessor),
    ('XGBmodel', XGBmodel)
])

param_grid = {
    "XGBmodel__n_estimators": [10, 50, 100, 500],
    "XGBmodel__learning_rate": [0.1, 0.5, 1],
}

fit_params = {"XGBmodel__eval_set": [(X_valid, y_valid)], 
              "XGBmodel__early_stopping_rounds": 10, 
              "XGBmodel__verbose": False}

# Fit parameters go to fit(); the fit_params constructor argument
# was removed in scikit-learn 0.21.
searchCV = GridSearchCV(pipe, cv=5, param_grid=param_grid)
searchCV.fit(X_train, y_train, **fit_params)

Is there any way to preprocess the validation data in the pipeline? Or is there a completely different way to implement this?
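One possible approach, sketched here under assumptions rather than as a confirmed answer: fit the preprocessor on the training data alone, transform the validation data with that fitted preprocessor, and put the transformed array into eval_set. The toy frames and the column names `area` and `city` are hypothetical stand-ins for the real data; only scikit-learn, pandas and NumPy are used, and the model itself is not fitted.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for X_train / X_valid / y_valid.
num_cols = ["area"]
cat_cols = ["city"]
X_train = pd.DataFrame({"area": [50.0, np.nan, 80.0, 65.0],
                        "city": ["A", "B", "A", "B"]})
X_valid = pd.DataFrame({"area": [70.0, np.nan],
                        "city": ["B", "C"]})  # "C" never appears in training
y_valid = np.array([1.0, 2.0])

# Same preprocessing layout as in the question.
cat_preprocessor = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = ColumnTransformer(transformers=[
    ("num", SimpleImputer(strategy="mean"), num_cols),
    ("cat", cat_preprocessor, cat_cols),
])

# Fit on the training data only, then reuse the fitted
# preprocessor to transform the validation data.
preprocessor.fit(X_train)
X_valid_t = preprocessor.transform(X_valid)

# The transformed array is purely numeric, so it can be placed in eval_set.
fit_params = {"XGBmodel__eval_set": [(X_valid_t, y_valid)]}
print(X_valid_t.shape, X_valid_t.dtype)
```

One caveat with this sketch: inside GridSearchCV the pipeline refits its own copy of the preprocessor on each fold, so the separately transformed validation set may not match the fold's features exactly, for example when a category is missing from one fold and the one-hot encoder produces fewer columns.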
