GridSearchCV - XGBoost - Early Stopping


Question

I am trying to do a hyperparameter search with scikit-learn's GridSearchCV on XGBoost. During the grid search I would like it to stop early, since that reduces search time drastically and I expect it to give better results on my prediction/regression task. I am using XGBoost via its Scikit-Learn API.

    model = xgb.XGBRegressor()
    GridSearchCV(model, paramGrid, verbose=verbose, fit_params={'early_stopping_rounds': 42},
                 cv=TimeSeriesSplit(n_splits=cv).get_n_splits([trainX, trainY]),
                 n_jobs=n_jobs, iid=iid).fit(trainX, trainY)

I tried to pass the early-stopping parameters via fit_params, but then it throws the error below, essentially because the validation set required for early stopping is missing:

/opt/anaconda/anaconda3/lib/python3.5/site-packages/xgboost/callback.py in callback(env=XGBoostCallbackEnv(model=<xgboost.core.Booster o...teration=4000, rank=0, evaluation_result_list=[]))
    187         else:
    188             assert env.cvfolds is not None
    189 
    190     def callback(env):
    191         """internal function"""
--> 192         score = env.evaluation_result_list[-1][1]
        score = undefined
        env.evaluation_result_list = []
    193         if len(state) == 0:
    194             init(env)
    195         best_score = state['best_score']
    196         best_iteration = state['best_iteration']

How can I apply GridSearchCV on XGBoost with early_stopping_rounds?

Note: the model works without grid search, and GridSearchCV works without fit_params={'early_stopping_rounds': 42}.

Answer

An update to @glao's answer and a response to @Vasim's comment/question, as of sklearn 0.21.3 (note that fit_params has been moved out of the instantiation of GridSearchCV and into the fit() method; also, the import specifically pulls in the sklearn wrapper module from xgboost):

import xgboost.sklearn as xgb
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import TimeSeriesSplit

cv = 2

trainX = [[1], [2], [3], [4], [5]]
trainY = [1, 2, 3, 4, 5]

# these are the evaluation sets
testX = trainX
testY = trainY

paramGrid = {"subsample": [0.5, 0.8]}

fit_params = {"early_stopping_rounds": 42,
              "eval_metric": "mae",
              "eval_set": [[testX, testY]]}

model = xgb.XGBRegressor()

# pass the splitter object itself so GridSearchCV actually performs
# time-series splits; get_n_splits() only returns the integer 2, which
# would make GridSearchCV fall back to plain K-fold cross-validation
gridsearch = GridSearchCV(model, paramGrid, verbose=1,
                          cv=TimeSeriesSplit(n_splits=cv))

gridsearch.fit(trainX, trainY, **fit_params)
