How to get predictions with XGBoost and XGBoost using the Scikit-Learn wrapper to match?


Problem Description

I am new to XGBoost in Python, so I apologize if the answer here is obvious, but I am trying to take a pandas DataFrame and get XGBoost in Python to give me the same predictions I get when I use the Scikit-Learn wrapper for the same exercise. So far I've been unable to do so. To give an example, here I take the Boston dataset, convert it to a pandas DataFrame, train on the first 500 observations of the dataset, and then predict the last 6. I do it with XGBoost first and then with the Scikit-Learn wrapper, and I get different predictions even though I've set the parameters of the model to be the same. Specifically, the array predictions looks very different from the array predictions2 (see code below). Any help would be much appreciated!

from sklearn import datasets
import pandas as pd
import xgboost as xgb
from xgboost.sklearn import XGBRegressor

### Use the boston data as an example, train on first 500, predict last 6 
boston_data = datasets.load_boston()
df_boston = pd.DataFrame(boston_data.data,columns=boston_data.feature_names)
df_boston['target'] = pd.Series(boston_data.target)
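# NOTE: load_boston was removed in scikit-learn 1.2. On newer versions, the
# same data can be fetched from OpenML instead (an alternative sketch; the
# 'MEDV' target column name comes from the OpenML copy of the dataset):
#   from sklearn.datasets import fetch_openml
#   boston = fetch_openml(name='boston', version=1, as_frame=True)
#   df_boston = boston.frame.rename(columns={'MEDV': 'target'})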


#### Code using XGBoost
Sub_train = df_boston.head(500)
target = Sub_train["target"]
Sub_train = Sub_train.drop('target', axis=1) 

Sub_predict = df_boston.tail(6)
Sub_predict = Sub_predict.drop('target', axis=1)  

# .as_matrix() was removed in pandas 1.0; use .to_numpy() instead
xgtrain = xgb.DMatrix(Sub_train.to_numpy(), label=target.tolist())
xgtest = xgb.DMatrix(Sub_predict.to_numpy())

params = {'booster': 'gblinear', 'objective': 'reg:linear',
          'max_depth': 2, 'learning_rate': .1, 'n_estimators': 500,
          'min_child_weight': 3, 'colsample_bytree': .7,
          'subsample': .8, 'gamma': 0, 'reg_alpha': 1}

model = xgb.train(dtrain=xgtrain, params=params)

predictions = model.predict(xgtest)

#### Code using Sk learn Wrapper for XGBoost
model = XGBRegressor(learning_rate=.1, n_estimators=500,
                     max_depth=2, min_child_weight=3, gamma=0,
                     subsample=.8, colsample_bytree=.7, reg_alpha=1,
                     objective='reg:linear')

target = "target"

Sub_train = df_boston.head(500)
Sub_predict = df_boston.tail(6)
Sub_predict = Sub_predict.drop('target', axis=1)

Ex_List = ['target']

predictors = [i for i in Sub_train.columns if i not in Ex_List]

model = model.fit(Sub_train[predictors], Sub_train[target])

predictions2 = model.predict(Sub_predict)

Answer

See the answer here:

xgboost.train will ignore the parameter n_estimators, while xgboost.XGBRegressor accepts it. In xgboost.train, the number of boosting iterations (i.e. n_estimators) is controlled by num_boost_round (default: 10).

It suggests removing n_estimators from the params supplied to xgb.train and replacing it with num_boost_round.

So change your params like this:

params = {'objective': 'reg:linear',
          'max_depth': 2, 'learning_rate': .1,
          'min_child_weight': 3, 'colsample_bytree': .7,
          'subsample': .8, 'gamma': 0, 'alpha': 1}

And train with xgb.train like this:

model = xgb.train(dtrain=xgtrain, params=params, num_boost_round=500)

And you will get the same results.
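
To confirm the two approaches now agree, you can compare the two prediction arrays directly after re-running both snippets (a quick sketch; the atol tolerance passed to np.allclose is an arbitrary choice):

import numpy as np

# The two arrays should now match up to floating-point noise.
print(predictions)
print(predictions2)
print(np.allclose(predictions, predictions2, atol=1e-6))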

Alternatively, keep xgb.train as it is and change the XGBRegressor like this:

model = XGBRegressor(learning_rate=.1, n_estimators=10,
                     max_depth=2, min_child_weight=3, gamma=0,
                     subsample=.8, colsample_bytree=.7, reg_alpha=1,
                     objective='reg:linear')

Then you will also get the same results.
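
One further note: in recent XGBoost releases (0.90 and later) the reg:linear objective was renamed to reg:squarederror, so the snippets above may emit a deprecation warning. A sketch of the params with the newer name, assuming a current XGBoost version:

# 'reg:linear' was renamed to 'reg:squarederror' in XGBoost 0.90;
# the old alias still works for now but triggers a deprecation warning.
params = {'objective': 'reg:squarederror',
          'max_depth': 2, 'learning_rate': .1,
          'min_child_weight': 3, 'colsample_bytree': .7,
          'subsample': .8, 'gamma': 0, 'alpha': 1}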
