XGBoost error - When categorical type is supplied, DMatrix parameter `enable_categorical` must be set to `True`

Question

I have four categorical features and a fifth numerical one (Var5). When I try the following code:

cat_attribs = ['Var1', 'Var2', 'Var3', 'Var4']

full_pipeline = ColumnTransformer([('cat', OneHotEncoder(handle_unknown = 'ignore'), cat_attribs)], remainder = 'passthrough')
X_train = full_pipeline.fit_transform(X_train)

model = XGBRegressor(n_estimators=10, max_depth=20, verbosity=2)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

I get the following error message when the model tries to make its predictions:

ValueError: DataFrame.dtypes for data must be int, float, bool or categorical. When categorical type is supplied, DMatrix parameter enable_categorical must be set to True.Var1, Var2, Var3, Var4

Does anyone know what's going wrong here?

In case it's helpful, here is a small sample of the X_train data and the y_train data:

       Var1  Var2  Var3 Var4        Var5
1507856   JP  2009  6581  OME  325.787218
839624    FR  2018  5783  I_S   11.956326
1395729   BE  2015  6719  OME   42.888565
1971169   DK  2011  3506  RPP   70.094146
1140120   AT  2019  5474  NMM  270.082738

and:

          Ind_Var
1507856   8.013558
839624    4.105559
1395729   7.830077
1971169  83.000000
1140120  51.710526

Recommended answer

The problem with your code is that you encoded the categorical features in X_train but not in X_test, so model.predict(X_test) receives the raw, unencoded columns and raises the error. To fix this, first fit the encoder on X_train, then use that same fitted encoder to transform both X_train and X_test. See the code below for an example.

import pandas as pd
from xgboost import XGBRegressor
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# define the input data
df = pd.DataFrame([
 {'Var1': 'JP', 'Var2': 2009, 'Var3': 6581, 'Var4': 'OME', 'Var5': 325.787218, 'Ind_Var': 8.013558},
 {'Var1': 'FR', 'Var2': 2018, 'Var3': 5783, 'Var4': 'I_S', 'Var5': 11.956326, 'Ind_Var': 4.105559},
 {'Var1': 'BE', 'Var2': 2015, 'Var3': 6719, 'Var4': 'OME', 'Var5': 42.888565, 'Ind_Var': 7.830077},
 {'Var1': 'DK', 'Var2': 2011, 'Var3': 3506, 'Var4': 'RPP', 'Var5': 70.094146, 'Ind_Var': 83.000000},
 {'Var1': 'AT', 'Var2': 2019, 'Var3': 5474, 'Var4': 'NMM', 'Var5': 270.082738, 'Ind_Var': 51.710526}
])

# extract the features and target
X_train, y_train = df.iloc[:3, :-1], df.iloc[:3, -1]
X_test, y_test = df.iloc[3:, :-1], df.iloc[3:, -1]

# one-hot encode the categorical features
cat_attribs = ['Var1', 'Var2', 'Var3', 'Var4']
full_pipeline = ColumnTransformer([('cat', OneHotEncoder(handle_unknown='ignore'), cat_attribs)], remainder='passthrough')

encoder = full_pipeline.fit(X_train)
X_train = encoder.transform(X_train)
X_test = encoder.transform(X_test)

# train the model
model = XGBRegressor(n_estimators=10, max_depth=20, verbosity=2)
model.fit(X_train, y_train)

# extract the training set predictions
model.predict(X_train)
# array([7.0887003, 3.7923286, 7.0887003], dtype=float32)

# extract the test set predictions
model.predict(X_test)
# array([7.0887003, 7.0887003], dtype=float32)
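
As a rough sanity check on the approach above, the sketch below fits the encoder on the training rows only and confirms that both sets end up with the same number of columns. It uses a trimmed-down version of the sample data (only Var1, Var4 and Var5) and passes sparse_threshold=0 to get a dense array that is easy to inspect; both choices are assumptions made for illustration and are not part of the original answer. Because of handle_unknown='ignore', a category that appears only in the test set (here 'DK' and 'RPP') should be encoded as all zeros rather than raising an error.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Trimmed-down sample data (illustrative subset of the columns above)
train = pd.DataFrame({
    "Var1": ["JP", "FR", "BE"],
    "Var4": ["OME", "I_S", "OME"],
    "Var5": [325.787218, 11.956326, 42.888565],
})
test = pd.DataFrame({
    "Var1": ["DK", "JP"],      # 'DK' never appears in the training data
    "Var4": ["RPP", "OME"],    # 'RPP' never appears in the training data
    "Var5": [70.094146, 270.082738],
})

encoder = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["Var1", "Var4"])],
    remainder="passthrough",
    sparse_threshold=0,  # assumption: force dense output for easy inspection
)

# Fit on the training data only, then transform both sets with the same encoder
encoder.fit(train)
train_enc = encoder.transform(train)
test_enc = encoder.transform(test)

print(train_enc.shape, test_enc.shape)  # column counts should match
print(test_enc)
```

If the two shapes disagree, or the test set still contains string columns when it reaches model.predict, the encoder was most likely re-fitted or skipped for the test data.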
