feature_names mismatch in xgboost despite having same columns

Problem description

I have training data (X) and test data (test_data_process) with the same columns in the same order.

But when I run

predictions = my_model.predict(test_data_process)    

it gives the following error:

ValueError: feature_names mismatch: ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'f10', 'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18', 'f19', 'f20', 'f21', 'f22', 'f23', 'f24', 'f25', 'f26', 'f27', 'f28', 'f29', 'f30', 'f31', 'f32', 'f33', 'f34'] ['MSSubClass', 'LotFrontage', 'LotArea', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal', 'YrMoSold']
expected f22, f25, f0, f34, f32, f5, f20, f3, f33, f15, f24, f31, f28, f9, f8, f19, f14, f18, f17, f2, f13, f4, f27, f16, f1, f29, f11, f26, f10, f7, f21, f30, f23, f6, f12 in input data
training data did not have the following fields: OpenPorchSF, BsmtFinSF1, LotFrontage, GrLivArea, YrMoSold, FullBath, TotRmsAbvGrd, GarageCars, YearRemodAdd, BedroomAbvGr, PoolArea, KitchenAbvGr, LotArea, HalfBath, MiscVal, EnclosedPorch, BsmtUnfSF, MSSubClass, BsmtFullBath, YearBuilt, 1stFlrSF, ScreenPorch, 3SsnPorch, TotalBsmtSF, GarageYrBlt, MasVnrArea, OverallQual, Fireplaces, WoodDeckSF, 2ndFlrSF, BsmtFinSF2, BsmtHalfBath, LowQualFinSF, OverallCond, GarageArea

So it complains that the training data (X) does not have those fields, even though it does.

How can I fix this?

[Update]:

My code:

from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X = data.select_dtypes(exclude=['object']).drop(columns=['Id'])
X['YrMoSold'] = X['YrSold'] * 12 + X['MoSold']
X = X.drop(columns=['YrSold', 'MoSold', 'SalePrice'])
X = X.fillna(0.0000001)

train_X, val_X, train_y, val_y = train_test_split(X.values, y.values, test_size=0.2)

my_model = XGBRegressor(n_estimators=100, learning_rate=0.05, booster='gbtree')
my_model.fit(train_X, train_y, early_stopping_rounds=5, 
    eval_set=[(val_X, val_y)], verbose=False)

test_data_process = test_data.select_dtypes(exclude=['object']).drop(columns=['Id'])
test_data_process['YrMoSold'] = test_data_process['YrSold'] * 12 + test_data['MoSold']
test_data_process = test_data_process.drop(columns=['YrSold', 'MoSold'])
test_data_process = test_data_process.fillna(0.0000001)
test_data_process = test_data_process[X.columns]

predictions = my_model.predict(test_data_process)    

Recommended answer

This is an honest mistake.

When fitting the model you are passing NumPy arrays:

train_X, val_X, train_y, val_y = train_test_split(X.values, y.values, test_size=0.2)

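(X.values is a NumPy array with no column names defined, so the model only knows the default feature names f0, f1, ..., f34.)

When passing the data set for prediction, however, you are using a DataFrame, whose column names do not match those defaults. That is why the error says the training data did not have those fields.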
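
To illustrate the point about lost column names, here is a minimal sketch with a toy DataFrame. The columns and values are made up for illustration and are not the real housing data. It shows that `.values` returns a bare NumPy array carrying no column labels:

```python
import pandas as pd

# Toy frame; the columns are illustrative, not the real housing data.
df = pd.DataFrame({"LotArea": [8450, 9600], "YearBuilt": [2003, 1976]})

arr = df.values  # a plain numpy.ndarray, equivalent to df.to_numpy()

print(list(df.columns))         # the DataFrame knows its column names
print(type(arr).__name__)       # ndarray
print(hasattr(arr, "columns"))  # False: the names are gone
```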

You should pass a NumPy array for prediction as well. You can convert the DataFrame with:

predictions = my_model.predict(test_data_process.values)  

(note the added .values)
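
Once the names are dropped, features can only be matched by position, so the test columns must be in the same order as in training. The question's code already reorders them with `test_data_process[X.columns]`. A minimal sketch of a helper that makes this step explicit might look like the following; `to_model_input` is a hypothetical name, not part of xgboost or pandas, and the toy frames stand in for the real data:

```python
import numpy as np
import pandas as pd


def to_model_input(df: pd.DataFrame, feature_columns) -> np.ndarray:
    """Return df's values as an array, with columns in training order.

    Hypothetical helper for illustration; not part of xgboost or pandas.
    """
    feature_columns = list(feature_columns)
    missing = [c for c in feature_columns if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Selecting by a list of names both reorders and drops extra columns.
    return df[feature_columns].to_numpy()


# Toy data standing in for X and test_data_process.
train = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
test = pd.DataFrame({"b": [30.0], "a": [10.0], "extra": [0.0]})

test_array = to_model_input(test, train.columns)
print(test_array)  # [[10. 30.]]
```

With the question's variables, the call would presumably read `my_model.predict(to_model_input(test_data_process, X.columns))`.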
