feature_names mismatch in xgboost despite having same columns
Question
I have a training set (X) and a test set (test_data_process) with the same columns in the same order, as indicated below:
But when I run
predictions = my_model.predict(test_data_process)
it gives the following error:
ValueError: feature_names mismatch: ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'f10', 'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18', 'f19', 'f20', 'f21', 'f22', 'f23', 'f24', 'f25', 'f26', 'f27', 'f28', 'f29', 'f30', 'f31', 'f32', 'f33', 'f34'] ['MSSubClass', 'LotFrontage', 'LotArea', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal', 'YrMoSold'] expected f22, f25, f0, f34, f32, f5, f20, f3, f33, f15, f24, f31, f28, f9, f8, f19, f14, f18, f17, f2, f13, f4, f27, f16, f1, f29, f11, f26, f10, f7, f21, f30, f23, f6, f12 in input data training data did not have the following fields: OpenPorchSF, BsmtFinSF1, LotFrontage, GrLivArea, YrMoSold, FullBath, TotRmsAbvGrd, GarageCars, YearRemodAdd, BedroomAbvGr, PoolArea, KitchenAbvGr, LotArea, HalfBath, MiscVal, EnclosedPorch, BsmtUnfSF, MSSubClass, BsmtFullBath, YearBuilt, 1stFlrSF, ScreenPorch, 3SsnPorch, TotalBsmtSF, GarageYrBlt, MasVnrArea, OverallQual, Fireplaces, WoodDeckSF, 2ndFlrSF, BsmtFinSF2, BsmtHalfBath, LowQualFinSF, OverallCond, GarageArea
So it complains that the training data (X) does not have those fields, even though it does.
How can I fix this?
[UPDATE]:
My code:
X = data.select_dtypes(exclude=['object']).drop(columns=['Id'])
X['YrMoSold'] = X['YrSold'] * 12 + X['MoSold']
X = X.drop(columns=['YrSold', 'MoSold', 'SalePrice'])
X = X.fillna(0.0000001)
train_X, val_X, train_y, val_y = train_test_split(X.values, y.values, test_size=0.2)
my_model = XGBRegressor(n_estimators=100, learning_rate=0.05, booster='gbtree')
my_model.fit(train_X, train_y, early_stopping_rounds=5,
             eval_set=[(val_X, val_y)], verbose=False)
test_data_process = test_data.select_dtypes(exclude=['object']).drop(columns=['Id'])
test_data_process['YrMoSold'] = test_data_process['YrSold'] * 12 + test_data['MoSold']
test_data_process = test_data_process.drop(columns=['YrSold', 'MoSold'])
test_data_process = test_data_process.fillna(0.0000001)
test_data_process = test_data_process[X.columns]
predictions = my_model.predict(test_data_process)
Recommended answer
It's an honest mistake.
When fitting the model, you are passing NumPy arrays:
train_X, val_X, train_y, val_y = train_test_split(X.values, y.values, test_size=0.2)
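(X.values is a NumPy array)
with no column names defined.
When passing the data set for prediction, however, you are using a DataFrame, which does carry column names. That is why the error shows the placeholder names f0 to f34 on one side and the real column names on the other.
You should pass a NumPy array to predict as well; you can convert it with: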
predictions = my_model.predict(test_data_process.values)
(add .values)
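To make the point concrete, here is a minimal sketch that uses only pandas and NumPy. The two small frames are made-up stand-ins for the question's X and test_data_process, and the names train_array and test_array are introduced here purely for illustration. It shows that .values discards the column labels, and how to give the test data the same unlabeled form in the training column order.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the question's X and test_data_process (values are made up).
X = pd.DataFrame({"LotArea": [8450, 9600, 11250],
                  "YrMoSold": [24098, 24089, 24105]})
test_data_process = pd.DataFrame({"YrMoSold": [24120, 24121],
                                  "LotArea": [11622, 14267]})

# .values returns a plain ndarray, so the column labels are gone.
train_array = X.values
print(type(train_array).__name__, hasattr(train_array, "columns"))

# Reorder the test columns to match training, then drop the labels too,
# so fit and predict both receive unlabeled arrays in the same column order.
test_array = test_data_process[X.columns].values
print(test_array.shape)

# With the question's model this would become (not run here):
# predictions = my_model.predict(test_array)
```

The other consistent option is to keep DataFrames on both sides, for example by passing X and y to train_test_split instead of X.values and y.values, so that the model is fitted with the real column names. Either way, what matters is that fit and predict receive the same kind of input.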