How to get actual feature names in XGBoost feature importance plot without retraining the model?


Problem description

I have come across several questions on Stack Overflow where people preprocess the training data (centring, scaling, etc.) before fitting the XGBoost model, for instance:

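```
from sklearn.preprocessing import MinMaxScaler
from xgboost import XGBClassifier

scaler = MinMaxScaler(feature_range=(0, 1))
X = scaler.fit_transform(X)  # returns a NumPy array; column names are dropped
my_model_name = XGBClassifier()
my_model_name.fit(X, Y)
```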

where X and Y are the training data and labels respectively. The scaler returns a 2D NumPy array, so the feature names are lost.

I have already trained my XGBoost model on preprocessed data (scaled with MinMaxScaler), so I am in the same situation: the column names/feature names are lost. When I call plot_importance(my_model_name), I get the feature importance plot, but the features are labelled f0, f1, f2, etc. instead of the actual feature names from the dataset.

Most answers on SO are about training the model in a way that keeps the feature names (such as using pd.get_dummies on data frame columns). My question is: how can I get the actual feature names when using plot_importance(my_model_name), without retraining the model? Is there a way to map the names f0, f1, f2, etc. back to the columns of the original training data (not preprocessed, with column names), so that the actual feature names appear in the plot? Any help is highly appreciated.

Recommended answer

You can get the feature names with:

model.get_booster().feature_names
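
If the model was fitted on a bare NumPy array, the booster may not hold the real column names, so the names have to be supplied from the original data. The following is a minimal sketch of one way to do that mapping, not necessarily what the answer's author had in mind. It assumes the importance scores come from `Booster.get_score()`, which returns a dict keyed by `f0`, `f1`, ..., and that the original column order matches the order of the columns in the training array. The helper `rename_importance`, the sample scores, and the column names are hypothetical and only for illustration.

```python
import re


def rename_importance(scores, feature_names):
    """Map XGBoost's default keys ('f0', 'f1', ...) to real column names.

    scores: dict such as the one returned by Booster.get_score()
    feature_names: original column names, in the same order as the
                   columns of the array the model was trained on
    """
    renamed = {}
    for key, value in scores.items():
        match = re.fullmatch(r"f(\d+)", key)
        if match and int(match.group(1)) < len(feature_names):
            # 'f3' refers to column index 3 of the training array
            renamed[feature_names[int(match.group(1))]] = value
        else:
            # leave keys that do not look like 'f<index>' untouched
            renamed[key] = value
    return renamed


# Hypothetical values standing in for
# my_model_name.get_booster().get_score()
scores = {"f0": 12.0, "f2": 5.0}
# Hypothetical original column names (e.g. list(original_df.columns))
columns = ["age", "income", "tenure"]

named_scores = rename_importance(scores, columns)
print(named_scores)
```

With a trained model, `named_scores` could then be passed to `xgboost.plot_importance`, which accepts a dict of scores as well as a model. Depending on the XGBoost version, assigning the list of names to `my_model_name.get_booster().feature_names` before calling `plot_importance(my_model_name)` may achieve the same result without the helper.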
