Feature Importance with XGBClassifier

Problem description

Hopefully I'm reading this wrong, but in the XGBoost library documentation there is a note about extracting the feature importance attributes using feature_importances_, much like sklearn's random forest.

However, for some reason, I keep getting this error:

AttributeError: 'XGBClassifier' object has no attribute 'feature_importances_'

My code snippet is below:

from sklearn import datasets
import xgboost as xg

iris = datasets.load_iris()
X = iris.data
Y = iris.target
Y = Y[Y < 2]      # arbitrarily removing class 2 so the labels are 0 and 1
X = X[:len(Y)]    # cutting the feature rows to match the rows in Y
xgb = xg.XGBClassifier()
fit = xgb.fit(X, Y)
fit.feature_importances_   # raises the AttributeError above

It seems that you can compute feature importance using the Booster object by calling the get_fscore method. The only reason I'm using XGBClassifier over Booster is that it can be wrapped in a sklearn pipeline. Any thoughts on feature extraction? Is anyone else experiencing this?
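
For context, here is a minimal sketch of the pipeline use case mentioned above (the StandardScaler step is illustrative, not from the question):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

# XGBClassifier implements the sklearn estimator API, so it can serve as
# the final step of a Pipeline; a raw Booster cannot.
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', XGBClassifier(n_estimators=10)),
])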

Answer

As the comments indicate, I suspect your issue is a versioning one. However, if you do not want to or cannot update, then the following function should work for you.
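
A quick way to confirm the version suspicion before patching anything:

import xgboost
print(xgboost.__version__)  # the sklearn wrapper gained feature_importances_ in later releases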

def get_xgb_imp(xgb, feat_names):
    # booster() is the accessor on old xgboost releases; newer ones use get_booster()
    imp_vals = xgb.booster().get_fscore()
    # get_fscore() keys features as 'f0', 'f1', ...; map them back to real names,
    # defaulting to 0 for features the trees never split on
    imp_dict = {feat_names[i]: float(imp_vals.get('f' + str(i), 0.)) for i in range(len(feat_names))}
    # normalize so the importances sum to 1; built-in sum avoids the Python 3
    # failure of numpy.array(dict.values())
    total = sum(imp_dict.values())
    return {k: v / total for k, v in imp_dict.items()}


>>> import numpy as np
>>> from xgboost import XGBClassifier
>>> 
>>> feat_names = ['var1','var2','var3','var4','var5']
>>> np.random.seed(1)
>>> X = np.random.rand(100,5)
>>> y = np.random.rand(100).round()
>>> xgb = XGBClassifier(n_estimators=10)
>>> xgb = xgb.fit(X,y)
>>> 
>>> get_xgb_imp(xgb,feat_names)
{'var5': 0.0, 'var4': 0.20408163265306123, 'var1': 0.34693877551020408, 'var3': 0.22448979591836735, 'var2': 0.22448979591836735}
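
For completeness, on a reasonably recent xgboost the attribute from the question works out of the box, making the helper above unnecessary (a sketch assuming an updated install):

import numpy as np
from xgboost import XGBClassifier

np.random.seed(1)
X = np.random.rand(100, 5)
y = np.random.rand(100).round()

clf = XGBClassifier(n_estimators=10).fit(X, y)
print(clf.feature_importances_)  # one normalized score per input column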
