Mapping column names to random forest feature importances

Problem description

I am trying to plot feature importances for a random forest model and map each feature importance back to the original coefficient. I've managed to create a plot that shows the importances and uses the original variable names as labels but right now it's ordering the variable names in the order they were in the dataset (and not by order of importance). How do I order them in order of feature importance? Thanks!

My code is:

importances = brf.feature_importances_
std = np.std([tree.feature_importances_ for tree in brf.estimators_],
         axis=0)
indices = np.argsort(importances)[::-1]

# Print the feature ranking
print("Feature ranking:")

for f in range(x_dummies.shape[1]):
    print("%d. feature %d (%f)" % (f + 1, indices[f], importances[indices[f]]))

# Plot the feature importances of the forest
plt.figure(figsize=(8,8))
plt.title("Feature importances")
plt.bar(range(x_train.shape[1]), importances[indices],
   color="r", yerr=std[indices], align="center")
feature_names = x_dummies.columns
plt.xticks(range(x_dummies.shape[1]), feature_names)
plt.xticks(rotation=90)
plt.xlim([-1, x_dummies.shape[1]])
plt.show()

Recommended answer

A sort of generic solution would be to throw the features/importances into a dataframe and sort them before plotting:

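import pandas as pd
# Jupyter/IPython only: render plots inline (remove this line in a plain script)
%matplotlib inline
# Fit the model before this point.
# "data" is the X DataFrame and "model" is the fitted scikit-learn estimator.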

feats = {} # a dict to hold feature_name: feature_importance
for feature, importance in zip(data.columns, model.feature_importances_):
    feats[feature] = importance #add the name/value pair 

importances = pd.DataFrame.from_dict(feats, orient='index').rename(columns={0: 'Gini-importance'})
importances.sort_values(by='Gini-importance').plot(kind='bar', rot=45)
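If you would rather keep the original `argsort` approach from the question, the likely cause of the mismatch is that the bars are reordered with `indices` while the tick labels are not. Here is a minimal sketch that applies one shared index array to the names, the importances and the error bars. The helper names `rank_features` and `plot_ranked_importances` are made up for illustration, and the sketch assumes `model` is a fitted forest that exposes `feature_importances_` and `estimators_`.

```python
import numpy as np


def rank_features(feature_names, importances, std=None):
    """Return names, importances and (optionally) std, all sorted by
    descending importance using a single shared index array."""
    names = np.asarray(feature_names)
    importances = np.asarray(importances, dtype=float)
    order = np.argsort(importances)[::-1]  # largest importance first
    sorted_std = None if std is None else np.asarray(std, dtype=float)[order]
    return names[order], importances[order], sorted_std


def plot_ranked_importances(model, columns):
    """Bar plot of a fitted forest's importances, labelled by column name."""
    # Imported here so that rank_features itself only needs NumPy.
    import matplotlib.pyplot as plt

    # Spread of each feature's importance across the individual trees.
    std = np.std([t.feature_importances_ for t in model.estimators_], axis=0)
    names, imps, errs = rank_features(columns, model.feature_importances_, std)

    positions = range(len(names))
    plt.figure(figsize=(8, 8))
    plt.title("Feature importances")
    plt.bar(positions, imps, color="r", yerr=errs, align="center")
    plt.xticks(positions, names, rotation=90)  # labels follow the sorted order
    plt.xlim([-1, len(names)])
    plt.tight_layout()
    plt.show()
```

With the variables from the question, the call would be `plot_ranked_importances(brf, x_dummies.columns)`. The same one-line change also works directly in the original code: pass `feature_names[indices]` to `plt.xticks` instead of `feature_names`.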
