Mapping the index of the feature importances to the index of columns in a dataframe

Question

Hello, I plotted a graph using feature_importance from xgboost. However, the graph is labeled with "f-values" (f20, f22, ...), so I do not know which feature each bar represents. One approach I heard of is to map the index of the features in my dataframe to the index of the feature_importance "f-values" and then select the columns manually. How do I go about doing this? If there is another way to do it, any help would be truly appreciated:

Here is my code:

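import pandas as pd
import matplotlib.pyplot as plt
# `model` is a fitted xgboost scikit-learn model; current xgboost uses get_booster(), not booster()
feature_importance = pd.Series(model.get_booster().get_fscore()).sort_values(ascending=False)
feature_importance.plot(kind='bar', title='Feature Importances')
plt.ylabel('Feature Importance Score')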

The graph itself is not reproduced here; printing the top of the series gives:

print(feature_importance.head())

Output: 
f20     320
f22      85
f29      67
f34      38
f81      20


Recommended answer

I tried a simple example to see what is going on. Here is the code I wrote:

import pandas as pd
import xgboost as xgb
import numpy as np
import matplotlib.pyplot as plt
# In a Jupyter notebook you may also want: %matplotlib inline

model = xgb.XGBRegressor()

size = 100

# Three random features; the target is built from 'a' and 'b' only
data = pd.DataFrame({
    'a': np.random.rand(size),
    'b': np.random.rand(size),
    'c': np.random.rand(size),
})
data['target'] = np.random.rand(size) * data['a'] + data['b']

# Fit on a DataFrame: xgboost keeps the column names as feature names
model.fit(data.drop(columns='target'), data['target'])

feature_importance = pd.Series(model.get_booster().get_fscore()).sort_values(ascending=False)
feature_importance.plot(kind='bar', title='Feature Importances')
plt.ylabel('Feature Importance Score')

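The resulting plot (not reproduced here) shows that the labels are fine: the bars are named after the DataFrame columns.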

Now, let's pass an array instead of a DataFrame:

# Same data, converted to a plain NumPy array, so the column names are lost
model.fit(np.array(data.drop(columns='target')), data['target'])

feature_importance = pd.Series(model.get_booster().get_fscore()).sort_values(ascending=False)
feature_importance.plot(kind='bar', title='Feature Importances')
plt.ylabel('Feature Importance Score')

Hence your problem: a NumPy array has no index or column names, so xgboost generates default feature names (f0, f1, f2, ...), where fi refers to the column at zero-based position i of the array.
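
If you need to keep training on arrays, the mapping described in the question can be done after the fact. The following is a minimal sketch, assuming the array was created from a DataFrame whose columns are in the same order, so that fi corresponds to columns[i]. The helper name `fscore_to_named_series` is hypothetical and is not part of xgboost or pandas; it simply renames the keys of the dict returned by `get_fscore()`.

```python
import pandas as pd


def fscore_to_named_series(fscore, columns):
    """Hypothetical helper: relabel an XGBoost f-score dict whose keys are the
    default feature names ('f0', 'f1', ...) with the original column names.

    Assumes the model was fit on an array whose i-th column is columns[i],
    so the default name 'f{i}' refers to columns[i].
    """
    # Build {'f0': columns[0], 'f1': columns[1], ...}
    default_to_real = {f"f{i}": name for i, name in enumerate(columns)}
    # Keys with no match in the mapping are left unchanged by rename
    scores = pd.Series(fscore).rename(index=default_to_real)
    return scores.sort_values(ascending=False)


# Illustrative input only; in practice pass model.get_booster().get_fscore()
example = fscore_to_named_series({"f0": 12, "f1": 30}, ["a", "b", "c"])
print(example)
```

With the question's setup this would be called as `fscore_to_named_series(model.get_booster().get_fscore(), X.columns)`, where `X` (a hypothetical name) is the DataFrame the array was built from, and the result can be plotted like `feature_importance` above. Only apply it to models fit on arrays: if the model saw real column names, a column literally named like `f3` would be relabeled incorrectly. Fitting on the DataFrame directly, as in the first example, avoids the mapping altogether.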
