How is the feature score(/importance) in the XGBoost package calculated?
Question
The command xgb.importance returns a graph of feature importance measured by an F score. What does this F score represent, and how is it calculated?
Output: [graph of feature importance]
Answer
This metric simply counts how many times each feature is split on. It is analogous to the Frequency metric in the R version: https://cran.r-project.org/web/packages/xgboost/xgboost.pdf
It is about as basic a feature importance metric as you can get, i.e. how many times was this variable split on?
The code for this method shows that it simply tallies the occurrences of a given feature across all the trees:
https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/core.py#L953
    def get_fscore(self, fmap=''):
        """Get feature importance of each feature.

        Parameters
        ----------
        fmap: str (optional)
           The name of feature map file
        """
        trees = self.get_dump(fmap)  # dump all the trees to text
        fmap = {}
        for tree in trees:  # loop through the trees
            for line in tree.split('\n'):  # text processing
                arr = line.split('[')
                if len(arr) == 1:  # no '[' on this line, so it is not a split node
                    continue
                fid = arr[1].split(']')[0]  # text between '[' and ']'
                fid = fid.split('<')[0]  # split on the '<' to find the variable name
                if fid not in fmap:  # if the feature id hasn't been seen yet
                    fmap[fid] = 1  # add it
                else:
                    fmap[fid] += 1  # else increment it
        return fmap  # the counts of each time a variable was split on
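To see the counting in isolation, here is a minimal standalone sketch of the same text-processing logic. It does not need xgboost installed. The function name count_splits and the two sample dump strings are made up for illustration; the strings only imitate the general shape of a text tree dump, where a split line carries a condition such as [f2<2.45] and a leaf line has no bracket.

```python
from collections import Counter


def count_splits(tree_dumps):
    """Count how often each feature appears as a split across text-dumped trees.

    Hypothetical helper that mirrors the parsing in the method quoted above.
    """
    counts = Counter()
    for tree in tree_dumps:
        for line in tree.split("\n"):
            parts = line.split("[")
            if len(parts) == 1:
                continue  # no '[' on this line, so it is a leaf, not a split
            condition = parts[1].split("]")[0]  # e.g. "f2<2.45"
            feature = condition.split("<")[0]   # e.g. "f2"
            counts[feature] += 1
    return dict(counts)


# Made-up dumps for two small trees (assumed format, for illustration only).
sample_dumps = [
    "0:[f2<2.45] yes=1,no=2,missing=1\n"
    "\t1:leaf=0.43\n"
    "\t2:[f3<1.75] yes=3,no=4,missing=3\n"
    "\t\t3:leaf=-0.2\n"
    "\t\t4:leaf=0.1",
    "0:[f2<4.95] yes=1,no=2,missing=1\n"
    "\t1:leaf=0.3\n"
    "\t2:leaf=-0.1",
]

scores = count_splits(sample_dumps)
print(scores)
```

Here f2 is split on once in each tree and f3 once in the first tree, so f2 gets a score of 2 and f3 a score of 1. Note that the score says nothing about how much each split improved the model, only how often the feature was used.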