How is the feature score(/importance) in the XGBoost package calculated?

Problem description

The command xgb.importance returns a graph of feature importance measured by an f score.

What does this f score represent and how is it calculated?

Output: Graph of feature importance

Solution

This is a metric that simply sums up how many times each feature is split on. It is analogous to the Frequency metric in the R version: https://cran.r-project.org/web/packages/xgboost/xgboost.pdf

It is about as basic a feature importance metric as you can get.

i.e. How many times was this variable split on?

The code for this method shows that it simply counts the occurrences of a given feature as a split variable across all the trees.

Source: https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/core.py#L953

def get_fscore(self, fmap=''):
    """Get feature importance of each feature.
    Parameters
    ----------
    fmap: str (optional)
       The name of feature map file
    """
    trees = self.get_dump(fmap)  # dump all the trees to text
    fmap = {}                    # reused from here on as the feature -> count dict
    for tree in trees:                    # loop through the trees
        for line in tree.split('\n'):     # one line per node
            arr = line.split('[')
            if len(arr) == 1:             # no '[': leaf node, nothing to count
                continue
            fid = arr[1].split(']')[0]    # split condition, e.g. "f0<0.5"
            fid = fid.split('<')[0]       # keep the feature name before '<'

            if fid not in fmap:  # if the feature id hasn't been seen yet
                fmap[fid] = 1    # add it
            else:
                fmap[fid] += 1   # else increment it
    return fmap                  # number of times each feature was split on
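
To see the counting in isolation, here is a minimal standalone sketch of the same text processing, run on a small hand-written dump. The names SAMPLE_DUMP and count_splits are made up for this illustration, and the two trees are hypothetical rather than output from a real model. The last step divides each count by the total, which is my reading of what the R Frequency metric reports, so treat that part as an assumption.

```python
from collections import Counter

# Hand-written sample in the style of xgboost's text dump (one string per tree).
# These trees are hypothetical, not output from a real model.
SAMPLE_DUMP = [
    "0:[f0<0.5] yes=1,no=2,missing=1\n"
    "\t1:leaf=0.1\n"
    "\t2:[f1<1.5] yes=3,no=4,missing=3\n"
    "\t\t3:leaf=-0.2\n"
    "\t\t4:leaf=0.3\n",
    "0:[f0<0.7] yes=1,no=2,missing=1\n"
    "\t1:leaf=0.05\n"
    "\t2:leaf=-0.05\n",
]


def count_splits(tree_dumps):
    """Count how many split nodes use each feature across all trees."""
    counts = Counter()
    for tree in tree_dumps:
        for line in tree.split("\n"):
            parts = line.split("[")
            if len(parts) == 1:  # leaf or empty line: no split condition
                continue
            condition = parts[1].split("]")[0]  # e.g. "f0<0.5"
            feature = condition.split("<")[0]   # e.g. "f0"
            counts[feature] += 1
    return dict(counts)


scores = count_splits(SAMPLE_DUMP)

# Assumed reading of the R "Frequency" metric: each count as a share of all splits.
total = sum(scores.values())
shares = {name: n / total for name, n in scores.items()}

print(scores)
print(shares)
```

On a trained model, calling get_fscore() on the Booster should give the same kind of dictionary. In recent xgboost releases, Booster.get_score(importance_type='weight') reports this split count as well, as far as I know.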
