How are feature_importances in RandomForestClassifier determined?

Question

I have a classification task with a time series as the input data, where each attribute (n=23) represents a specific point in time. Besides the classification result itself, I would like to find out which attributes/dates contribute to the result, and to what extent. For this I am simply using feature_importances_, which works well for me.

However, I would like to know how they are calculated and which measure/algorithm is used. Unfortunately, I could not find any documentation on this topic.

Recommended answer

There are indeed several ways to get feature "importances". As is often the case, there is no strict consensus about what this word means.

In scikit-learn, we implement the importance as described in [1] (often cited, but unfortunately rarely read...). It is sometimes called "gini importance" or "mean decrease impurity" and is defined as the total decrease in node impurity, weighted by the probability of reaching that node (approximated by the proportion of samples reaching that node), averaged over all trees of the ensemble.
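To make the definition above concrete, here is a minimal sketch that recomputes the mean decrease in impurity by hand from the fitted trees and compares it with `feature_importances_`. The synthetic dataset, the helper name `tree_mdi`, and the hyperparameters are assumptions chosen for illustration only. The sketch also assumes every tree contains at least one split, and that importances are normalized per tree and again across the ensemble.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def tree_mdi(fitted_tree, n_features):
    """Normalized mean decrease in impurity for one fitted decision tree.

    Hypothetical helper written for this illustration.
    """
    t = fitted_tree.tree_
    importances = np.zeros(n_features)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf node: no split, so no impurity decrease
            continue
        # Impurity decrease of this split, with each node weighted by the
        # (weighted) number of training samples that reach it.
        decrease = (
            t.weighted_n_node_samples[node] * t.impurity[node]
            - t.weighted_n_node_samples[left] * t.impurity[left]
            - t.weighted_n_node_samples[right] * t.impurity[right]
        )
        importances[t.feature[node]] += decrease
    # Dividing by the root count turns sample counts into proportions.
    importances /= t.weighted_n_node_samples[0]
    total = importances.sum()
    return importances / total if total > 0 else importances


# Synthetic stand-in for the 23-attribute data in the question (assumption).
X, y = make_classification(n_samples=500, n_features=23, n_informative=5,
                           random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Average the per-tree importances over the ensemble, then renormalize.
per_tree = [tree_mdi(est, X.shape[1]) for est in forest.estimators_]
manual = np.mean(per_tree, axis=0)
manual /= manual.sum()

print(np.allclose(manual, forest.feature_importances_))
```

If the two arrays agree, the printed value confirms that the attribute is the impurity-based measure described above rather than a permutation-based one.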

In the literature or in some other packages, you can also find feature importances implemented as the "mean decrease accuracy". Basically, the idea is to measure the decrease in accuracy on OOB data when you randomly permute the values for that feature. If the decrease is low, then the feature is not important, and vice-versa.
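The permutation idea can be sketched in a few lines. Note the swap: this sketch measures the accuracy drop on a held-out test split rather than on OOB data, which keeps the code short but is not identical to the OOB procedure described above. The synthetic dataset, the number of repeats, and all variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data (assumption); with shuffle=False the first 3 columns are
# the informative ones and the rest are noise.
X, y = make_classification(n_samples=600, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

rng = np.random.default_rng(0)
baseline = forest.score(X_test, y_test)  # accuracy with intact features
n_repeats = 5
drops = np.zeros(X.shape[1])

for j in range(X.shape[1]):
    for _ in range(n_repeats):
        X_perm = X_test.copy()
        # Shuffle one column: breaks its link to y, keeps its distribution.
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        drops[j] += baseline - forest.score(X_perm, y_test)

drops /= n_repeats  # mean decrease in accuracy per feature
print(np.round(drops, 3))
```

Features with a drop near zero (or slightly negative, from noise) contribute little to the predictions. More recent scikit-learn versions also provide `sklearn.inspection.permutation_importance`, which follows the same idea on a dataset you supply.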

(Note that both algorithms are available in the randomForest R package.)

[1]: Breiman, Friedman, Olshen, Stone, "Classification and Regression Trees", 1984.
