R Random Forests Variable Importance


Question

I am trying to use the random forests package for classification in R.

The Variable Importance Measures listed are:


  • mean raw importance score of variable x for class 0
  • mean raw importance score of variable x for class 1
  • MeanDecreaseAccuracy
  • MeanDecreaseGini

Now I know what these "mean" as in I know their definitions. What I want to know is how to use them.

What I really want to know is what these values mean in only the context of how accurate they are, what is a good value, what is a bad value, what are the maximums and minimums, etc.

If a variable has a high MeanDecreaseAccuracy or MeanDecreaseGini does that mean it is important or unimportant? Also any information on raw scores could be useful too. I want to know everything there is to know about these numbers that is relevant to the application of them.

An explanation that uses the words 'error', 'summation', or 'permutated' would be less helpful than a simpler explanation that didn't involve any discussion of how random forests work.

Like if I wanted someone to explain to me how to use a radio, I wouldn't expect the explanation to involve how a radio converts radio waves into sound.

Recommended answer


An explanation that uses the words 'error', 'summation', or 'permutated' would be less helpful then a simpler explanation that didn't involve any discussion of how random forests works.

Like if I wanted someone to explain to me how to use a radio, I wouldn't expect the explanation to involve how a radio converts radio waves into sound.

How would you explain what the numbers in WKRP 100.5 FM "mean" without going into the pesky technical details of wave frequencies? Frankly, parameters and related performance issues with Random Forests are difficult to get your head around even if you understand some technical terms.

Here is my attempt at a few of them:


-mean raw importance score of variable x for class 0

-mean raw importance score of variable x for class 1

Simplifying from the Random Forest web page, raw importance score measures how much more helpful than random a particular predictor variable is in successfully classifying data.


-MeanDecreaseAccuracy

I think this is only in the R module, and I believe it measures how much inclusion of this predictor in the model reduces classification error.
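For readers who do want a peek under the hood, here is a minimal sketch of the general idea behind an accuracy-based importance score: shuffle one predictor's values and see how much accuracy drops. It is written in plain Python rather than R so it runs without any packages, and it is not the randomForest package's own code. The names `accuracy`, `permutation_importance`, and `toy_predict`, and the toy data, are all hypothetical and chosen only for illustration.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(rows)


def permutation_importance(predict, rows, labels, column, repeats=20, seed=0):
    """Mean drop in accuracy after shuffling the values of one column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with the shuffled value in place of the original one.
        permuted = [
            row[:column] + [value] + row[column + 1:]
            for row, value in zip(rows, shuffled)
        ]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / repeats


def toy_predict(row):
    """Hypothetical stand-in for a fitted model: only column 0 matters."""
    return 1 if row[0] > 0.5 else 0


# Toy data (assumption): column 0 decides the class, column 1 is pure noise.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [1 if row[0] > 0.5 else 0 for row in rows]

useful = permutation_importance(toy_predict, rows, labels, column=0)
noise = permutation_importance(toy_predict, rows, labels, column=1)
print(f"column 0: {useful:.3f}, column 1: {noise:.3f}")
```

In this toy setup the predictor that actually drives the class gets a clearly positive score, while the noise column scores zero, which matches the reading that a higher decrease in accuracy points to a more important variable.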


-MeanDecreaseGini

Gini is defined as "inequity" when used in describing a society's distribution of income, or as a measure of "node impurity" in tree-based classification. A low Gini (i.e. a higher decrease in Gini) means that a particular predictor variable plays a greater role in partitioning the data into the defined classes. It's a hard one to describe without talking about the fact that data in classification trees are split at individual nodes based on values of predictors. I'm not so clear on how this translates into better performance.
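The node-impurity idea can be made concrete with a small worked example. The sketch below, in plain Python and with hypothetical function names (`gini`, `gini_decrease`), computes the Gini impurity of a node and the decrease produced by one split. It illustrates the standard definition only and is not taken from the randomForest source.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 0 when pure, 0.5 for an even two-class mix."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


parent = [0, 0, 0, 0, 1, 1, 1, 1]

# A split that separates the classes perfectly removes all impurity.
clean_split = gini_decrease(parent, [0, 0, 0, 0], [1, 1, 1, 1])

# A split that leaves both children as mixed as the parent removes none.
poor_split = gini_decrease(parent, [0, 0, 1, 1], [0, 0, 1, 1])

print(clean_split, poor_split)
```

A predictor whose splits look like the first case accumulates a large total decrease, and one whose splits look like the second accumulates almost nothing, so a higher MeanDecreaseGini indicates a variable that does more of the work of separating the classes.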
