What exactly does the `localImp` parameter do in the randomForest package?
Question
Can anyone explain, in relatively simple English, what the parameter `localImp` does in the `randomForest` package?
The `randomForest` documentation describes this parameter as:
should casewise importance measure be computed? (Setting this to TRUE will override importance.)
It also states that it produces:
a p by n matrix containing the casewise importance measures, the [i,j] element of which is the importance of i-th variable on the j-th case. NULL if localImp=FALSE
Can someone explain exactly what this means, or point me to a paper that discusses this parameter in detail?
Thanks
Recommended answer
The randomForest package is more or less a wrapper for Fortran code written by Leo Breiman and Adele Cutler. Breiman was a statistics professor at UC Berkeley, and his website has been preserved after his passing.
It is an amazing resource:
https://www.stat.berkeley.edu/~breiman/RandomForests/
On this site, the classification page says the following:
For each case, consider all the trees for which it is oob. Subtract the percentage of votes for the correct class in the variable-m-permuted oob data from the percentage of votes for the correct class in the untouched oob data. This is the local importance score for variable m for this case.
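One way to write the quoted definition as a formula. The notation here is my own, not the site's: $T_i$ is the set of trees for which case $i$ is oob, $y_i$ is its true class, $\hat{y}_t(x_i)$ is tree $t$'s vote on the untouched data, and $\hat{y}_t(x_i^{(m)})$ is its vote after variable $m$ has been permuted among that tree's oob cases.

```latex
\mathrm{imp}_m(i)
  = \frac{1}{|T_i|} \sum_{t \in T_i} \mathbf{1}\!\left[\hat{y}_t(x_i) = y_i\right]
  - \frac{1}{|T_i|} \sum_{t \in T_i} \mathbf{1}\!\left[\hat{y}_t\!\left(x_i^{(m)}\right) = y_i\right]
```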
So, for observation i, take all of the trees that did not train on i because it was not selected in their bootstrap sample. Now, consider variable m. For each of these trees, permute the values of m among that tree's left-out (oob) observations. Calculate the fraction of these trees that vote for the correct class of i on the permuted data, and the fraction that vote for the correct class of i on the untouched data. Subtracting the permuted fraction from the untouched fraction gives the (i, m) local importance measure.
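The procedure can be sketched in a few lines of Python. This is only an illustration of the idea under stated assumptions, not the package's Fortran code: `local_importance`, the `stump` "tree" and the `oob_sets` are all hypothetical names made up for the example, each tree is just a function that maps a feature row to a class, and the score is normalized by the number of trees for which the case is oob (the package's exact normalization may differ).

```python
import random

def local_importance(trees, oob_sets, X, y, seed=0):
    """Return a p-by-n list: imp[m][i] is the local importance of variable m for case i."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    n_oob = [0] * n                              # number of trees for which case i is oob
    correct = [0] * n                            # correct votes on untouched oob data
    correct_perm = [[0] * n for _ in range(p)]   # correct votes with variable m permuted

    for predict, oob in zip(trees, oob_sets):
        oob = sorted(oob)
        for i in oob:
            n_oob[i] += 1
            correct[i] += predict(X[i]) == y[i]
        for m in range(p):
            # Permute variable m among this tree's oob cases only.
            shuffled = [X[i][m] for i in oob]
            rng.shuffle(shuffled)
            for i, value in zip(oob, shuffled):
                row = list(X[i])
                row[m] = value
                correct_perm[m][i] += predict(row) == y[i]

    # Untouched minus permuted fraction of correct votes, per variable and case.
    return [[(correct[i] - correct_perm[m][i]) / n_oob[i] if n_oob[i] else 0.0
             for i in range(n)]
            for m in range(p)]

# Toy data: the class depends only on variable 0; variable 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.7, 2.0], [0.8, 6.0], [0.9, 3.0]]
y = [0, 0, 0, 1, 1, 1]

def stump(row):
    """Hypothetical stand-in for a fitted tree: split on variable 0 only."""
    return int(row[0] > 0.5)

trees = [stump, stump, stump]
oob_sets = [{0, 3, 4}, {1, 2, 5}, {0, 1, 3, 5}]  # made-up oob indices per tree

imp = local_importance(trees, oob_sets, X, y)
print(imp[0])  # scores for variable 0, one per case
print(imp[1])  # scores for variable 1, one per case
```

Because the stand-in trees never look at variable 1, permuting it cannot change any vote, so its row is all zeros; variable 0 can only lose correct votes when permuted, so its scores are never negative here. In R itself none of this needs to be written by hand: fitting with `localImp = TRUE` fills the `localImportance` component of the fitted object with the p by n matrix described in the question.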