What exactly does the `localImp` parameter do in the randomForest package?

Problem description

Can anyone explain, in relatively simple English, what the parameter localImp does in the randomForest package?

The randomForest documentation describes this parameter as:

should casewise importance measure be computed? (Setting this to TRUE will override importance.)

It also states that it produces:

a p by n matrix containing the casewise importance measures, the [i,j] element of which is the importance of i-th variable on the j-th case. NULL if localImp=FALSE

Can someone explain exactly what this means, or point me to a paper that discusses this parameter in detail?

Thanks

Recommended answer

The randomForest package is more or less a wrapper for Fortran code written by Leo Breiman and Adele Cutler. Breiman was a statistics professor at UC Berkeley, and his website has been preserved since his passing.

It is an amazing resource:
https://www.stat.berkeley.edu/~breiman/RandomForests/

On this site, they mention the following on the classification page:

For each case, consider all the trees for which it is oob. Subtract the percentage of votes for the correct class in the variable-m-permuted oob data from the percentage of votes for the correct class in the untouched oob data. This is the local importance score for variable m for this case.

So, for observation i, take all of the trees that did not train on i because it was not selected in their bootstrap sample. Now consider variable m. In each of those trees, permute the values of m among that tree's left-out (oob) observations and check whether the tree still votes for the correct class of i. Also record how often those same trees vote for the correct class of i when the values of m are left untouched. Subtracting the share of correct votes with m permuted from the share of correct votes on the untouched data gives the (i,m) local importance measure. A large positive value means that the forest relies on variable m to classify observation i correctly; a value near zero means that scrambling m makes little difference for that observation.
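
The procedure described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Fortran code that randomForest wraps: the "trees" are one-split stumps on a randomly chosen variable, the labels are assumed to be 0/1, and the names `fit_stump`, `predict` and `local_importance` are made up for this example. Like `localImportance` in the package documentation, the result is a p by n matrix with one row per variable and one column per case.

```python
import random

def majority(labels, default=0):
    # Most common label, or `default` if the list is empty.
    return max(set(labels), key=labels.count) if labels else default

def fit_stump(X, y, rows, feature):
    # Hypothetical stand-in for a tree: split one feature at its bootstrap mean.
    thr = sum(X[r][feature] for r in rows) / len(rows)
    left = [y[r] for r in rows if X[r][feature] <= thr]
    right = [y[r] for r in rows if X[r][feature] > thr]
    return feature, thr, majority(left), majority(right)

def predict(stump, x):
    feature, thr, left, right = stump
    return left if x[feature] <= thr else right

def local_importance(X, y, n_trees=200, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    oob_count = [0] * n                         # trees for which case i is oob
    correct = [0] * n                           # correct votes, untouched data
    correct_perm = [[0] * n for _ in range(p)]  # correct votes, variable m permuted
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]
        stump = fit_stump(X, y, boot, rng.randrange(p))
        for i in oob:
            oob_count[i] += 1
            correct[i] += predict(stump, X[i]) == y[i]
        for m in range(p):
            # Permute variable m among this tree's oob cases only.
            shuffled = [X[i][m] for i in oob]
            rng.shuffle(shuffled)
            for i, value in zip(oob, shuffled):
                x = list(X[i])
                x[m] = value
                correct_perm[m][i] += predict(stump, x) == y[i]
    # Share of correct votes untouched minus share with variable m permuted.
    return [[(correct[i] - correct_perm[m][i]) / oob_count[i] if oob_count[i] else 0.0
             for i in range(n)] for m in range(p)]

# Toy data: variable 0 determines the class, variable 1 is pure noise.
data_rng = random.Random(1)
X = [[float(i), data_rng.random()] for i in range(40)]
y = [int(i >= 20) for i in range(40)]
imp = local_importance(X, y)   # imp[m][i]: importance of variable m for case i
```

On this toy data the scores for variable 0 should on average be clearly larger than those for the noise variable, since only permuting variable 0 changes the votes of the stumps that are actually informative.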
