gbm::interact.gbm vs. dismo::gbm.interactions


Problem description

Background

The reference manual for the gbm package states that the interact.gbm function computes Friedman's H-statistic to assess the strength of variable interactions. The H-statistic ranges from 0 to 1.

The reference manual for the dismo package does not cite any literature on how the gbm.interactions function detects and models interactions. Instead, it gives a list of the general procedures used. The dismo vignette "Boosted Regression Trees for ecological modeling" states that the dismo package extends functions in the gbm package.

Question

How does dismo::gbm.interactions actually detect and model interactions?

Why

I am asking because gbm.interactions in the dismo package yields results greater than 1, which the gbm package reference manual says is not possible for the H-statistic.

I checked the tar.gz of each package to see whether the source code was similar. It is different enough that I cannot determine whether the two packages use the same method to detect and model interactions.

Recommended answer

To summarize, the difference between the two approaches boils down to how the "partial dependence function" of the two predictors is estimated.

The dismo package is based on code originally published with Elith et al. (2008); the original source is in that paper's supplementary material. The paper describes the procedure only briefly. In essence, model predictions are obtained over a grid of the two predictors, with all other predictors set at their means. These predictions are then regressed on the two grid variables, each entered as a factor, so the fitted model contains main effects only. The mean squared residual of this model is then multiplied by 1000. The resulting statistic measures how far the model predictions depart from an additive combination of the two predictors, which indicates a possible interaction.

The relevant source code for gbm.interactions can also be obtained from the dismo package. The interaction test boils down to the following commands (copied directly from the source):

interaction.test.model <- lm(prediction ~ as.factor(pred.frame[,1]) + as.factor(pred.frame[,2]))

interaction.flag <- round(mean(resid(interaction.test.model)^2) * 1000,2)

pred.frame contains a grid of the two predictors in question, and prediction is the prediction from the original fitted gbm model, with all predictors other than the two under consideration set at their means.
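To see why this statistic can exceed 1, here is a minimal base R sketch that applies the same two commands to a toy prediction surface. No gbm model is fitted: the `additive` and `interacted` vectors are made-up stand-ins for model predictions, and `interaction_size` is a hypothetical helper name, not a dismo function.

```r
# Grid of two predictors, playing the role of pred.frame.
grid <- expand.grid(x1 = seq(0, 10, length.out = 20),
                    x2 = seq(0, 10, length.out = 20))

# Hypothetical helper wrapping the two commands quoted from dismo.
interaction_size <- function(prediction, pred.frame) {
  # Main-effects-only model: each grid variable enters as a factor.
  interaction.test.model <- lm(prediction ~ as.factor(pred.frame[, 1]) +
                                 as.factor(pred.frame[, 2]))
  # Mean squared residual, scaled by 1000 and rounded.
  round(mean(resid(interaction.test.model)^2) * 1000, 2)
}

# Made-up "predictions": one purely additive, one with a product term.
additive   <- 2 * grid$x1 + 3 * grid$x2
interacted <- additive + 0.5 * grid$x1 * grid$x2

size_additive   <- interaction_size(additive, grid)
size_interacted <- interaction_size(interacted, grid)

# Rescaling the response by 10 rescales the statistic by 10^2.
size_rescaled <- interaction_size(10 * interacted, grid)

print(c(additive = size_additive,
        interacted = size_interacted,
        rescaled = size_rescaled))
```

The additive surface is fitted exactly by the main-effects model, so its statistic is 0. The surface with a product term leaves residuals, and because the statistic is a mean squared residual in squared units of the response (times 1000), it has no upper bound and changes with the scale of the response.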

This differs from Friedman's H statistic (Friedman & Popescu, 2005), which is estimated via formula (44) for any pair of predictors. In essence, it measures the departure from additivity for the two predictors while averaging over the values of the other variables, NOT setting the other variables at their means. It is expressed as a fraction of the total variance of the partial dependence function of the two variables (or of the model-implied predictions), so it will always lie between 0 and 1.
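For comparison, the pairwise H statistic is commonly written in the form below. The notation is chosen here for illustration and may differ from the paper's: \hat{F}_{jk}, \hat{F}_j and \hat{F}_k are assumed to be the centered partial dependence functions of the pair and of each predictor alone, evaluated at the N observations.

```latex
H_{jk}^{2} =
  \frac{\sum_{i=1}^{N}
          \left[ \hat{F}_{jk}(x_{ij}, x_{ik})
                 - \hat{F}_{j}(x_{ij})
                 - \hat{F}_{k}(x_{ik}) \right]^{2}}
       {\sum_{i=1}^{N} \hat{F}_{jk}^{2}(x_{ij}, x_{ik})}
```

The numerator is the part of the joint partial dependence that the two single-variable functions do not explain, and the denominator is its total variation, so the ratio is dimensionless. The dismo statistic, by contrast, is an unnormalized mean squared residual.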
