使用 R 中的 glmulti 包对 akaike 权重进行详尽的搜索多元回归 [英] Using the glmulti package in R for exhaustive search multiple regression for akaike weights

查看:79
本文介绍了使用 R 中的 glmulti 包对 akaike 权重进行详尽的搜索多元回归的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想知道是否有人可以帮助我理解为什么当我在 R 中输入脚本时收到错误消息.对于一些背景信息,我正在研究 6 个不同的变量(我我的环境科学荣誉项目认为是 63 个组合或模型)(X) 在不同空间尺度上分别对初级和净生态系统总产量(Y)进行了分析.我决定使用带有 akaikes 信息准则 (AIC) 的详尽搜索多元回归分析来尝试找到一组最适合的模型.(以及分层划分以比较归因于不同 X 变量的方差)我想获得权重,以便我可以对哪些模型最符合"标准进行排名,看看是否有一个或一组它们装备其余的,因此是更多可能符合数据.

I was wondering if someone could help me understand why I am getting an error message when I enter a script into R. For abit of background information I am looking into the effect 6 different variables (which I think is 63 combinations or models) (X) have on gross primary and net ecosystem production (Y) seperatly at different spatial scales for my environmental science honours project. I have decided to use exhaustive search multiple regression analysis with akaikes information criterion (AIC) to try and find a group of models for best fit. (and hierarchical partitioning to compare variance attributed to different X variables) I want to get the weights so I can rank which models "best meet" the criterion see if there is one or a group of them that outfit the rest and therefore be a more likely fit to the data.

我最近在 Cross Validated 上的 hier.part 包上发布了一个类似的问题,得到了很好的回答,并被告知如果我将来有任何类似问题,请来这里.

I recently posted a similar question on the hier.part package on Cross Validated received a great answer and was told to come here if I had any similar questions in the future.

我用于 R 的软件包是 glmulti.可以在这里找到

The package I am using for R is glmulti. which can be found here

我使用的脚本是这个

require(glmulti)
GPPANDDRIVER<-read.table("C:\\Databases at different scales for R\\River Rhine and Netherlands\\GPP and drivers rhineland (comma delimited).csv",header=T,sep=",")
GPP<-GPPANDDRIVER$GPP
IND_VARS<-subset(GPPANDDRIVER,select=-GPP)
#  glmulti S4 generic 
glmulti(y=GPP, xr=IND_VARS, data, exclude = c(), name = "glmulti.analysis", intercept = TRUE, marginality = FALSE, bunch=30, chunk = 1, chunks = 1,
level = 2, minsize = 0, maxsize = -1, minK = 0, maxK = -1, method = "h", crit = "aic", confsetsize = 63, popsize = 40, mutrate = 10^-3, sexrate = 0.1, imm = 0.3, plotty = TRUE, report = TRUE, deltaM = 0.05, deltaB = 0.05, conseq = 5, fitfunction = "glm", resumefile = "id", includeobjects=TRUE,)

这是示例中提到的莱茵兰地区站点的 .csv 数据的链接,http://www.filedropper.com/gppanddriversrhinelandcommadelimited

Here is the link for the .csv data for sites in the rhineland mentioned in the example, http://www.filedropper.com/gppanddriversrhinelandcommadelimited

我对 R 非常陌生,所以我假设 popsize 表示重复的数量,对于这个规模是 40,所以我使用了 40,我还假设 confsetsize 表示可能的模型数量,我认为由于 6 个变量,我认为有 63 个?

I am extremely new to R so I presumed popsize means the number of replicates which is 40 for this scale so I used 40, I also assumed confsetsize meant number of possible models which I believe is 63 due to the 6 variables?

如果有人能帮忙,将不胜感激

If anyone could help it would be greatly appreciated

感谢您对基本问题的耐心和道歉

Thanks for you patience and apologies for the basic question

理查德

edit 我今天早上刚尝试运行脚本,现在 R 崩溃了.

edit I just tried running the script this morning and it now crashes R.

推荐答案

这对我有用.我认为主要是不要盲目地在模型调用中包含所有参数.其中大多数都有默认值,因此(如果包编写者已经完成了他们的工作)你应该能够保持它们原样而不用太担心(当然你应该 RTFM 和(尝试) 明白他们的意思...)

This worked for me. I think the main thing is not to blindly include all the parameters in the model call. Most of these have default values, thus (if the package writer has done their job) you should be able to leave them as they are and not worry too much (although of course you should RTFM and (try to) understand what they mean ...)

dat <- read.csv("GPPdriversRhineland.csv")
library(glmulti)

我决定用较短的标签重命名预测变量:

I decided to rename the predictors with shorter tags:

prednames <- c("NDVI","solar.rad","avg.temp","precip",
                "nutr.avail","water.cap")
names(dat)[1:6] <- prednames

这是拟合所有主效应组合所需的全部内容:由于您有六个预测变量,因此有 64 个级别 1 模型(包括空模型).

This is all you need to fit all combinations of main effects: since you have six predictors, there are 64 level-1 models (including the null model).

g1 <- glmulti("GPP",xr=prednames,data=dat,level=1)

对于更大的计算挑战:

g2 <- glmulti("GPP",xr=prednames,data=dat,level=2)

我相信这里有 2^(choose(6,2)+6) = 210 万个可能的模型.我还没有仔细研究 ?glmulti 来告诉它如何停止拟合模型.我刚刚开始(到目前为止它已经评估了 66,000 个模型),但它发现了一个 AIC 约为 500.5 的 2 级模型,这比集合中的 min-AIC 为 518 好很多1 级模型...

I believe there are 2^(choose(6,2)+6) = 2.1 million possible models here. I haven't looked at ?glmulti closely enough to tell it how to stop fitting models. I just started it off (so far it has evaluated 66,000 models), but it has found a 2-level model with AIC about 500.5, which is much better than the min-AIC of 518 in the set of 1-level models ...

PS 我更多地尝试设置,尝试遗传算法方法而不是穷举方法(我没有看到一个明显的方法来告诉 glmulti"使用详尽的方法,但在 N 次尝试后停止").即使使用比默认遗传算法稍微宽松的设置,它似乎仍停留在 AIC 大约 504,高于我首先尝试的(部分)详尽筛选中发现的值.

PS I played around with settings a bit more, trying the genetic algorithm approach rather than the exhaustive approach (I don't see an obvious way to tell glmulti "use the exhaustive approach, but stop after N tries"). Even with slightly more permissive-than-default genetic algorithm settings, it seems to get stuck at AIC approx 504, above the value found in the (partial) exhaustive screening I tried first.

例如:

g2 <- glmulti("GPP",xr=prednames,data=dat,level=2,marginality=TRUE,
              method="g",conseq=25,popsize=500,mutrate=1e-2)

PPS:我在穷举情况下获得更好结果的原因是我有 marginality=FALSE,即允许模型忽略主效应参数参与模型中包含的交互.这不一定是明智的.如果我关闭边缘性约束,那么遗传算法可以轻松地下降到 AIC=499 ......

PPS: the reason I was getting better results in the exhaustive case was that I had marginality=FALSE, i.e. the model was allowed to leave out main-effect parameters that were involved in interactions included in the model. This isn't necessarily sensible. If I turn off the marginality constraint, then the genetic algorithm can get down to AIC=499 without too much trouble ...

glmulti("GPP",xr=prednames,data=dat,level=2,marginality=TRUE,
              method="d")

也很有用:它打印出为给定规范定义的候选模型的数量.

is also useful: it prints out the number of candidate models defined for a given specification.

这篇关于使用 R 中的 glmulti 包对 akaike 权重进行详尽的搜索多元回归的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆