glmulti Oversized candidate set


Question

Error message:

SYSTEM: win7/64bit/ultimate/16gb-real-ram plus virtual memory, memory.limit(32000)

  1. What does this error message mean?

    In glmulti(y = "y", data = mydf, xr = c("x1", : !Oversized candidate set.

mydf has 3.6 million rows and 150 columns of floats.

R/64bit "Good Sport"

Answer

I have encountered the same problem; here is what I have found out so far:

  1. The number of rows does not seem to be the issue. The problem is that with 150 predictors the package cannot handle an exhaustive search (that is, fitting and comparing every possible model). In my experience, your specific error message "Oversized candidate set" is triggered by the fact that you also allow pairwise interactions (level=2; set level=1 to prohibit interactions). Even then you will most likely run into the warning "Too many predictors". In my (very limited) experimentation, the largest candidate set I got to work was about a billion models (specifically, 30 covariates, since possible combinations grow as 2^n and 2^30 = 1,073,741,824). Here is the code I used to evaluate this:

out <- integer(50)
for (i in 2:40) out[i] <- glmulti(names(data)[1], names(data)[2:i], method = "d", level = 1, crit = aic, data = data)

Once the loop hits 31 covariates, the candidate set comes back with 0 models; at 33 and beyond it starts returning the warning message. My "data" had about 100 variables and around 1,000 rows, but as I said, the problem is the width of the dataset, not the depth.
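The 2^n growth described above can be checked with plain arithmetic in base R. The interaction figures below are a rough illustration only: they assume each of the n + choose(n, 2) terms can be toggled independently, which ignores glmulti's marginality rules, but they show why level=2 blows up so much sooner:

```r
# Candidate-model counts for main effects only: 2^n
n <- c(10, 20, 30, 31)
main_only <- 2^n                  # 2^30 is already over a billion models

# With pairwise interactions (level = 2), the term count jumps from n
# to n + choose(n, 2); even 10 predictors give 55 terms, i.e. roughly
# 2^55 combinations under this naive toggle model.
terms_lvl2 <- n + choose(n, 2)
```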

As I said, start by eliminating the interactions, then consider applying other variable-reduction techniques first to bring the number of variables down (factor analysis, principal components, or clustering). The trade-off with those is that you lose some explainability but keep predictive power.
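A minimal sketch of the principal-components route, using simulated data in place of the real mydf (the column layout, the 20 toy predictors, and the choice of 5 retained components are assumptions for illustration, not part of the original answer):

```r
set.seed(1)
# Toy stand-in for a wide data frame: a response plus 20 numeric predictors
mydf <- data.frame(y = rnorm(100), matrix(rnorm(100 * 20), nrow = 100))

# Rotate the predictors into principal components (scaled to unit variance)
pcs <- prcomp(mydf[, -1], scale. = TRUE)

# Keep only the first 5 components as the new, narrower predictor set;
# this reduced frame is what you would then hand to glmulti instead of
# the original 150 raw columns.
reduced <- data.frame(y = mydf$y, pcs$x[, 1:5])
ncol(reduced)  # 6: the response plus 5 components
```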

The glmulti documentation compares the package with alternatives, highlighting their use cases, benefits, and downsides.

PS: I ran my analysis on Win7, 64-bit, 16 GB RAM, R version 3.1.0, glmulti 1.0.7. PPS: The package author was said to be releasing version 2.0 last year, which would fix some of these issues. Read more at the source.

