Error during wrapup: long vectors not supported yet: in glm() function


Problem description

I found several questions on Stack Overflow about this topic (some of them without any answer), but so far nothing related to this error in a regression.

I'm running a probit model in R with (I'm guessing) too many fixed effects (years and places):

myprobit <- glm(factor(Y) ~ factor(T) + factor(X1) + factor(X2) + factor(X3) +
                 factor(YEAR) + factor(PLACE),
                 family = binomial(link = "probit"),
                 data = DT)

The PLACE variable has about 1,000 unique values and YEAR has 8. The dataset DT has 13,099,225 observations and 79 columns.

The error I get is:

Error: cannot allocate vector of size 59.3 Gb
Error during wrapup: long vectors not supported yet: ../include/Rinlinedfuns.h:519

The machine I'm using has 128 GB of RAM.

So I don't know what I can do without changing the function. Does anyone know how to deal with this issue? Thanks!

Recommended answer

To close this question, I have to mention that @Axeman's answer is the only feasible approach for my problem. The whole issue is that there is not enough memory to hold such a huge design matrix.
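To make the memory argument concrete, here is a rough back-of-the-envelope sketch in R. The helper `design_matrix_gib()` and the column count are illustrative assumptions, not part of the original post: the count covers only an intercept plus the PLACE and YEAR dummies, because the number of levels of T, X1, X2 and X3 is not given.

```r
# Rough size of a dense double-precision matrix, in GiB (hypothetical helper).
design_matrix_gib <- function(n_rows, n_cols) {
  n_rows * n_cols * 8 / 1024^3   # 8 bytes per double
}

n_obs <- 13099225   # rows in DT, taken from the question

# Assumed column count: intercept + 999 PLACE dummies + 7 YEAR dummies.
# The T, X1, X2 and X3 dummies are left out because their level counts are unknown.
n_cols <- 1 + 999 + 7

# One copy of such a matrix is on the order of 100 GiB.
size_gib <- design_matrix_gib(n_obs, n_cols)
size_gib

# The element count exceeds the limit of a regular (non-long) R vector,
# which is consistent with the "long vectors not supported yet" message.
is_long <- n_obs * n_cols > .Machine$integer.max
is_long

# For comparison: the failed 59.3 Gb allocation corresponds to roughly
# 600 double columns at this row count.
cols_in_failed_alloc <- 59.3 * 1024^3 / (n_obs * 8)
cols_in_failed_alloc
```

Even a single dense copy of the matrix takes most of the 128 GB on the machine, and a model fit generally needs working space beyond that one copy, so running out of memory here is not surprising.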

Therefore, running the probit regression with the bigglm() function from the biglm package is the only solution I have found so far.
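As an illustration only, a chunked probit fit with `bigglm()` might look like the sketch below. The data frame `toy`, its size, and the `chunksize` and `maxit` values are made up for the example; the real DT is far larger and has more covariates. The response is coded as numeric 0/1 and the grouping variables are stored as factor columns rather than wrapped in `factor()` inside the formula.

```r
library(biglm)

set.seed(1)
n <- 20000

# Simulated stand-in for DT (assumption: not the data from the question).
toy <- data.frame(
  X1    = rnorm(n),
  YEAR  = factor(sample(2001:2008, n, replace = TRUE)),
  PLACE = factor(sample(sprintf("p%02d", 1:20), n, replace = TRUE))
)
toy$Y <- as.integer(0.5 * toy$X1 + rnorm(n) > 0)   # numeric 0/1 response

# bigglm() processes the data in chunks of `chunksize` rows, so the full
# design matrix is never held in memory at once.
fit <- bigglm(Y ~ X1 + YEAR + PLACE,
              data      = toy,
              family    = binomial(link = "probit"),
              chunksize = 2000,
              maxit     = 25)

n_coef <- length(coef(fit))
summary(fit)
```

The chunk size trades memory for speed: smaller chunks use less memory per pass but require more passes over the data.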

Nevertheless, I realized that, because of how the biglm package works (it iterates over chunks of the data), using factor() variables on the RHS is problematic whenever a factor level is not represented in a chunk. In other words, if a factor variable has 5 levels but only 4 of them appear in a given chunk, the estimation fails with an error.
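The chunk problem can be reproduced in base R without biglm. The sketch below uses a made-up six-row data frame and `model.matrix()` to show that `factor()` inside a formula re-derives the levels from whatever rows it is given, while a column converted to a factor once, on the full data, keeps the same columns in every chunk. This is one common workaround, not necessarily what the author ended up doing.

```r
# Toy data (assumption): one grouping variable with 5 distinct values.
df <- data.frame(
  y = c(0, 1, 0, 1, 1, 0),
  g = c("a", "b", "c", "d", "e", "a"),
  stringsAsFactors = FALSE
)
chunk <- df[1:4, ]   # the value "e" does not occur in this chunk

# factor() in the formula sees only the values present in each chunk,
# so the chunk yields fewer design-matrix columns than the full data.
cols_chunk_inline <- ncol(model.matrix(y ~ factor(g), data = chunk))
cols_full_inline  <- ncol(model.matrix(y ~ factor(g), data = df))

# Fixing the levels once, on the full data, keeps the columns consistent:
# the chunk now carries all 5 levels even though only 4 appear in it.
df$g <- factor(df$g, levels = c("a", "b", "c", "d", "e"))
chunk <- df[1:4, ]
cols_chunk_fixed <- ncol(model.matrix(y ~ g, data = chunk))

c(cols_chunk_inline, cols_full_inline, cols_chunk_fixed)
```

With the levels fixed up front, every chunk produces a design matrix of the same width, which is what a chunk-wise fitting routine needs in order to accumulate results across chunks.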

There are several questions and comments about this on Stack Overflow.
