lme4::glmer vs. Stata's melogit command


Question


Lately I have been trying to fit a lot of random effects models to relatively big datasets. Let’s say about 50,000 people (or more) observed at up to 25 time points. With such a large sample size, we include a lot of predictors that we’re adjusting for – maybe 50 or so fixed effects. I’m fitting the model to a binary outcome using lme4::glmer in R, with random intercepts for each subject. I can't go into specifics on the data, but the basic format of the glmer command I used was:

fit <-  glmer(outcome ~ treatment + study_quarter + dd_quarter + (1|id),
              family = "binomial", data = dat)


where both study_quarter and dd_quarter are factors with about 20 levels each.


When I try to fit this model in R, it runs for about 12-15 hours, and returns an error that it failed to converge. I did a bunch of troubleshooting (e.g., following these guidelines), with no improvement. And the convergence isn’t even close in the end (max gradient around 5-10, whereas the convergence criterion is 0.001 I think).


I then tried fitting the model in Stata, using the melogit command. The model fit in under 2 mins, with no convergence issues. The corresponding Stata command is

melogit outcome treatment i.study_quarter i.dd_quarter || id:


What gives? Does Stata just have a better fitting algorithm, or one better optimized for large models and large datasets? It’s really surprising how different the run times were.

Answer


The glmer fit will probably be much faster with the optional argument nAGQ=0L. You have many fixed-effects parameters (20 levels for each of study_quarter and dd_quarter generate a total of 38 contrasts) and the default optimization method (corresponding to nAGQ=1L) puts all of those coefficients into the general nonlinear optimization call. With nAGQ=0L these coefficients are instead optimized within the much faster penalized iteratively reweighted least squares (PIRLS) algorithm. The default will generally provide a better estimate, in the sense that the deviance at the estimate is lower, but the difference is usually very small while the time difference is enormous.
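Concretely, applying this suggestion to the question's model only requires adding the nAGQ argument to the original call (a sketch reusing the question's variable names; the speedup and final deviance will depend on the data):

```r
library(lme4)

# Same model as in the question, but with nAGQ = 0L: the fixed-effects
# coefficients are estimated inside the PIRLS step instead of being
# passed to the general nonlinear optimizer, which is typically much
# faster at the cost of a slightly less accurate (higher-deviance) fit.
fit0 <- glmer(outcome ~ treatment + study_quarter + dd_quarter + (1 | id),
              family = "binomial", data = dat, nAGQ = 0L)

# Comparing deviance(fit0) against the default nAGQ = 1L fit shows
# how small the accuracy difference usually is.
```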


I have a write-up of the differences between these algorithms in a Jupyter notebook, nAGQ.ipynb. That write-up uses the MixedModels package for Julia instead of lme4, but the methods are similar. (I am one of the authors of lme4 and the author of MixedModels.)


If you are going to be doing a lot of GLMM fits I would consider doing so in Julia with MixedModels. It is often much faster than R, even with all the complicated code in lme4.

