Speed up lmer function in R

Problem description

I would like to share some thoughts on improving the fitting time of a linear mixed-effects model in R using the lme4 package.

Dataset Size: The dataset consists of approximately 400,000 rows and 32 columns. Unfortunately, no information can be shared about the nature of the data.

Assumptions and Checks: The response variable is assumed to come from a Normal distribution. Before fitting the model, the variables were tested for collinearity and multicollinearity using correlation tables and the alias function provided in R.
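As a rough illustration of the checks described above, the sketch below builds a small simulated data frame (the column names x1, x2, x3 and y are hypothetical, since the real data cannot be shared), prints a correlation table, and uses alias() on a plain lm fit to flag an exact linear dependency.

```r
# Simulated stand-in for the confidential data (all names are hypothetical).
set.seed(1)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$x3 <- d$x1 + d$x2   # exact linear combination of x1 and x2
d$y  <- rnorm(50)

# Pairwise correlations between the candidate predictors.
cor_table <- round(cor(d[, c("x1", "x2", "x3")]), 2)
print(cor_table)

# alias() reports terms that are linearly dependent on the others.
al <- alias(lm(y ~ x1 + x2 + x3, data = d))
print(al$Complete)
```

Pairwise correlations alone would not necessarily reveal that x3 is a combination of two other columns, which is why the alias check is a useful complement.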

Continuous variables were scaled to help convergence.
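A minimal sketch of that scaling step, assuming the continuous variables are the numeric columns of the data frame (the columns a, b and g are made up for illustration):

```r
# Hypothetical data frame with two continuous columns and one factor.
d <- data.frame(
  a = c(10, 20, 30, 40),
  b = c(0.1, 0.5, 0.2, 0.9),
  g = factor(c("u", "v", "u", "v"))
)

# Centre and scale only the numeric columns; factors are left untouched.
num <- vapply(d, is.numeric, logical(1))
d[num] <- lapply(d[num], function(x) as.numeric(scale(x)))

print(d)
```

After this step each numeric column has mean 0 and standard deviation 1, which puts the variance components on comparable scales for the optimizer.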

Model Structure: The model equation contains 31 fixed effects (including the intercept) and 30 random effects (the intercept is not included). The random effects vary across the levels of a single factor variable that has 2,700 levels. The covariance structure is Variance Components, as the random effects are assumed to be independent of each other.

Example of the model equation:

lmer(Response ~ 1 + Var1 + Var2 + ... + Var30 + (Var1-1| Group) + (Var2-1| Group) + ... + (Var30-1| Group), data=data, REML=TRUE)
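With 30 covariates the formula above is tedious to type by hand. The sketch below shows one way it could be assembled programmatically, assuming the columns really are named Var1 to Var30, Response and Group as in the placeholder equation.

```r
# Placeholder names taken from the example equation above.
vars <- paste0("Var", 1:30)

# Fixed part: 1 + Var1 + ... + Var30
fixed <- paste(c("1", vars), collapse = " + ")

# Random part: one independent slope term per covariate, no random intercept.
random <- paste0("(", vars, " - 1 | Group)", collapse = " + ")

form <- as.formula(paste("Response ~", fixed, "+", random))
print(form)
```

The resulting object can be passed to lmer() as its formula argument in place of the hand-written equation.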

The model was fitted successfully; however, it took about 3.1 hours to provide results. The same model in SAS took a few seconds. There is literature available on the web on how to reduce the time by using the non-linear optimization algorithm nloptwrap and turning off the time-consuming derivative calculation that is performed after the optimization is finished (calc.derivs = FALSE):

https://cran.r-project.org/web/packages/lme4/vignettes/lmerperf.html

With these settings, the time was reduced by 78%.
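For reference, the two settings mentioned above are passed through lmerControl(). The sketch below uses the sleepstudy data that ships with lme4 as a stand-in for the confidential dataset, so the timings it produces say nothing about the 400,000-row case.

```r
library(lme4)

# Control object: nloptwrap optimizer, skip the post-fit derivative checks.
ctrl <- lmerControl(optimizer = "nloptwrap", calc.derivs = FALSE)

# Small stand-in model with independent intercept and slope terms.
form <- Reaction ~ Days + (1 | Subject) + (0 + Days | Subject)

fit_default <- lmer(form, data = sleepstudy, REML = TRUE)
fit_fast    <- lmer(form, data = sleepstudy, REML = TRUE, control = ctrl)

# The estimates should agree; only the fitting work differs.
print(fixef(fit_default))
print(fixef(fit_fast))
```

Skipping calc.derivs also skips the gradient and Hessian based convergence checks, so it is worth confirming on a subset of the data that the faster fit matches the default one.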

Question: Is there any other way to reduce the model fitting time by setting the lmer arguments accordingly? There is a very large difference between R and SAS in terms of model fitting time.

Any suggestions are appreciated.

Recommended answer

lmer() determines the parameter estimates by optimizing the profiled log-likelihood or profiled REML criterion with respect to the parameters in the covariance matrix of the random effects. In your example there will be 31 such parameters, corresponding to the standard deviations of the random effects from each of the 31 terms. Constrained optimizations of that size take time.
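The covariance parameters that the optimizer works on can be inspected with getME(). The sketch below again uses sleepstudy as a stand-in, with two independent random-effects terms, so it has two such parameters rather than the roughly thirty of the model in the question.

```r
library(lme4)

fit <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject),
            data = sleepstudy, REML = TRUE)

# theta: relative standard deviations, one per independent variance component.
theta <- getME(fit, "theta")

# lower: lower bounds used in the constrained optimization (0 for these terms).
lower <- getME(fit, "lower")

print(theta)
print(lower)
print(length(theta))
```

Each additional independent random-effects term adds one more element to theta, and therefore one more dimension to the constrained search.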

It is possible that SAS PROC MIXED has specific optimization methods or more sophisticated ways of determining starting estimates. Because SAS is a closed-source system, we won't know what it does.

By the way, you can write the random effects as (1+Var1+Var2+...+Var30||Group)
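The double-bar shorthand can be checked on a small case. The sketch below, using sleepstudy as a stand-in, fits the same model once with separate terms and once with ||, and compares the REML criterion. Note that the shorthand as written above includes a random intercept; starting the term with 0 + instead of 1 + should correspond to the model in the question, which has none. The shorthand is also only reliable for numeric covariates, not factors.

```r
library(lme4)

# Independent intercept and slope written as separate terms.
fit_terms <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject),
                  data = sleepstudy, REML = TRUE)

# The same structure written with the double-bar shorthand.
fit_bars <- lmer(Reaction ~ Days + (1 + Days || Subject),
                 data = sleepstudy, REML = TRUE)

# Both fits should reach the same REML criterion.
print(REMLcrit(fit_terms))
print(REMLcrit(fit_bars))
```

The shorthand changes only how the formula is written, not the model being fitted, so it does not by itself reduce the fitting time.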
