Mixed model starting values for lme4

Question

I am trying to fit a mixed model using the lmer function from the lme4 package. However, I do not understand what should be passed to the start argument. My goal is to fit a simple linear regression and use the coefficients estimated there as starting values for the mixed model.

Let's say that my model is the following:

linear_model = lm(y ~ x1 + x2 + x3, data = data)
coef = summary(linear_model)$coefficients[-1, 1]  # I remove the intercept
result = lmer(y ~ x1 + x2 + x3 | x1 + x2 + x3, data = data, start = coef)  # fails with the error below

This example is an oversimplified version of what I am doing since I won't be able to share my data.

Then I get the following error:

Error during wrapup: incorrect number of theta components (!=105)

(105 is the value I get from the real regression I am trying to fit.)

I have tried many different solutions, such as providing a list and naming the values theta, as I saw suggested on some forums.

Also, the GitHub code tests whether the length is appropriate, but I can't find what it refers to:

# Assign the start value to theta
if (is.numeric(start)) {
        theta <- start
}

# Check the length of theta
length(theta)!=length(pred$theta)

However, I can't find where pred$theta is defined, so I don't understand where the value 105 is coming from.

Any help?

Answer

A few points:

  • lmer doesn't in fact fit any of the fixed-effect coefficients explicitly; these are profiled out so that they are solved for implicitly at each step of the nonlinear estimation process. The estimation involves only a nonlinear search over the variance-covariance parameters. This is detailed (rather technically) in one of the lme4 vignettes (eqs. 30-31, p. 15). Thus providing starting values for the fixed-effect coefficients is impossible, and useless ...
  • glmer does fit fixed-effects coefficients explicitly as part of the nonlinear optimization (as @G.Grothendieck discusses in comments), if nAGQ>0 ...
  • it's admittedly rather obscure, but the starting values for the theta parameters (the only ones that are explicitly optimized in lmer fits) are 0 for the off-diagonal elements of the Cholesky factor, 1 for the diagonal elements: this is coded here
   ll$theta[] <- is.finite(ll$lower) # initial values of theta are 0 off-diagonal, 1 on

... where you need to know further that, upstream, the values of the lower vector have been coded so that elements of the theta vector corresponding to diagonal elements have a lower bound of 0, off-diagonal elements have a lower bound of -Inf; this is equivalent to starting with an identity matrix for the scaled variance-covariance matrix (i.e., the variance-covariance matrix of the random-effects parameters divided by the residual variance), or a random-effects variance-covariance matrix of (sigma^2 I).
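Unlike lmer, glmer optimizes the fixed effects explicitly when nAGQ > 0. For glmer, then, the original idea of seeding the fit with coefficients from a simpler regression does apply. The sketch below assumes lme4's bundled cbpp data. It relies on ?glmer, which documents that a list-valued start may contain a theta element and a fixef element, and it passes the coefficients of an ordinary glm() fit as fixef. The value theta = 1 only mirrors the default start for a single random intercept. This is an illustration of the start argument, not a recommendation that such starting values improve the fit.

```r
library(lme4)

## Ordinary binomial GLM without random effects; its coefficients are
## used as starting values for the fixed effects of the mixed model
g0 <- glm(cbind(incidence, size - incidence) ~ period,
          family = binomial, data = cbpp)

## glmer accepts a named list: 'theta' for the random-effects (Cholesky)
## parameters, 'fixef' for the fixed-effect coefficients (values only)
gm <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
            family = binomial, data = cbpp,
            start = list(theta = 1, fixef = unname(coef(g0))))

fixef(gm)
getME(gm, "theta")
```

The fixef vector must have one entry per fixed-effect column (here an intercept plus three period contrasts).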

If you have several random effects and big variance-covariance matrices for each, things can get a little hairy. If you want to recover the starting values that lmer will use by default you can use lFormula() as follows:

library(lme4)
ff <- lFormula(Reaction~Days+(Days|Subject),sleepstudy)
(lwr <- ff$reTrms$lower)
## [1]    0 -Inf    0
ifelse(lwr==0,1,0)  ## starting values
## [1] 1 0 1

For this model, we have a single 2x2 random-effects variance-covariance matrix. The theta parameters correspond to the lower-triangle Cholesky factor of this matrix, in column-wise order, so the first and third elements are diagonal, and the second element is off-diagonal.
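The column-wise Cholesky layout can be checked directly. The following is a minimal sketch on the same sleepstudy model. It fills a 2x2 lower-triangular matrix with the fitted theta values in R's column-major order, multiplies it by its transpose, and rescales by the residual variance sigma^2. The result should reproduce the random-effects covariance matrix that VarCorr() reports.

```r
library(lme4)

fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

## theta: lower-triangular Cholesky factor of the *relative* (scaled)
## covariance matrix, stored column by column: [1,1], [2,1], [2,2]
theta <- getME(fm, "theta")

L <- matrix(0, nrow = 2, ncol = 2)
L[lower.tri(L, diag = TRUE)] <- theta   # filled in column-major order

## Multiply back by the residual variance to get the unscaled covariance
Sigma <- sigma(fm)^2 * L %*% t(L)

## Compare with the covariance matrix reported by VarCorr()
vc <- VarCorr(fm)$Subject
all.equal(Sigma, vc, check.attributes = FALSE)
```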

  • The fact that you have 105 theta parameters worries me; fitting such a large random-effects model will be extremely slow and take an enormous amount of data to fit reliably. (If you know your model makes sense and you have enough data you might want to look into faster options, such as using Doug Bates's MixedModels package for Julia or possibly glmmTMB, which might scale better than lme4 for problems with large theta vectors ...)
  • Your model formula, y ~ x1 + x2 + x3 | x1 + x2 + x3, seems very odd. I can't think of any context in which it would make sense to use the same variables both as random-effect terms and as grouping variables in the same model!
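As for where a count like 105 comes from: each random-effects term whose model matrix has k columns (e.g. an intercept plus k - 1 slopes) contributes k(k+1)/2 theta parameters, one per element of the lower triangle of its Cholesky factor. The counts of all terms are then added. Since 14 × 15 / 2 = 105, a single term with 14 columns would produce exactly that number, though several smaller terms could add up to the same total.

The sketch below does three things, using the sleepstudy data:

- It checks this count against lFormula() with a hypothetical helper, n_theta(), which is not part of lme4. The theta vector the question's length check compares against should have the same length as reTrms$theta.
- It shows a valid use of start in lmer: a theta vector of the right length, here warm-started from a maximum-likelihood fit via getME().
- It shows that a theta vector of the wrong length is rejected.

```r
library(lme4)

## Hypothetical helper: total number of theta parameters, given the
## number of columns k of each random-effects term
n_theta <- function(k) sum(k * (k + 1) / 2)

ff <- lFormula(Reaction ~ Days + (Days | Subject), data = sleepstudy)
k  <- lengths(ff$reTrms$cnms)   # columns per term: Subject = 2
n_theta(k)                      # 3
length(ff$reTrms$theta)         # 3: default starting theta
n_theta(14)                     # 105

## Valid 'start' for lmer: a theta vector of exactly that length,
## here taken from an ML fit and reused as the start of the REML fit
fm_ml   <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy,
                REML = FALSE)
fm_reml <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy,
                start = list(theta = getME(fm_ml, "theta")))

## A theta vector of the wrong length is rejected with an error
res <- tryCatch(
  lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy,
       start = c(1, 0)),
  error = function(e) e)
conditionMessage(res)
```

Whatever start is supplied, only the variance-covariance (theta) parameters can be seeded in an lmer fit. The fixed effects are still profiled out.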
