具有固定和随机效应的库(lme4)中的MLM设计矩阵 [英] Design matrix for MLM from library(lme4) with fixed and random effects

查看:130
本文介绍了具有固定和随机效应的库(lme4)中的MLM设计矩阵的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

应用环境
我有一个具有随机斜率和截距的模型.有许多级别的随机效果.新数据(待预测)可能具有或不具有所有这些级别.

Context of application
I have a model with random slopes and intercepts. There are numerous levels of the random effects. The new data (to be predicted) may or may not have all of these levels.

为了更具体一点,我正在专辑级别(title)上处理音乐收入.每个专辑可能有多种类型format2(CD,黑胶唱片,电子音频等).我对每种专辑在每种专辑中的收入都进行了度量.该模型指定为:

To make this more concrete, I am working with music revenue at the album level (title). Each album may come in multiple types format2 (CD, vinyl, e-audio, etc). I have measurements for revenue for each album at each type of album. The model is specified as:

lmer(physical~ format2+ (0+format2|title))

问题在于,将来的数据可能不具有titleformat2的所有级别.对于随机截距,可以使用predict(..., allow.new.levels= TRUE)轻松解决.但这对于固定效果和随机斜率是有问题的.因此,我试图编写一个函数来对merMod对象进行灵活的预测,类似于lme4::predict.merMod;但这将处理训练数据和预测数据之间的差异.这是一个由于对lme4::predict.merMod的确切细节一无所知而提出的问题.

The problem is that future data may not have all the levels of either title or format2. For random intercepts, this is easily resolved with predict(..., allow.new.levels= TRUE). But it is problematic for the fixed effects and random slopes. I am therefore trying to write a function to do flexible predictions of merMod objects, similar to lme4::predict.merMod; but that will handle the differences between the training data and the prediction data. This is a question asked as much out of ignorance to the exact details of lme4::predict.merMod as anything else.

问题描述
问题的症结在于获得具有固定和随机效应的正确model.matrix()来计算预测和SE.类merMod的S3方法仅返回固定效果.

Description of problem
The crux of the problem is getting the correct model.matrix() with fixed and random effects to calculate both predictions and SE's. The S3 method for class merMod returns only the fixed effects.

基本stats::model.matrix()函数的文档非常有限.不幸的是,我既不拥有 S中的统计模型用于数据分析的软件,这些功能背后的细节.

The base stats::model.matrix() function has very limited documentation. Unfortunately, I do not own either Statistical Models in S or Software for Data Analysis, which appear to have the details behind these functions.

model.matrix()应该采用模型公式和新的数据框,并产生一个设计矩阵.但是我遇到了一个错误.您提供的任何帮助将不胜感激.

model.matrix() is supposed to take a model formula and new data frame and produce a design matrix. But I'm getting an error. Any help you can provide would be much appreciated.

示例数据

dat1 <- structure(list(dt_scale = c(16, 16, 16, 16, 16, 16, 16, 16, 16, 
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16), title = c("Bahia", 
"Jazz Moods: Brazilian Romance", "Quintessence", "Amadeus: The Complete Soundtrack Recording (Bicentennial Edition)", 
"Live In Europe", "We'll Play The Blues For You", "The Complete Village Vanguard Recordings, 1961", 
"The Isaac Hayes Movement", "Jazz Moods: Jazz At Week's End", 
"Blue In Green: The Concert In Canada", "The English Patient - Original Motion Picture Soundtrack", 
"The Unique Thelonious Monk", "Since We Met", "You're Gonna Hear From Me", 
"The Colors Of Latin Jazz: Cubop!", "The Colors Of Latin Jazz: Samba!", 
"Homecoming", "Consecration: The Final Recordings Part 2 - Live At Keystone Korner, September 1980", "More Creedence Gold", "The Stardust Session"), format2 = c("CD", "CD", 
"CD", "CD", "CD", "CD", "CD", "SuperAudio", "SuperAudio", "CD", "E Audio", "CD", 
"Vinyl", "CD", "E Audio", "CD", "CD", "CD", "CD", "CD"), mf_day = c(TRUE, 
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE), xmas = c(FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE), vday = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE), yr_since_rel = c(16.9050969937038, 
8.41815617876864, 9.2991404674865, 25.0870296783559, 39.1267038232812, 
27.9156764326061, 9.11596751812513, 23.3052837112449, 14.3123922258974, 
30.5208152866414, 5.83025071417496, 21.3090003877291, 7.75022155568392, 
11.3601605287827, 0.849006673421519, 31.9918631305662, 13.8861905547041, 
12.8342695062012, 29.6916671402534, 13.5912612705038), physical = c(1327.17849171096, 
-110.2265302258, -795.37376268564, 355.06192702004, -1357.3492884345, 
-1254.93442612023, -816.713683621225, 881.201935773452, -3092.02845691036, 
-2268.6304275652, 907.347941142021, -699.130275178185, 377.867849132077, 
-1047.50531157311, 1460.25978951805, 1376.84579069304, 3619.03629114089, 
962.888173535704, 2514.77880599199, 2539.14958588771)), .Names = c("dt_scale", 
"title", "format2", "mf_day", "xmas", "vday", "yr_since_rel", 
"physical"), row.names = c(1L, 2L, 5L, 6L, 7L, 8L, 9L, 11L, 12L, 
13L, 14L, 15L, 20L, 22L, 23L, 25L, 27L, 32L, 35L, 36L), class = "data.frame")

公式:

f1 <- as.formula(~1 + dt_scale + yr_since_rel + format2 + (0 + format2 + mf_day + 
xmas + vday | title))

执行/错误

library(lme4)
model.matrix(f1, data= dat1)
Error in 0 + format2 : non-numeric argument to binary operator

注意 我还尝试过使用Orthodont数据进行此操作;但是,我得到了另一个错误.

Note I've also tried this with the Orthodont data; but, I get a different error.

library(lme4)
data("Orthodont",package="MEMSS")
fm1 <- lmer(formula = distance ~ age*Sex + (1+age|Subject), data = Orthodont)
newdat <- expand.grid(
  age=c(8,10,12,14)
  , Sex=c("Male","Female")
  , distance = 0
  , Subject= c("F01", "F02")
)


f1 <- formula(fm1)[-2] # simpler code via Ben Bolker below
mm <- model.matrix(f1, newdat) # attempt to use model.matrix
Warning message
In Ops.factor(1 + age, Subject) : | not meaningful for factors

# use lme4:::mkNewReTrms as suggested in comments
mm <- lme4:::mkNewReTrms(f1, newdat) 
Error in lme4:::mkNewReTrms(f1, newdat) : object 'ReTrms' not found
In addition: Warning message:
In Ops.factor(1 + age, Subject) : | not meaningful for factors

# check if different syntax would fix this
mm <- lme4::mkNewReTrms(f1, newdat)
Error: 'mkNewReTrms' is not an exported object from 'namespace:lme4'
mm <- mkNewReTrms(f1, newdat)
Error: could not find function "mkNewReTrms"

推荐答案

15年8月12日编辑:请参见 GitHub存储库

已编辑,2014年10月15日:此答案尚不完美.仍然有几个带有错误的用例(请参阅下面的评论链).但它在大多数情况下都有效.我会在某个时候解决它.

Editted, 10/15/2014: This answer isn't yet perfect. There are still a couple of use-cases with errors (see comment chain below). But it works in most cases. I'll get around to finalizing it at some point.

我相信此功能将解决更重要的问题,即对merMod对象的准确预测. Bolker博士,这里仍然存在一些问题(例如稀疏性和效率).但我相信该方法有效:

I believe this function will solve the more important problem, accurate predictions for merMod objects. Dr Bolker, there are still some issues here (such as sparsity and efficiency); but I believe the method works:

data("Orthodont",package="MEMSS")
fm1 <- lmer(formula = distance ~ age*Sex + (1+age|Subject), data = Orthodont)
newdat <- expand.grid(
  age=c(8,10,12,14)
  , Sex=c("Male","Female")
  , distance = 0
  , Subject= c("F01", "F02")
)

predict.merMod2 <- function(object, newdat=NULL) {
# 01. get formula and build model matrix
  # current problem--model matrix is not sparse, as would be ideal
  f1 <- formula(object)[-2]
  z.fe <- model.matrix(terms(object), newdat)
  z.re <- t(lme4:::mkReTrms(findbars(f1), newdat)$Zt)
  mm <- cbind(z.fe, 
              matrix(z.re, nrow= dim(z.re)[1], ncol= dim(z.re)[2],
                     dimnames= dimnames(z.re)))

  # 02. extract random effect coefficients needed for the new data
  # (a) - determine number of coef
  len <- length(ranef(object)) 
  re.grp.len <- vector(mode= "integer", length= len) 
  for (i in 1:len) { # for each random group
    re.grp.len[i] <- dim(ranef(object)[[i]])[2] # number of columns (slope and intercept terms)
  }

  # (b) - create beta vector
  fe.names <- unique(colnames(mm)[1:length(fixef(object)) - 1])
  re.names <- unique(colnames(mm)[-c(1:length(fixef(object)) - 1)]) 
  beta.re <- as.vector(rep(NA, length= sum(re.grp.len) * length(re.names)), mode= "numeric")
  for (i in 1:len) {
    re.beta  <- ranef(object)[[i]][rownames(ranef(object)[[i]]) %in% re.names,] 
    ind.i <- sum(!is.na(beta.re)) + 1; ind.j <- length(as.vector(t(re.beta))) 
    beta.re[ind.i:ind.j]  <- as.vector(t(re.beta)) 
  }
  beta <- c(fixef(object)[names(fixef(object)) %in% fe.names], beta.re)
  # 03. execute prediction
  return(mm %*% beta)
}

predict.merMod2(fm1, newdat)

这篇关于具有固定和随机效应的库(lme4)中的MLM设计矩阵的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆