R中多重插补数据集的多级回归模型(Amelia,zelig,lme4) [英] Multi-level regression model on multiply imputed data set in R (Amelia, zelig, lme4)

查看:161
本文介绍了R中多重插补数据集的多级回归模型(Amelia,zelig,lme4)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试对多个估算数据(由Amelia创建)运行多级模型;该样本基于群组= 24,N = 150的聚类样本.

I am trying to run a multi-level model on multiply imputed data (created with Amelia); the sample is based on a clustered sample with group = 24, N= 150.

library("ZeligMultilevel")
ML.model.0 <- zelig(dv~1 + tag(1|group), model="ls.mixed",
data=a.out$imputations)
summary(ML.model.0)

此代码产生以下错误代码:

This code produces the following error code:

Error in object[[1]]$result$call : 
$ operator not defined for this S4 class

如果我运行OLS回归,则可以:

If I run a OLS regression, it works:

model.0 <- zelig(dv~1, model="ls", data=a.out$imputations)
m.0 <- coef(summary(model.0)) 
print(m.0, digits = 2)

      Value Std. Error t-stat  p-value
[1,]    45       0.34    130 2.6e-285

我很高兴提供一个工作示例.

require(Zelig)
require(Amelia)
require(ZeligMultilevel)

data(freetrade)
length(freetrade$country) #grouping variable

#Imputation of missing data

a.out <- amelia(freetrade, m=5, ts="year", cs="country")

# Models: (1) OLS; (2) multi-level 

model.0 <- zelig(polity~1, model="ls", data=a.out$imputations)
m.0 <- coef(summary(model.0)) 
print(m.0, digits = 2)

ML.model.0 <- zelig(polity~1 + tag(1|country), model="ls.mixed", data=a.out$imputations)
summary(ML.model.0)

我认为问题可能在于Zelig如何与Amelia的mi类互动.因此,我转向了另一种R包:lme4.

I think the issue may be with how Zelig interfaces with Amelia's mi class. Therefore, I turned toward an alternative R package: lme4.

require(lme4)
write.amelia(obj=a.out, file.stem="inmi", format="csv", na="NA")
diff <-list(5)  # a list to store each model, 5 is the number of the imputed datasets

for (i in 1:5) {
file.name <- paste("inmi", 5 ,".csv",sep="")
data.to.use <- read.csv(file.name)
diff[[5]] <- lmer(polity ~ 1 + (1 | country),
data = data.to.use)}
diff

结果如下:

[[1]]
[1] 5

[[2]]
NULL

[[3]]
NULL

[[4]]
NULL

[[5]]
Linear mixed model fit by REML 
Formula: polity ~ 1 + (1 | country) 
   Data: data.to.use 
  AIC  BIC logLik deviance REMLdev
 1006 1015 -499.9     1002   999.9
Random effects:
 Groups   Name        Variance Std.Dev.
 country  (Intercept) 14.609   3.8222  
 Residual             17.839   4.2236  
Number of obs: 171, groups: country, 9

Fixed effects:
            Estimate Std. Error t value
(Intercept)    2.878      1.314    2.19

当我将diff[[5]]替换为diff[[4]]diff[[3]]等时,结果保持不变.尽管如此,我仍想知道这实际上是组合数据集还是一个单一估算数据集的结果.有什么想法吗?谢谢!

The results remain the same when I replace diff[[5]] by diff[[4]], diff[[3]] etc. Still, I am wondering whether this is actually the results for the combined dataset or for one single imputed data set. Any thoughts? Thanks!

推荐答案

我修改了该对象的摘要功能(获取了源代码并打开了./R/summary.R文件).我添加了一些花括号以使代码流畅,并将getcoef更改为coef.这应该适用于这种特殊情况,但是我不确定这是否通用.函数getcoef搜索插槽coef3,但我从未见过.也许@BenBolker在这里可以引起关注?我不能保证结果是这样,但是输出对我来说是合法的.也许您可以与软件包作者联系,以在将来的版本中对此进行更正.

I modified the summary function for this object (fetched the source and opened up ./R/summary.R file). I added some curly braces to make the code flow and changed a getcoef to coef. This should work for this particular case, but I'm not sure if it's general. Function getcoef searches for slot coef3, and I have never seen this. Perhaps @BenBolker can throw an eye here? I can't guarantee this is what the result looks like, but the output looks legit to me. Perhaps you could contact the package authors to correct this in the future version.

摘要(ML.model.0)

summary(ML.model.0)

  Model: ls.mixed
  Number of multiply imputed data sets: 5 

Combined results:

Call:
zelig(formula = polity ~ 1 + tag(1 | country), model = "ls.mixed", 
    data = a.out$imputations)

Coefficients:
        Value Std. Error   t-stat    p-value
[1,] 2.902863   1.311427 2.213515 0.02686218

For combined results from datasets i to j, use summary(x, subset = i:j).
For separate results, use print(summary(x), subset = i:j).

修改后的功能:

summary.MI <- function (object, subset = NULL, ...) {
  if (length(object) == 0) {
    stop('Invalid input for "subset"')
  } else {
    if (length(object) == 1) {
      return(summary(object[[1]]))
    }
  }

  # Roman: This function isn't fecthing coefficients robustly. Something goes wrong. Contact package author. 
  getcoef <- function(obj) {
    # S4
    if (!isS4(obj)) {
      coef(obj)
    } else {
      if ("coef3" %in% slotNames(obj)) {
        obj@coef3
      } else {
        obj@coef
      }
    }
  }

    #
    res <- list()

    # Get indices
    subset <- if (is.null(subset)) {
      1:length(object)
    } else {
      c(subset)
    }

    # Compute the summary of all objects
    for (k in subset) {
      res[[k]] <- summary(object[[k]])
    }


    # Answer
    ans <- list(
      zelig = object[[1]]$name,
      call = object[[1]]$result@call,
      all = res
    )

    #
    coef1 <- se1 <- NULL

    #
    for (k in subset) {
#       tmp <-  getcoef(res[[k]]) # Roman: I changed this to coef, not 100% sure if the output is the same
      tmp <- coef(res[[k]])
      coef1 <- cbind(coef1, tmp[, 1])
      se1 <- cbind(se1, tmp[, 2])
    }

    rows <- nrow(coef1)
    Q <- apply(coef1, 1, mean)
    U <- apply(se1^2, 1, mean)
    B <- apply((coef1-Q)^2, 1, sum)/(length(subset)-1)
    var <- U+(1+1/length(subset))*B
    nu <- (length(subset)-1)*(1+U/((1+1/length(subset))*B))^2

    coef.table <- matrix(NA, nrow = rows, ncol = 4)
    dimnames(coef.table) <- list(rownames(coef1),
                                 c("Value", "Std. Error", "t-stat", "p-value"))
    coef.table[,1] <- Q
    coef.table[,2] <- sqrt(var)
    coef.table[,3] <- Q/sqrt(var)
    coef.table[,4] <- pt(abs(Q/sqrt(var)), df=nu, lower.tail=F)*2
    ans$coefficients <- coef.table
    ans$cov.scaled <- ans$cov.unscaled <- NULL

    for (i in 1:length(ans)) {
      if (is.numeric(ans[[i]]) && !names(ans)[i] %in% c("coefficients")) {
        tmp <- NULL
        for (j in subset) {
          r <- res[[j]]
          tmp <- cbind(tmp, r[[pmatch(names(ans)[i], names(res[[j]]))]])
        }
        ans[[i]] <- apply(tmp, 1, mean)
      }
    }

    class(ans) <- "summaryMI"
    ans
  }

这篇关于R中多重插补数据集的多级回归模型(Amelia,zelig,lme4)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆