仅“显着"的毛毛虫图.混合效应模型中的随机效应 [英] A caterpillar plot of just the "significant" random effects from a mixed effects model

查看:55
本文介绍了仅“显着"的毛毛虫图.混合效应模型中的随机效应的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我以前在这里寻求帮助的经验非常丰富,希望再次获得帮助.

I've had great experiences asking for help here before and I'm hoping to get some help again.

我正在估计一个相当大的混合效应模型,其中随机效应之一具有超过150个不同的水平.这会使标准的毛毛虫图变得非常难以理解.

I'm estimating a rather large mixed effects model in which one of the random effects has over 150 different levels. That would make a standard caterpillar plot to be quite unreadable.

如果可能的话,我想得到一张随机效应水平的毛毛虫图,在没有更好的术语的情况下,它是显着的".那就是:我想要一个履带图,其中随机截距随系数变化的随机斜率具有不包含零的置信区间"(我知道那不完全是)

I would like, if all possible, to get a caterpillar plot of just the levels of the random effect that are, for a lack of better term, "significant". That is: I want a caterpillar plot in which either the random intercept or the random slope for a varying coefficient has a "confidence interval" (I know that's not quite what it is) that does not include zero.

lme4 标准的 sleepstudy 数据考虑此标准模型.

Consider this standard model from the sleepstudy data that is standard with lme4.

library(lme4)
fit <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
ggCaterpillar(ranef(fit,condVar=TRUE), QQ=FALSE, likeDotplot=TRUE, reorder=FALSE)[["Subject"]] 

我会得到这个毛毛虫的阴谋.

I would get this caterpillar plot.

我使用的毛毛虫图来自

The caterpillar plot I use comes from this code. Do note I tend to use less conservative bounds for the intervals (i.e. 1.645*se and not 1.96*se).

基本上,我想要一个只包含308、309、310、330、331、335、337、349、350、352和370的水平的毛毛虫图,因为这些水平具有截距间隔不包含零的坡度.我之所以要问,是因为我的毛虫分布在150多个不同层次上,这有点难以理解,我认为这可能是一个值得解决的方案.

Basically, I want a caterpillar plot that would just include the levels for 308, 309, 310, 330, 331, 335, 337, 349, 350, 352, and 370 because those levels had either intercepts or slopes whose intervals did not include zero. I ask because my caterpillar plot of over 150 different levels is kind of unreadable and I think this might be a worthwhile solution to it.

可重复的代码如下.我真的很感谢您的帮助.

Reproducible code follows. I genuinely appreciate any help.

# https://stackoverflow.com/questions/34120578/how-can-i-sort-random-effects-by-value-of-the-random-effect-not-the-intercept
ggCaterpillar <- function(re, QQ=TRUE, likeDotplot=TRUE, reorder=TRUE) {
require(ggplot2)
f <- function(x) {
pv   <- attr(x, "postVar")
cols <- 1:(dim(pv)[1])
se   <- unlist(lapply(cols, function(i) sqrt(pv[i, i, ])))
if (reorder) {
  ord  <- unlist(lapply(x, order)) + rep((0:(ncol(x) - 1)) * nrow(x), each=nrow(x))
  pDf  <- data.frame(y=unlist(x)[ord],
                     ci=1.645*se[ord],
                     nQQ=rep(qnorm(ppoints(nrow(x))), ncol(x)),
                     ID=factor(rep(rownames(x), ncol(x))[ord], levels=rownames(x)[ord]),
                     ind=gl(ncol(x), nrow(x), labels=names(x)))
} else {
  pDf  <- data.frame(y=unlist(x),
                     ci=1.645*se,
                     nQQ=rep(qnorm(ppoints(nrow(x))), ncol(x)),
                     ID=factor(rep(rownames(x), ncol(x)), levels=rownames(x)),
                     ind=gl(ncol(x), nrow(x), labels=names(x)))
}

if(QQ) {  ## normal QQ-plot
  p <- ggplot(pDf, aes(nQQ, y))
  p <- p + facet_wrap(~ ind, scales="free")
  p <- p + xlab("Standard normal quantiles") + ylab("Random effect quantiles")
} else {  ## caterpillar dotplot
  p <- ggplot(pDf, aes(ID, y)) + coord_flip()
  if(likeDotplot) {  ## imitate dotplot() -> same scales for random effects
    p <- p + facet_wrap(~ ind)
  } else {           ## different scales for random effects
    p <- p + facet_grid(ind ~ ., scales="free_y")
  }
  p <- p + xlab("Levels of the Random Effect") + ylab("Random Effect")
}

p <- p + theme(legend.position="none")
p <- p + geom_hline(yintercept=0)
p <- p + geom_errorbar(aes(ymin=y-ci, ymax=y+ci), width=0, colour="black")
p <- p + geom_point(aes(size=1.2), colour="blue") 
return(p)
}

  lapply(re, f)
}


library(lme4)
fit <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
ggCaterpillar(ranef(fit,condVar=TRUE), QQ=FALSE, likeDotplot=TRUE, reorder=FALSE)[["Subject"]] 
ggsave(file="sleepstudy.png")

推荐答案

首先,感谢您将有意义"用引号引起来...阅读此书的每个人都应该记住,在这种情况下重要性没有任何 statistical 含义(最好使用Z-statistic(值/std.error)标准,例如改为| Z |> 1.5或| Z |> 1.75,只是强调这不是推断阈值...)

First, thanks for putting "significant" in quotation marks ... everyone reading this should remember that significance doesn't have any statistical meaning in this context (it might be better to use a Z-statistic (value/std.error) criterion such as |Z|>1.5 or |Z|>1.75 instead, just to emphasize that this is not an inferential threshold ...)

我最终被带走了……我决定最好对它进行重构/模块化,所以我写了一个 augment 方法(旨在与 broom 包)从 ranef.mer 对象构造有用的数据帧...完成后,您想要的操作就非常简单.

I ended up getting a little carried away ... I decided that it would be better to refactor/modularize things a little bit, so I wrote an augment method (designed to work with the broom package) that constructs useful data frames from ranef.mer objects ... once this is done, the manipulations you want are pretty easy.

我在答案的末尾放置了 augment.ranef.mer 代码-这有点长(您需要先获取它,然后才能在此处运行代码).更新:一段时间以来,这种 augment 方法已经成为 broom.mixed 包的一部分...

I put the augment.ranef.mer code at the end of my answer -- it's a bit long (you'll need to source it before you can run the code here). update: this augment method has been part of the broom.mixed package for a while now ...

library(broom)
library(reshape2)
library(plyr)

augment 方法应用于RE对象:

Apply the augment method to the RE object:

rr <- ranef(fit,condVar=TRUE)
aa <- augment(rr)

names(aa)
## [1] "grp"       "variable"  "level"     "estimate"  "qq"        "std.error"
## [7] "p"         "lb"        "ub"       

现在, ggplot 代码非常基本.我使用的是 geom_errorbarh(height = 0)而不是 geom_pointrange()+ coord_flip(),因为 ggplot2 无法使用 coord_flip facet_wrap(...,scales ="free")) ...

Now the ggplot code is pretty basic. I'm using geom_errorbarh(height=0) rather than geom_pointrange()+coord_flip() because ggplot2 can't use coord_flip with facet_wrap(...,scales="free") ...

## Q-Q plot:
g0 <- ggplot(aa,aes(estimate,qq,xmin=lb,xmax=ub))+
    geom_errorbarh(height=0)+
    geom_point()+facet_wrap(~variable,scale="free_x")

## regular caterpillar plot:
g1 <- ggplot(aa,aes(estimate,level,xmin=lb,xmax=ub))+
    geom_errorbarh(height=0)+
    geom_vline(xintercept=0,lty=2)+
    geom_point()+facet_wrap(~variable,scale="free_x")

现在找到您要保留的级别:

Now find the levels you want to keep:

aa2 <- ddply(aa,c("grp","level"),
             transform,
             keep=any(p<0.05))
aa3 <- subset(aa2,keep)

仅用显着"水平更新毛虫图.倾斜或截距:

Update caterpillar plot with only levels with "significant" slopes or intercepts:

g1 %+% aa3

如果您只想突出显示重要",则级别,而不是删除非重要"级别完全水平

If you only wanted to highlight "significant" levels rather than removing "non-significant" levels entirely

ggplot(aa2,aes(estimate,level,xmin=lb,xmax=ub,colour=factor(keep)))+
    geom_errorbarh(height=0)+
    geom_vline(xintercept=0,lty=2)+
    geom_point()+facet_wrap(~variable,scale="free_x")+
    scale_colour_manual(values=c("black","red"),guide=FALSE)


##' @importFrom reshape2 melt
##' @importFrom plyr ldply name_rows 
augment.ranef.mer <- function(x,
                                 ci.level=0.9,
                                 reorder=TRUE,
                                 order.var=1) {
    tmpf <- function(z) {
        if (is.character(order.var) && !order.var %in% names(z)) {
            order.var <- 1
            warning("order.var not found, resetting to 1")
        }
        ## would use plyr::name_rows, but want levels first
        zz <- data.frame(level=rownames(z),z,check.names=FALSE)
        if (reorder) {
            ## if numeric order var, add 1 to account for level column
            ov <- if (is.numeric(order.var)) order.var+1 else order.var
            zz$level <- reorder(zz$level, zz[,order.var+1], FUN=identity)
        }
        ## Q-Q values, for each column separately
        qq <- c(apply(z,2,function(y) {
                  qnorm(ppoints(nrow(z)))[order(order(y))]
              }))
        rownames(zz) <- NULL
        pv   <- attr(z, "postVar")
        cols <- 1:(dim(pv)[1])
        se   <- unlist(lapply(cols, function(i) sqrt(pv[i, i, ])))
        ## n.b.: depends on explicit column-major ordering of se/melt
        zzz <- cbind(melt(zz,id.vars="level",value.name="estimate"),
                     qq=qq,std.error=se)
        ## reorder columns:
        subset(zzz,select=c(variable, level, estimate, qq, std.error))
    }
    dd <- ldply(x,tmpf,.id="grp")
    ci.val <- -qnorm((1-ci.level)/2)
    transform(dd,
              p=2*pnorm(-abs(estimate/std.error)), ## 2-tailed p-val
              lb=estimate-ci.val*std.error,
              ub=estimate+ci.val*std.error)
}

这篇关于仅“显着"的毛毛虫图.混合效应模型中的随机效应的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆