Having issues using the lme4 predict function on my mixed models


Problem description

I'm struggling to use the lme4 predict function on my mixed models. When making predictions, I want to set some of my explanatory variables to a specified level but average across others.

Here's some made-up data, a simplified, nonsense version of my original dataset:

a <-  data.frame(
    TLR4=factor(rep(1:3, each=4, times=4)), 
    repro.state=factor(rep(c("a","j"),each=6,times=8)), 
    month=factor(rep(1:2,each=8,times=6)), 
    sex=factor(rep(1:2, each=4, times=12)), 
    year=factor(rep(1:3, each =32)), 
    mwalkeri=(sample(0:15, 96, replace=TRUE)), 
    AvM=(seq(1:96))
)

The AvM number is the water vole identification number. The response variable (mwalkeri) is a count of the number of fleas on each vole. The main explanatory variable I am interested in is Tlr4, a gene with 3 different genotypes (coded 1, 2 and 3). The other explanatory variables are reproductive state (adult or juvenile), month (1 or 2), sex (1 or 2) and year (1, 2 or 3). My model looks like this (of course, this model is now inappropriate for the made-up data, but that shouldn't matter):

# install.packages("lme4")  # run once if lme4 is not installed
library(lme4)
mm <- glmer(mwalkeri ~ TLR4 + repro.state + month + sex + year + (1 | AvM),
    data = a, family = poisson,
    control = glmerControl(optimizer = "bobyqa"))
summary(mm)

I want to make predictions about parasite burden for each Tlr4 genotype while accounting for all the other covariates. To do this, I created a new dataset specifying the level I wanted for each explanatory variable and used the predict function:

b <-  data.frame(
    TLR4=factor(1:3), 
    repro.state=factor(c("a","a","a")),
    month=factor(rep(1, times=3)), 
    sex=factor(rep(1, times=3)), 
    year=factor(rep(1, times=3))
)
predict(mm, newdata=b, re.form=NA, type="response")

This did work, but I would really prefer to average across years instead of setting year to one particular level. However, whenever I attempt to average over year, I get this error message:

Error in model.frame.default(delete.response(Terms), newdata, na.action = na.action, : factor year has new level

Is it possible for me to average across years instead of selecting a specified level? Also, I've not worked out how to get the standard error associated with these predictions. The only way I've been able to get standard errors for predictions was using the lsmeans() function (from the lsmeans package):

c <- lsmeans(mm, "TLR4", type="response")
summary(c, type="response")

This automatically generates the standard error. However, it is generated by averaging across all the other explanatory variables. It is probably possible to change that, but I would rather use the predict() function if I can. My goal is to create a graph with Tlr4 genotype on the x-axis and predicted parasite burden on the y-axis, to show the predicted differences in parasite burden for each genotype while all other significant covariates are accounted for.

Recommended answer

You might be interested in the merTools package, which includes a couple of functions for creating datasets of counterfactuals and then making predictions on that new data to explore the substantive impact of variables on the outcome. A good example of this comes from the README and the package vignette:

Let's take the case where we want to explore the impact of a model with an interaction term between a categorical and a continuous predictor. First, we fit a model with interactions:

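library(lme4)      # glmer() and the VerbAgg data
library(merTools)  # draw(), wiggle(), predictInterval()
data(VerbAgg)
fmVA <- glmer(r2 ~ (Anger + Gender + btype + situ)^2 +
       (1|id) + (1|item), family = binomial, 
       data = VerbAgg)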

Now we prep the data using the draw function in merTools. Here we draw the average observation from the model frame. We then wiggle the data by expanding the data frame to include the same observation repeated, but with different values of the variable specified by the var parameter. Here, we expand the dataset to all values of btype, situ, and Anger.

# Select the average case
newData <- draw(fmVA, type = "average")
newData <- wiggle(newData, var = "btype", values = unique(VerbAgg$btype))
newData <- wiggle(newData, var = "situ", values = unique(VerbAgg$situ))
newData <- wiggle(newData, var = "Anger", values = unique(VerbAgg$Anger))

head(newData, 10)

#>    r2 Anger Gender btype  situ id        item
#> 1   N    20      F curse other  5 S3WantCurse
#> 2   N    20      F scold other  5 S3WantCurse
#> 3   N    20      F shout other  5 S3WantCurse
#> 4   N    20      F curse  self  5 S3WantCurse
#> 5   N    20      F scold  self  5 S3WantCurse
#> 6   N    20      F shout  self  5 S3WantCurse
#> 7   N    11      F curse other  5 S3WantCurse
#> 8   N    11      F scold other  5 S3WantCurse
#> 9   N    11      F shout other  5 S3WantCurse
#> 10  N    11      F curse  self  5 S3WantCurse
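
To make the expansion step more concrete, here is a minimal base-R sketch of the same idea: repeat the rows once per value and overwrite one column. The helper `expand_case` and the `base_case` data frame are hypothetical and written only for illustration; they are not part of merTools and may differ from how `wiggle` is implemented.

```r
# Hypothetical helper (not part of merTools): repeat a data frame once per
# value and overwrite one column, similar in spirit to the expansion above.
expand_case <- function(data, var, values) {
  # Repeat the whole data frame once for each supplied value
  out <- data[rep(seq_len(nrow(data)), times = length(values)), , drop = FALSE]
  # Assign each value to one complete copy of the original rows
  out[[var]] <- rep(values, each = nrow(data))
  rownames(out) <- NULL
  out
}

# A single made-up "average" case
base_case <- data.frame(Anger = 20, btype = "curse", situ = "other",
                        stringsAsFactors = FALSE)

expanded <- expand_case(base_case, "btype", c("curse", "scold", "shout"))
expanded <- expand_case(expanded, "situ", c("other", "self"))
nrow(expanded)  # 3 btype values x 2 situ values = 6 rows
```

Each call multiplies the number of rows by the number of values supplied, which is why the counterfactual dataset grows quickly as more variables are varied.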

Now we simply pass this new dataset to predictInterval in order to generate predictions for these counterfactuals. Then we plot the predicted values against the continuous variable, Anger, and facet and group on the two categorical variables situ and btype respectively.

library(ggplot2)  # needed for the plot below

plotdf <- predictInterval(fmVA, newdata = newData, type = "probability", 
        stat = "median", n.sims = 1000)
plotdf <- cbind(plotdf, newData)
ggplot(plotdf, aes(y = fit, x = Anger, color = btype, group = btype)) + 
  geom_point() + geom_smooth(aes(color = btype), method = "lm") + 
  facet_wrap(~situ) + theme_bw() +
  labs(y = "Predicted Probability")
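
The same counterfactual idea can be applied to the model in the question using only predict(). This is a sketch under stated assumptions, not a definitive solution. The "factor year has new level" error presumably appears because the new data contained a year value that was not one of the fitted levels, so the sketch predicts for every genotype in every observed year and then averages those predictions. The names `grid` and `avg` are introduced here for illustration, the data are re-created from the question with a seed added, and averaging is done on the response scale, which is one possible choice.

```r
library(lme4)

set.seed(1)  # assumption: added only so the made-up counts are reproducible
a <- data.frame(
  # times = 8 spells out all 96 rows (the question's times = 4 relies on recycling)
  TLR4 = factor(rep(1:3, each = 4, times = 8)),
  repro.state = factor(rep(c("a", "j"), each = 6, times = 8)),
  month = factor(rep(1:2, each = 8, times = 6)),
  sex = factor(rep(1:2, each = 4, times = 12)),
  year = factor(rep(1:3, each = 32)),
  mwalkeri = sample(0:15, 96, replace = TRUE),
  AvM = seq_len(96)
)
mm <- glmer(mwalkeri ~ TLR4 + repro.state + month + sex + year + (1 | AvM),
            data = a, family = poisson,
            control = glmerControl(optimizer = "bobyqa"))

# One row per genotype and year; the remaining covariates are held fixed.
# Every factor level used here must already exist in the fitted data.
grid <- expand.grid(
  TLR4 = levels(a$TLR4),
  repro.state = "a",
  month = "1",
  sex = "1",
  year = levels(a$year),
  stringsAsFactors = TRUE
)

# Population-level predictions (no random effects) on the response scale
grid$fit <- predict(mm, newdata = grid, re.form = NA, type = "response")

# Average the predicted counts over the three years within each genotype
avg <- aggregate(fit ~ TLR4, data = grid, FUN = mean)
avg
```

The result has one averaged prediction per Tlr4 genotype, which could be plotted with genotype on the x-axis. This sketch does not produce standard errors; for those, predictInterval as shown above, or the lsmeans approach from the question, remain the routes mentioned in this thread.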
