用分类变量和交互变量绘制GLMM估计线 [英] plotting GLMM estimate line with categorical and interaction variables

查看:136
本文介绍了用分类变量和交互变量绘制GLMM估计线的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用GLMM在R中工作,该GLMM包含连续变量和分类变量以及某些交互作用.我已经使用MuMIn中的dredge和model.avg函数来获取每个变量的效果估计.我的问题是如何最好地绘制结果.我想制作一个图,显示一个变量(森林)对我的数据的影响,其中趋势线反映了森林参数估计值,但是我无法弄清楚如何将分类变量和交互变量保持在平均"水平,从而趋势线仅反映森林的影响.

I am working in R with a GLMM with a mixture of continuous and categorical variables with some interactions. I have used the dredge and model.avg functions in MuMIn to obtain effect estimates for each variable. My problem is in how best to plot the results. I want to make a figure showing the effect of one variable (forest) on my data where the trendline reflects the forest parameter estimate, but I can't figure out how to hold the categorical variables and interaction variables at their 'average' so that the trendline only reflects the effect of forest.

这是模型和绘图设置:

#load packages and document
cuckoo<-read.table("http://www.acsu.buffalo.edu/~ciaranwi/home_range.txt", 
header=T,sep="\t")
require(lme4)
require(MuMIn)
as.factor (cuckoo$ID)
as.factor (cuckoo$Sex)
as.factor(cuckoo$MS_bin)
options(na.action = "na.fail")

# create global model and fit
fm<- lmer(log(KD_95)~ MS_bin + Forest + NDVI + Sex + Precip + MS_bin*Forest 
+ MS_bin*NDVI  + MS_bin*Sex + MS_bin*Precip + Argos + Sample + (1|ID), data 
= cuckoo, REML = FALSE)

# dredge but always include argos and sample
KD95<-dredge(fm,fixed=c("Argos","Sample"))

# model averaging 
avgmod<-model.avg(KD95, fit=TRUE)
summary(avgmod)

#plot data
plot(cuckoo$Forest, (log(cuckoo$KD_95)),
 xlab = "Mean percentage of forest cover",
 ylab = expression(paste(plain("Log of Kernel density estimate, 95%    
utilisation, km"^{2}))),
 pch = c(15,17)[as.numeric(cuckoo$MS_bin)],  
 main = "Forest cover",
 col="black", 
 ylim=c(14,23))
legend(80,22, c("Breeding","Nonbreeding"), pch=c(15, 17),  cex=0.7)

然后,我陷入了如何添加趋势线的困境.到目前为止,我有:

Then I get stuck with how to include a trendline. So far I have:

#parameter estimates from model.avg
argos_est<- -1.6
MS_est<- -1.77
samp_est<-0.01
forest_est<--0.02
sex_est<-0.0653
precip_est<-0.0004
ndvi_est<--0.00003
model_intercept<-22.7

#calculate mean values for parameters
argos_mean<-mean(cuckoo$Argos)
samp_mean<-mean(cuckoo$Sample)
forest_mean<-mean(cuckoo$Forest)
ndvi_mean<-mean(cuckoo$NDVI)
precip_mean<-mean(cuckoo$Precip)

#calculate the intercept and add trend line
intercept<-(model_intercept + (forest_est*cuckoo$Forest) +    
(argos_est*argos_mean) + (samp_est * samp_mean) + (ndvi_est*ndvi_mean) +  
(precip_est*precip_mean) )

abline(intercept, forest_est)

但是,这并不考虑交互作用或分类变量,并且截距看起来太高了.有什么想法吗?

But this doesn't consider the interactions or the categorical variables and the intercept looks way too high. Any ideas?

推荐答案

在处理方面,您可以利用R在模型对象中存储有关模型的大量信息并且具有函数从模型对象中获取信息.例如,coef(avgmod)将为您提供模型系数,而predict(avgmod)将为您提供用于拟合模型的数据框中每个观测值的模型预测.

In terms of process, you can make your coding much easier by taking advantage of the fact that R stores lots of information about the model in the model object and has functions to get information out of the model object. For example, coef(avgmod) will give you the model coefficients and predict(avgmod) will give you the model's predictions for each observation in the data frame you used to fit the model.

要可视化我们感兴趣的特定数据值组合的预测,请创建一个新的数据框,其中包含我们要保持不变的变量的含义,以及我们要更改的变量的一系列值(像Forest). expand.grid使用下面列出的值的所有组合创建一个数据框.

To visualize predictions for specific combinations of data values we're interested in, create a new data frame that has the means of the variables we want to hold constant, along with a range of values for variables that we want to vary (like Forest). expand.grid creates a data frame with all combinations of the values listed below.

pred.data = expand.grid(Argos=mean(cuckoo$Argos), Sample=mean(cuckoo$Sample), 
                        Precip=mean(cuckoo$Precip), NDVI=mean(cuckoo$NDVI), 
                        Sex="M", Forest=seq(0,100,10), MS_bin=unique(cuckoo$MS_bin), 
                        ID=unique(cuckoo$ID))

现在,我们使用predict函数将log(KD_95)的预测添加到此数据帧. predict负责为您提供的任何数据计算模型预测(假设您为它提供了一个包含模型中所有变量的数据框).

Now we use the predict function to add predictions for log(KD_95) to this data frame. predict takes care of calculating the model predictions for whatever data you feed it (assuming you give it a data frame that includes all the variables in your model).

pred.data$lgKD_95_pred = predict(avgmod, newdata=pred.data)

现在我们绘制结果. geom_point像在原始图中一样绘制点,然后geom_lineMS_bin的每个级别(和Sex ="M")添加预测.

Now we plot the results. geom_point plots the points, as in your original plot, then geom_line adds the predictions for each level of MS_bin (and Sex="M").

library(ggplot2)

ggplot() +
  geom_point(data=cuckoo, aes(Forest, log(KD_95), shape=factor(MS_bin), 
             colour=factor(MS_bin), size=3)) +
  geom_line(data=pred.data, aes(Forest, lKD_95_pred, colour=factor(MS_bin)))

结果如下:

更新:要绘制男性和女性的回归线,只需在pred.data中包括Sex ="F",然后在图形中添加Sex作为美感.在下面的示例中,当绘制点时,我使用不同的形状来标记Sex,对于回归线,我将使用不同的线类型来标记Sex.

UPDATE: To plot regression lines for both male and female, just include Sex="F" in pred.data and add Sex as an aesthetic in the plot. In the example below, I use different shapes to mark Sex when plotting the points and different line types to mark Sex for the regression lines.

pred.data = expand.grid(Argos=mean(cuckoo$Argos), Sample=mean(cuckoo$Sample), 
                        Precip=mean(cuckoo$Precip), NDVI=mean(cuckoo$NDVI), 
                        Sex=unique(cuckoo$Sex), Forest=seq(0,100,10), MS_bin=unique(cuckoo$MS_bin), 
                        ID=unique(cuckoo$ID))

pred.data$lgKD_95_pred = predict(avgmod, newdata=pred.data)

ggplot() +
  geom_point(data=cuckoo, aes(Forest, log(KD_95), shape=Sex, 
                              colour=factor(MS_bin)), size=3) +
  geom_line(data=pred.data, aes(Forest, lgKD_95_pred, linetype=Sex, 
                                colour=factor(MS_bin))) 

这篇关于用分类变量和交互变量绘制GLMM估计线的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆