通过分类变量和连续变量的交互可视化GLMM预测 [英] Visualising GLMM predictions with interaction of categorical and continuous variables

查看:209
本文介绍了通过分类变量和连续变量的交互可视化GLMM预测的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用GLMM在R中工作,该GLMM包含连续变量和分类变量以及某些交互作用.我已经使用MuMIn中的dredge和model.avg函数来获取每个变量的效果估计.我的问题是如何最好地绘制结果.我想制作一个图,显示一个变量(森林)对我的数据的影响,其中趋势线反映了森林参数估计值,但是我无法弄清楚如何将分类变量和交互变量保持在平均"水平,从而趋势线仅反映森林的影响.

I am working in R with a GLMM with a mixture of continuous and categorical variables with some interactions. I have used the dredge and model.avg functions in MuMIn to obtain effect estimates for each variable. My problem is in how best to plot the results. I want to make a figure showing the effect of one variable (forest) on my data where the trendline reflects the forest parameter estimate, but I can't figure out how to hold the categorical variables and interaction variables at their 'average' so that the trendline only reflects the effect of forest.

这是模型和绘图设置:

#load packages and document
cuckoo<-read.table("http://www.acsu.buffalo.edu/~ciaranwi/home_range.txt", 
header=T,sep="\t")
require(lme4)
require(MuMIn)
as.factor (cuckoo$ID)
as.factor (cuckoo$Sex)
as.factor(cuckoo$MS_bin)
options(na.action = "na.fail")

# create global model and fit
fm<- lmer(log(KD_95)~ MS_bin + Forest + NDVI + Sex + Precip + MS_bin*Forest 
+ MS_bin*NDVI  + MS_bin*Sex + MS_bin*Precip + Argos + Sample + (1|ID), data 
= cuckoo, REML = FALSE)

# dredge but always include argos and sample
KD95<-dredge(fm,fixed=c("Argos","Sample"))

# model averaging 
avgmod<-model.avg(KD95, fit=TRUE)
summary(avgmod)

#plot data
plot(cuckoo$Forest, (log(cuckoo$KD_95)),
 xlab = "Mean percentage of forest cover",
 ylab = expression(paste(plain("Log of Kernel density estimate, 95%    
utilisation, km"^{2}))),
 pch = c(15,17)[as.numeric(cuckoo$MS_bin)],  
 main = "Forest cover",
 col="black", 
 ylim=c(14,23))
legend(80,22, c("Breeding","Nonbreeding"), pch=c(15, 17),  cex=0.7)

然后,我陷入了如何添加趋势线的困境.到目前为止,我有:

Then I get stuck with how to include a trendline. So far I have:

#parameter estimates from model.avg
argos_est<- -1.6
MS_est<- -1.77
samp_est<-0.01
forest_est<--0.02
sex_est<-0.0653
precip_est<-0.0004
ndvi_est<--0.00003
model_intercept<-22.7

#calculate mean values for parameters
argos_mean<-mean(cuckoo$Argos)
samp_mean<-mean(cuckoo$Sample)
forest_mean<-mean(cuckoo$Forest)
ndvi_mean<-mean(cuckoo$NDVI)
precip_mean<-mean(cuckoo$Precip)

#calculate the intercept and add trend line
intercept<-(model_intercept + (forest_est*cuckoo$Forest) +    
(argos_est*argos_mean) + (samp_est * samp_mean) + (ndvi_est*ndvi_mean) +  
(precip_est*precip_mean) )

abline(intercept, forest_est)

但是,这并不考虑交互作用或分类变量,并且截距看起来太高了.有什么想法吗?

But this doesn't consider the interactions or the categorical variables and the intercept looks way too high. Any ideas?

推荐答案

在处理方面,您可以利用R在模型对象中存储有关模型的大量信息并且具有函数从模型对象中获取信息.例如, coef(avgmod)将为您提供模型系数,而 predict(avgmod)将为您提供用于拟合数据框的每个观测值的模型预测.模型.

In terms of process, you can make your coding much easier by taking advantage of the fact that R stores lots of information about the model in the model object and has functions to get information out of the model object. For example, coef(avgmod) will give you the model coefficients and predict(avgmod) will give you the model's predictions for each observation in the data frame you used to fit the model.

要可视化我们感兴趣的特定数据值组合的预测,请创建一个新的数据框,其中包含我们要保持不变的变量的含义,以及我们要更改的变量的一系列值(例如 Forest ). expand.grid 使用下面列出的值的所有组合创建一个数据框.

To visualize predictions for specific combinations of data values we're interested in, create a new data frame that has the means of the variables we want to hold constant, along with a range of values for variables that we want to vary (like Forest). expand.grid creates a data frame with all combinations of the values listed below.

pred.data = expand.grid(Argos=mean(cuckoo$Argos), Sample=mean(cuckoo$Sample), 
                        Precip=mean(cuckoo$Precip), NDVI=mean(cuckoo$NDVI), 
                        Sex="M", Forest=seq(0,100,10), MS_bin=unique(cuckoo$MS_bin), 
                        ID=unique(cuckoo$ID))

现在,我们使用 predict 函数将log(KD_95)的预测添加到此数据帧. predict 负责为您提供的任何数据计算模型预测(假设您为它提供了一个包含模型中所有变量的数据框).

Now we use the predict function to add predictions for log(KD_95) to this data frame. predict takes care of calculating the model predictions for whatever data you feed it (assuming you give it a data frame that includes all the variables in your model).

pred.data$lgKD_95_pred = predict(avgmod, newdata=pred.data)

现在我们绘制结果. geom_point 绘制点,就像在原始图中一样,然后 geom_line 添加每个 MS_bin 级别的预测(和Sex ="M").

Now we plot the results. geom_point plots the points, as in your original plot, then geom_line adds the predictions for each level of MS_bin (and Sex="M").

library(ggplot2)

ggplot() +
  geom_point(data=cuckoo, aes(Forest, log(KD_95), shape=factor(MS_bin), 
             colour=factor(MS_bin), size=3)) +
  geom_line(data=pred.data, aes(Forest, lKD_95_pred, colour=factor(MS_bin)))

结果如下:

更新:要绘制男性和女性的回归线,只需在 pred.data 中包含Sex ="F"并添加 Sex 作为情节中的美学.在下面的示例中,在绘制点时,我使用不同的形状来标记 Sex ,而对于回归线,将使用不同的线型来标记 Sex .

UPDATE: To plot regression lines for both male and female, just include Sex="F" in pred.data and add Sex as an aesthetic in the plot. In the example below, I use different shapes to mark Sex when plotting the points and different line types to mark Sex for the regression lines.

pred.data = expand.grid(Argos=mean(cuckoo$Argos), Sample=mean(cuckoo$Sample), 
                        Precip=mean(cuckoo$Precip), NDVI=mean(cuckoo$NDVI), 
                        Sex=unique(cuckoo$Sex), Forest=seq(0,100,10), MS_bin=unique(cuckoo$MS_bin), 
                        ID=unique(cuckoo$ID))

pred.data$lgKD_95_pred = predict(avgmod, newdata=pred.data)

ggplot() +
  geom_point(data=cuckoo, aes(Forest, log(KD_95), shape=Sex, 
                              colour=factor(MS_bin)), size=3) +
  geom_line(data=pred.data, aes(Forest, lgKD_95_pred, linetype=Sex, 
                                colour=factor(MS_bin))) 

这篇关于通过分类变量和连续变量的交互可视化GLMM预测的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆