为什么 effect() 和 predict() 会产生不同的模型预测? [英] Why does effect() and predict() produce different model predictions?

查看:67
本文介绍了为什么 effect() 和 predict() 会产生不同的模型预测?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这篇文章的数据在这里和R脚本和数据可用这里(R 脚本也在下面的帖子中).在此先感谢您的帮助.

Data for this post is available here and R script and data available here (R script is also in post below). Thanks in advance for any help.

我在 glmmTMB 中构建了一系列混合模型.我最好的两个模型如下.

I have built a series of mixed models in glmmTMB. My best two models are below.

igm_20 <- glmmTMB(igm_pres ~  fRHDV2_arrive_cat + fseason + sage + save_ajust_abun + fseason*fRHDV2_arrive_cat + (1 | fsite), data = edit_pp_dat, family = binomial)

igm_21 <- glmmTMB(igm_pres ~  fRHDV2_arrive_cat + fseason + sage + save_ajust_abun + fseason*fRHDV2_arrive_cat + sage*fRHDV2_arrive_cat + (1 | fsite), data = edit_pp_dat, family = binomial)

我对交互作用特别感兴趣 fseason*fRHDV2_arrive_cat,因此在构建我的模型后,我创建了 effect() 图,显示了这种交互作用对我的结果变量的影响在这两种模型中.

I am particularly interested in the interaction fseason*fRHDV2_arrive_cat, and so after building my models I created effect() plots showing the influence of this interaction on my outcome variable in both models.

ef_1 <- effect(term = "fRHDV2_arrive_cat*fseason", mod = igm_20)
windows();plot(ef_1, xlab = "Season", ylab = "Predicted probability of IgM antibody presence", main = "", factor.names = FALSE)

ef_2 <- effect(term = "fRHDV2_arrive_cat*fseason", mod = igm_21)
windows();plot(ef_2, xlab = "Season", ylab = "Predicted probability of IgM antibody presence", main = "", factor.names = FALSE)

效果图 1效果图2(抱歉提供情节链接,我没有足够的声誉来发布实际情节)

Effect plot 1 Effect plot 2 (Sorry for providing links to plots, i don't have enough reputation to post actual plots)

从效果图中可以看出,两种模型中交互作用fseason*fRHDV2_arrive_cat的影响非常相似,这并不奇怪.然后我将这两个模型平均如下:

As seen in the effect plots, the influence of the interaction fseason*fRHDV2_arrive_cat is very similar in both models, this is not surprising. I then averaged these two models as follows:

mod_ave_list_1 <- list(igm_20, igm_21)
mod_ave_1 <- model.avg(mod_ave_list_1, rank = AICc)
summary(mod_ave_1)

根据模型的平均结果,我尝试创建与上述类似的 effect() 图.但是,由于 effect() 函数不适用于平均模型和 predict() 中的 re.form = NA 容量来产生人口glmmTMB 模型没有实现平均模型预测,我首先必须在另一个包中重新创建和重新平均我的两个模型,如下所示:

From the model averaged results I tried to create a similar effect() plot to those above. However, as the effect() function does not work with averaged models and the re.form = NA capacity in predict() to produce population averaged model predictions is not implemented for glmmTMB models, I first had to re-create and re-average my two model in another package as follows:

predict_1 <- glmer(igm_pres ~  fRHDV2_arrive_cat + fseason + sage + save_ajust_abun + fseason*fRHDV2_arrive_cat + (1 | fsite), data = edit_pp_dat, family = binomial)

predict_2 <- glmer(igm_pres ~  fRHDV2_arrive_cat + fseason + sage + save_ajust_abun + fseason*fRHDV2_arrive_cat + sage*fRHDV2_arrive_cat + (1 | fsite), data = edit_pp_dat, family = binomial)

predict_list_1 <- list(predict_1, predict_2)
ave_predict <- model.avg(predict_list_1, rank = AICc)

然后我创建了一个 newdata 框架,我从中做出并绘制了模型预测,作为产生与上述类似的 effect() 图的一种手段.我在进行模型预测时使用了数值预测变量的平均值,因为这是 另一篇文章建议在调用 effect() 时发生.我在 predict() 函数中包含了 re.form = NA 以便我得到人口平均预测,因为我的模型包含随机效应.

I then created a newdata frame from which I made and plotted model predictions as a means of producing a similar effect() plot to that above. I used the mean value for numeric predictors when making model predictions as this is what another post suggests happens when making a call to effect(). I included re.form = NA in the predict() function so that I got population averaged predictions as my models include random effects.

a <- as.data.frame(c("Summer", "Autumn", "Winter", "Spring", "Summer", "Autumn", "Winter", "Spring"))
a$fRHDV2_arrive_cat <- c("Pre-RHDV2 arrival", "Pre-RHDV2 arrival", "Pre-RHDV2 arrival", "Pre-RHDV2 arrival", "Post-RHDV2 arrival", "Post-RHDV2 arrival", "Post-RHDV2 arrival", "Post-RHDV2 arrival")
mean(edit_pp_dat$sage, na.rm = TRUE) #4.659477e-17
mean(edit_pp_dat$save_ajust_abun, na.rm = TRUE) #-3.004684e-17
a$sage <- c(4.659477e-17, 4.659477e-17, 4.659477e-17, 4.659477e-17, 4.659477e-17, 4.659477e-17, 4.659477e-17, 4.659477e-17)
a$save_ajust_abun <- c(-3.004684e-17, -3.004684e-17, -3.004684e-17, -3.004684e-17, -3.004684e-17, -3.004684e-17, -3.004684e-17, -3.004684e-17)
a$fsite <- c(NA, NA, NA, NA, NA, NA, NA, NA)
colnames(a) <- c("fseason", "fRHDV2_arrive_cat", "sage", "save_ajust_abun", "fsite")

predict.values <- predict(ave_predict, backtransform = TRUE, newdata = a, se.fit = TRUE, re.form = NA)

a$estimates <- predict.values$fit
a$se <- predict.values$se.fit
a$lci <- a$estimates - 1.96*a$se
a$uci <- a$estimates + 1.96*a$se
a$fseason <- factor(a$fseason, levels = c("Summer", "Autumn", "Winter", "Spring"))
a$fRHDV2_arrive_cat <- factor(a$fRHDV2_arrive_cat, levels = c("Pre-RHDV2 arrival", "Post-RHDV2 arrival"))

ggplot(a, aes(x = fseason, y = estimates, colour = fRHDV2_arrive_cat, group = fRHDV2_arrive_cat)) + geom_line(size = 1) + geom_point(size = 3) + geom_errorbar(aes(ymin = lci, ymax = uci), width = .2) + labs(x = "Season", y = "Predicted probability of IgM seropositivity", colour = "RHDV2 arrival category") + scale_color_manual(labels = c("Pre-arrival", "Post-arrival"), values = c("red", "blue")) + theme(axis.title.x = element_text(face = "bold", size = 16), axis.title.y = element_text(face = "bold", size = 16), axis.text.x = element_text(size = 14), axis.text.y = element_text(size = 14), legend.title = element_text(face = "bold", size = 14), legend.text = element_text(size = 12))

模型平均预测图

为什么最后一个图与上面生成的两个 effect() 图如此不同?我期待它们非常相似.例如,在两个 effect() 图中,在 RHDV2 到达后的夏季和冬季,igm 抗体存在的预测概率要低得多,但是在由 predict() 生成的最后一个图中,使用平均模型,在夏季RHDV2到达后预测igm抗体存在的概率较高,而在冬季RHDV2到达前和到达后的预测概率相似.

Why is this last plot so different to the two effect() plots produced above? I was expecting them to be very similar. For example, in the two effect() plots the predicted probability of igm antibody presence is much lower in summer and winter post-arrival of RHDV2, however in the last plot produced from predict(), using the averaged model, the predicted probability of igm antibody presence is higher in summer post-arrival of RHDV2 and similar in winter for both pre-arrival and post-arrival of RHDV2.

我注意到这里有一个类似的帖子,但这对我没有帮助解决我的问题.

I note that there is a similar post here, but that this has not helped me to solve my problem.

推荐答案

对于那些可能感兴趣的人,我想出了如何解决我的问题.edit_pp_dat$sageedit_pp_dat$save_ajust_abun 是标准化变量,因此它们的平均值为 0.因此,a$sagea$save_ajust_abun 应该如下所示:

For those who may be interested, I worked out how to fix my problem. edit_pp_dat$sage and edit_pp_dat$save_ajust_abun are standardised variables, accordingly their mean is 0. Therefore, a$sage and a$save_ajust_abun should have been as follows:

a$sage <- c(0, 0, 0, 0, 0, 0, 0, 0)
a$save_ajust_abun <- c(0, 0, 0, 0, 0, 0, 0, 0)

我的电脑也有问题,因为 edit_pp_dat$sageedit_pp_dat$save_ajust_abun 是一个矩阵,看起来 predict() 操作根据提供给模型的数据是在矩阵还是数据框中而有所不同.

I additionally had difficulties on my computer as edit_pp_dat$sage and edit_pp_dat$save_ajust_abun was a matrix, it appears that predict() operates differently according to if the data supplied to the models was in a matrix or dataframe.

我不确定为什么 mean(edit_pp_dat$sage, na.rm = TRUE)mean(edit_pp_dat$save_ajust_abun, na.rm = TRUE) 不给 0.

I am unsure why mean(edit_pp_dat$sage, na.rm = TRUE) and mean(edit_pp_dat$save_ajust_abun, na.rm = TRUE) do not give 0.

这篇关于为什么 effect() 和 predict() 会产生不同的模型预测?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆