为什么 effect() 和 predict() 会产生不同的模型预测? [英] Why does effect() and predict() produce different model predictions?
问题描述
这篇文章的数据在这里和R脚本和数据可用这里(R 脚本也在下面的帖子中).在此先感谢您的帮助.
Data for this post is available here and R script and data available here (R script is also in post below). Thanks in advance for any help.
我在 glmmTMB
中构建了一系列混合模型.我最好的两个模型如下.
I have built a series of mixed models in glmmTMB
. My best two models are below.
igm_20 <- glmmTMB(igm_pres ~ fRHDV2_arrive_cat + fseason + sage + save_ajust_abun + fseason*fRHDV2_arrive_cat + (1 | fsite), data = edit_pp_dat, family = binomial)
igm_21 <- glmmTMB(igm_pres ~ fRHDV2_arrive_cat + fseason + sage + save_ajust_abun + fseason*fRHDV2_arrive_cat + sage*fRHDV2_arrive_cat + (1 | fsite), data = edit_pp_dat, family = binomial)
我对交互作用特别感兴趣 fseason*fRHDV2_arrive_cat
,因此在构建我的模型后,我创建了 effect()
图,显示了这种交互作用对我的结果变量的影响在这两种模型中.
I am particularly interested in the interaction fseason*fRHDV2_arrive_cat
, and so after building my models I created effect()
plots showing the influence of this interaction on my outcome variable in both models.
ef_1 <- effect(term = "fRHDV2_arrive_cat*fseason", mod = igm_20)
windows();plot(ef_1, xlab = "Season", ylab = "Predicted probability of IgM antibody presence", main = "", factor.names = FALSE)
ef_2 <- effect(term = "fRHDV2_arrive_cat*fseason", mod = igm_21)
windows();plot(ef_2, xlab = "Season", ylab = "Predicted probability of IgM antibody presence", main = "", factor.names = FALSE)
效果图 1效果图2(抱歉提供情节链接,我没有足够的声誉来发布实际情节)
Effect plot 1 Effect plot 2 (Sorry for providing links to plots, i don't have enough reputation to post actual plots)
从效果图中可以看出,两种模型中交互作用fseason*fRHDV2_arrive_cat
的影响非常相似,这并不奇怪.然后我将这两个模型平均如下:
As seen in the effect plots, the influence of the interaction fseason*fRHDV2_arrive_cat
is very similar in both models, this is not surprising. I then averaged these two models as follows:
mod_ave_list_1 <- list(igm_20, igm_21)
mod_ave_1 <- model.avg(mod_ave_list_1, rank = AICc)
summary(mod_ave_1)
根据模型的平均结果,我尝试创建与上述类似的 effect()
图.但是,由于 effect()
函数不适用于平均模型和 predict()
中的 re.form = NA
容量来产生人口glmmTMB
模型没有实现平均模型预测,我首先必须在另一个包中重新创建和重新平均我的两个模型,如下所示:
From the model averaged results I tried to create a similar effect()
plot to those above. However, as the effect()
function does not work with averaged models and the re.form = NA
capacity in predict()
to produce population averaged model predictions is not implemented for glmmTMB
models, I first had to re-create and re-average my two model in another package as follows:
predict_1 <- glmer(igm_pres ~ fRHDV2_arrive_cat + fseason + sage + save_ajust_abun + fseason*fRHDV2_arrive_cat + (1 | fsite), data = edit_pp_dat, family = binomial)
predict_2 <- glmer(igm_pres ~ fRHDV2_arrive_cat + fseason + sage + save_ajust_abun + fseason*fRHDV2_arrive_cat + sage*fRHDV2_arrive_cat + (1 | fsite), data = edit_pp_dat, family = binomial)
predict_list_1 <- list(predict_1, predict_2)
ave_predict <- model.avg(predict_list_1, rank = AICc)
然后我创建了一个 newdata
框架,我从中做出并绘制了模型预测,作为产生与上述类似的 effect()
图的一种手段.我在进行模型预测时使用了数值预测变量的平均值,因为这是 另一篇文章建议在调用 effect()
时发生.我在 predict()
函数中包含了 re.form = NA
以便我得到人口平均预测,因为我的模型包含随机效应.
I then created a newdata
frame from which I made and plotted model predictions as a means of producing a similar effect()
plot to that above. I used the mean value for numeric predictors when making model predictions as this is what another post suggests happens when making a call to effect()
. I included re.form = NA
in the predict()
function so that I got population averaged predictions as my models include random effects.
a <- as.data.frame(c("Summer", "Autumn", "Winter", "Spring", "Summer", "Autumn", "Winter", "Spring"))
a$fRHDV2_arrive_cat <- c("Pre-RHDV2 arrival", "Pre-RHDV2 arrival", "Pre-RHDV2 arrival", "Pre-RHDV2 arrival", "Post-RHDV2 arrival", "Post-RHDV2 arrival", "Post-RHDV2 arrival", "Post-RHDV2 arrival")
mean(edit_pp_dat$sage, na.rm = TRUE) #4.659477e-17
mean(edit_pp_dat$save_ajust_abun, na.rm = TRUE) #-3.004684e-17
a$sage <- c(4.659477e-17, 4.659477e-17, 4.659477e-17, 4.659477e-17, 4.659477e-17, 4.659477e-17, 4.659477e-17, 4.659477e-17)
a$save_ajust_abun <- c(-3.004684e-17, -3.004684e-17, -3.004684e-17, -3.004684e-17, -3.004684e-17, -3.004684e-17, -3.004684e-17, -3.004684e-17)
a$fsite <- c(NA, NA, NA, NA, NA, NA, NA, NA)
colnames(a) <- c("fseason", "fRHDV2_arrive_cat", "sage", "save_ajust_abun", "fsite")
predict.values <- predict(ave_predict, backtransform = TRUE, newdata = a, se.fit = TRUE, re.form = NA)
a$estimates <- predict.values$fit
a$se <- predict.values$se.fit
a$lci <- a$estimates - 1.96*a$se
a$uci <- a$estimates + 1.96*a$se
a$fseason <- factor(a$fseason, levels = c("Summer", "Autumn", "Winter", "Spring"))
a$fRHDV2_arrive_cat <- factor(a$fRHDV2_arrive_cat, levels = c("Pre-RHDV2 arrival", "Post-RHDV2 arrival"))
ggplot(a, aes(x = fseason, y = estimates, colour = fRHDV2_arrive_cat, group = fRHDV2_arrive_cat)) + geom_line(size = 1) + geom_point(size = 3) + geom_errorbar(aes(ymin = lci, ymax = uci), width = .2) + labs(x = "Season", y = "Predicted probability of IgM seropositivity", colour = "RHDV2 arrival category") + scale_color_manual(labels = c("Pre-arrival", "Post-arrival"), values = c("red", "blue")) + theme(axis.title.x = element_text(face = "bold", size = 16), axis.title.y = element_text(face = "bold", size = 16), axis.text.x = element_text(size = 14), axis.text.y = element_text(size = 14), legend.title = element_text(face = "bold", size = 14), legend.text = element_text(size = 12))
为什么最后一个图与上面生成的两个 effect()
图如此不同?我期待它们非常相似.例如,在两个 effect()
图中,在 RHDV2 到达后的夏季和冬季,igm 抗体存在的预测概率要低得多,但是在由 predict() 生成的最后一个图中
,使用平均模型,在夏季RHDV2到达后预测igm抗体存在的概率较高,而在冬季RHDV2到达前和到达后的预测概率相似.
Why is this last plot so different to the two effect()
plots produced above? I was expecting them to be very similar. For example, in the two effect()
plots the predicted probability of igm antibody presence is much lower in summer and winter post-arrival of RHDV2, however in the last plot produced from predict()
, using the averaged model, the predicted probability of igm antibody presence is higher in summer post-arrival of RHDV2 and similar in winter for both pre-arrival and post-arrival of RHDV2.
我注意到这里有一个类似的帖子,但这对我没有帮助解决我的问题.
I note that there is a similar post here, but that this has not helped me to solve my problem.
推荐答案
对于那些可能感兴趣的人,我想出了如何解决我的问题.edit_pp_dat$sage
和 edit_pp_dat$save_ajust_abun
是标准化变量,因此它们的平均值为 0.因此,a$sage
和 a$save_ajust_abun
应该如下所示:
For those who may be interested, I worked out how to fix my problem. edit_pp_dat$sage
and edit_pp_dat$save_ajust_abun
are standardised variables, accordingly their mean is 0. Therefore, a$sage
and a$save_ajust_abun
should have been as follows:
a$sage <- c(0, 0, 0, 0, 0, 0, 0, 0)
a$save_ajust_abun <- c(0, 0, 0, 0, 0, 0, 0, 0)
我的电脑也有问题,因为 edit_pp_dat$sage
和 edit_pp_dat$save_ajust_abun
是一个矩阵,看起来 predict()
操作根据提供给模型的数据是在矩阵还是数据框中而有所不同.
I additionally had difficulties on my computer as edit_pp_dat$sage
and edit_pp_dat$save_ajust_abun
was a matrix, it appears that predict()
operates differently according to if the data supplied to the models was in a matrix or dataframe.
我不确定为什么 mean(edit_pp_dat$sage, na.rm = TRUE)
和 mean(edit_pp_dat$save_ajust_abun, na.rm = TRUE)
不给 0.
I am unsure why mean(edit_pp_dat$sage, na.rm = TRUE)
and mean(edit_pp_dat$save_ajust_abun, na.rm = TRUE)
do not give 0.
这篇关于为什么 effect() 和 predict() 会产生不同的模型预测?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!