effect vs. predict function

Question

I'm simultaneously trying to understand R's predict() function and the 'effects' package effect() function. Essentially, I'm running a regression to test the interaction of two dichotomous IVs on a DV while controlling for two continuous covariates. In my actual dataset, the interaction is significant and so now I would like to plot the interaction. Because I have covariates in my model, I should plot the means after controlling for these other variables (i.e. estimated marginal means in SPSS). I haven't done this in R before and while searching I've come to expect I should be able to obtain the values I need for graphing with either the effect() or the predict() functions. Therefore, I tried doing it with each on a randomly generated dataset:

> library(effects)  # provides effect()
> set.seed(100)
> test <- data.frame(
+   iv1 = factor(round(rnorm(200, mean=.5, sd=.25), 0), levels=c(0,1), labels=c("A","B")),
+   iv2 = factor(round(rnorm(200, mean=.5, sd=.25), 0), levels=c(0,1), labels=c("C","D")),
+   cv1 = rnorm(200, mean=4, sd=1),
+   cv2 = rnorm(200, mean=3, sd=1),
+   dv  = rnorm(200, mean=5, sd=1))
> mod <- lm(dv ~ cv1 + cv2 + iv1*iv2, data = test)
> new <- with(test, expand.grid(iv1 = levels(iv1), iv2 = levels(iv2), cv1 = mean(cv1), cv2 = mean(cv2)))
> test$pv <- predict(mod, newdata = new)

> tapply(test$pv, list(test$iv1, test$iv2), mean)
         C        D
A 5.076842 5.086218
B 5.025614 5.065399

> effect("iv1:iv2", mod)

 iv1*iv2 effect
   iv2
iv1        C        D
  A 5.019391 5.167275
  B 5.216955 4.855195

Because I'm getting different results, I exported the data to SPSS, ran an ANOVA doing the same thing, and looked at the estimated marginal means (EMMEANS). These were identical to the results given by effect() in R.

SPSS syntax:

DATASET ACTIVATE DataSet1.
RECODE iv1 iv2 ('A'=-1) ('B'=1) ('C'=-1) ('D'=1) INTO iv1_recode iv2_recode.
EXECUTE.

UNIANOVA dv BY iv1_recode iv2_recode WITH cv1 cv2
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /EMMEANS=TABLES(OVERALL) WITH(cv1=MEAN cv2=MEAN) 
  /EMMEANS=TABLES(iv1_recode) WITH(cv1=MEAN cv2=MEAN) 
  /EMMEANS=TABLES(iv2_recode) WITH(cv1=MEAN cv2=MEAN) 
  /EMMEANS=TABLES(iv1_recode*iv2_recode) WITH(cv1=MEAN cv2=MEAN) 
  /PRINT=DESCRIPTIVE
  /CRITERIA=ALPHA(.05)
  /DESIGN=cv1 cv2 iv1_recode iv2_recode iv1_recode*iv2_recode.

As a check, the SPSS output for the EMMEANS says, "Covariates appearing in the model are evaluated at the following values: cv1 = 3.996208827095569, cv2 = 3.052881951477868." These are identical to the values for the covariates that I used with predict:

> new
  iv1 iv2      cv1      cv2
1   A   C 3.996209 3.052882
2   B   C 3.996209 3.052882
3   A   D 3.996209 3.052882
4   B   D 3.996209 3.052882

So what am I failing to understand? Or am I doing something stupid here (a distinct possibility)? This could be me not grasping what an estimated marginal mean is.

Any help is much appreciated!

Recommended answer

So there seems to be a bit of confusion here between getting results for the model itself and getting results for the model applied to the observed data. The big problem occurs here:

test$pv <- predict(mod, newdata = new)

Here, new has 4 rows, so predict(mod, newdata = new) has four values. Running just that gives

predict(mod, newdata = new)
#        1        2        3        4 
# 5.019391 5.216955 5.167275 4.855195 

Notice how these values match the result from effect().
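To make the match easier to see, the four predictions can be laid out in the same iv1-by-iv2 grid that effect() prints. The following is a minimal sketch using only base R. It rebuilds the question's simulated data, and the names fit and adj_means are illustrative, not part of the original code:

```r
# Rebuild the simulated data and model from the question.
set.seed(100)
test <- data.frame(
  iv1 = factor(round(rnorm(200, mean = .5, sd = .25), 0), levels = c(0, 1), labels = c("A", "B")),
  iv2 = factor(round(rnorm(200, mean = .5, sd = .25), 0), levels = c(0, 1), labels = c("C", "D")),
  cv1 = rnorm(200, mean = 4, sd = 1),
  cv2 = rnorm(200, mean = 3, sd = 1),
  dv  = rnorm(200, mean = 5, sd = 1))
mod <- lm(dv ~ cv1 + cv2 + iv1 * iv2, data = test)

# One row per iv1/iv2 cell, covariates held at their overall means.
new <- with(test, expand.grid(iv1 = levels(iv1), iv2 = levels(iv2),
                              cv1 = mean(cv1), cv2 = mean(cv2)))

# Keep the predictions next to the grid they belong to
# (not in the 200-row data frame of observations).
new$fit <- as.numeric(predict(mod, newdata = new))

# Reshape to a 2 x 2 table: rows are iv1 levels, columns are iv2 levels.
# Each cell holds exactly one value, so mean() just passes it through.
adj_means <- with(new, tapply(fit, list(iv1 = iv1, iv2 = iv2), mean))
print(adj_means)
```

Each cell of adj_means can then be compared directly with the corresponding cell of the effect() table.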

When you assign them to test$pv, that length-4 vector gets recycled, so it ends up repeated 50 times along the 200-row test data.frame. And test really contains your observed data, so mixing theoretical predictions from the model with the observed data isn't a great idea. If you actually wanted the "true" predicted value for each observation, then test$pv <- predict(mod) would have been the right choice. However, taking the means over test, which again holds the observed values, with

tapply(test$pv, list(test$iv1, test$iv2), mean)

would be using the values of cv1 and cv2 that were actually observed, rather than just the overall mean of your covariates.
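Both points can be checked directly. The sketch below (base R only, with illustrative names such as p4, fit_obs, cell_fit and cell_dv that are not in the original post) first shows the recycling, then computes per-observation predictions. It uses predict(mod, newdata = test), which should agree with predict(mod) as long as lm() did not drop any rows. Because the model contains the full iv1*iv2 interaction, the least-squares residuals average to zero within each cell, so the cell means of these per-observation predictions simply reproduce the raw cell means of dv rather than covariate-adjusted means:

```r
# Rebuild the simulated data and model from the question.
set.seed(100)
test <- data.frame(
  iv1 = factor(round(rnorm(200, mean = .5, sd = .25), 0), levels = c(0, 1), labels = c("A", "B")),
  iv2 = factor(round(rnorm(200, mean = .5, sd = .25), 0), levels = c(0, 1), labels = c("C", "D")),
  cv1 = rnorm(200, mean = 4, sd = 1),
  cv2 = rnorm(200, mean = 3, sd = 1),
  dv  = rnorm(200, mean = 5, sd = 1))
mod <- lm(dv ~ cv1 + cv2 + iv1 * iv2, data = test)
new <- with(test, expand.grid(iv1 = levels(iv1), iv2 = levels(iv2),
                              cv1 = mean(cv1), cv2 = mean(cv2)))
p4 <- as.numeric(predict(mod, newdata = new))  # four model-based predictions

# Assigning 4 values to a 200-row column recycles them 50 times in row
# order, with no regard to each row's actual iv1/iv2 levels.
test$pv <- p4
recycled_ok <- isTRUE(all.equal(test$pv, rep(p4, times = nrow(test) / length(p4))))
print(recycled_ok)

# Per-observation predictions use each row's own cv1 and cv2 values.
test$fit_obs <- as.numeric(predict(mod, newdata = test))
cell_fit <- with(test, tapply(fit_obs, list(iv1, iv2), mean))
cell_dv  <- with(test, tapply(dv, list(iv1, iv2), mean))

# The cell means of the fitted values equal the observed cell means of dv.
print(all.equal(cell_fit, cell_dv))
```

So averaging per-observation predictions by cell gives the unadjusted cell means, while predicting on the small grid with the covariates fixed gives the adjusted means that effect() and SPSS report.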

We've already seen that effect() uses the means of the covariates, but you could also set the values explicitly with

effect("iv1:iv2", mod, given.values=c(cv1=3.996209, cv2=3.052882))

if you like.
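A predict()-based counterpart is to build the prediction grid with whatever covariate values you want. The helper adjusted_means() below is hypothetical (it is not part of base R or the effects package) and is only a sketch under the assumption that the model has the same form as in the question. Since cv1 and cv2 do not interact with the factors in this model, changing their values shifts all four cell predictions by the same constant and leaves the interaction pattern unchanged:

```r
# Rebuild the simulated data and model from the question.
set.seed(100)
test <- data.frame(
  iv1 = factor(round(rnorm(200, mean = .5, sd = .25), 0), levels = c(0, 1), labels = c("A", "B")),
  iv2 = factor(round(rnorm(200, mean = .5, sd = .25), 0), levels = c(0, 1), labels = c("C", "D")),
  cv1 = rnorm(200, mean = 4, sd = 1),
  cv2 = rnorm(200, mean = 3, sd = 1),
  dv  = rnorm(200, mean = 5, sd = 1))
mod <- lm(dv ~ cv1 + cv2 + iv1 * iv2, data = test)

# Hypothetical helper: predictions for every iv1/iv2 cell at fixed covariates.
adjusted_means <- function(model, data, cv1, cv2) {
  grid <- expand.grid(iv1 = levels(data$iv1), iv2 = levels(data$iv2),
                      cv1 = cv1, cv2 = cv2)
  grid$fit <- as.numeric(predict(model, newdata = grid))
  grid
}

at_mean  <- adjusted_means(mod, test, cv1 = mean(test$cv1), cv2 = mean(test$cv2))
at_other <- adjusted_means(mod, test, cv1 = 5, cv2 = 2)  # arbitrary example values

# The covariates enter additively, so every cell moves by the same amount.
shift <- at_other$fit - at_mean$fit
print(at_mean)
print(shift)
```

The common shift equals the cv1 coefficient times the change in cv1 plus the cv2 coefficient times the change in cv2.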
