Calculate and compare coefficient estimates from a regression interaction for each group

Problem description

A) I am interested in the effect of a continuous variable (Var1) on a continuous dependent variable (DV), conditional on four different groups. The groups are defined by two binary variables (Dummy1 and Dummy2), so I fit a model with a three-way interaction.

Var1 <- sample(0:10, 100, replace = T)
Dummy1 <- sample(c(0,1), 100, replace = T)
Dummy2 <- sample(c(0,1), 100, replace = T)

DV <-2*Var1 + Var1*Dummy1 + 2*Var1*Dummy2 + 10*Var1*Dummy1*Dummy2 + rnorm(100)

fit <- lm(DV ~ Var1*Dummy1*Dummy2)

I would like to compare the coefficients of Var1 between the groups. I believe this can be done by adding up the relevant coefficients:

# Group Dummy1 = 0 & Dummy2 = 0:
fit$coefficients["Var1"]

# Group Dummy1 = 1 & Dummy2 = 0:
fit$coefficients["Var1"] + fit$coefficients["Var1:Dummy1"]
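Written out for a general group, this sum follows a single rule. Here d_1 and d_2 are symbols introduced for illustration; they stand for the values (0 or 1) of Dummy1 and Dummy2.

Applying the same rule to the simulated DV formula gives the true slopes:

- 2 for the group (0, 0)
- 3 for the group (1, 0)
- 4 for the group (0, 1)
- 15 for the group (1, 1)

```latex
\hat{\beta}_{\mathrm{Var1}}(d_1, d_2) = \hat{b}_{\mathrm{Var1}} + d_1\,\hat{b}_{\mathrm{Var1:Dummy1}} + d_2\,\hat{b}_{\mathrm{Var1:Dummy2}} + d_1 d_2\,\hat{b}_{\mathrm{Var1:Dummy1:Dummy2}}

\frac{\partial\,\mathrm{DV}}{\partial\,\mathrm{Var1}} = 2 + d_1 + 2\,d_2 + 10\,d_1 d_2
```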

Yet this seems overly arduous and prone to error. What is a more efficient solution?

My desired output is the estimated effect of Var1 for each possible combination of Dummy1 and Dummy2.

B) Once I know the estimated effect sizes of Var1 for each group, how can I test whether any two of them are statistically different from each other? I assume the linearHypothesis() function could help, but I can't figure out how. Thanks!

Solution

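A fully interacted model gives the same coefficient estimates as running a separate regression on each subset of the data. Only the standard errors differ, because the pooled model assumes a single common error variance. So if your intention is indeed: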

My desired output is the estimated effect of Var1 for each possible combination of Dummy1 and Dummy2.

Then the following may be helpful:

# get your data
set.seed(42)
Var1 <- sample(0:10, 100, replace = T)
Dummy1 <- sample(c(0,1), 100, replace = T)
Dummy2 <- sample(c(0,1), 100, replace = T)
DV <-2*Var1 + Var1*Dummy1 + 2*Var1*Dummy2 + 10*Var1*Dummy1*Dummy2 + rnorm(100)
df <- data.frame(DV, Var1, Dummy1, Dummy2)

First, note that

fit <- lm(DV ~ Var1*Dummy1*Dummy2)
fit$coefficients["Var1"]
    Var1 
2.049678 
fit$coefficients["Var1"] + fit$coefficients["Var1:Dummy1"]
    Var1 
2.993598 
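You can also stay with the single fully interacted model and do the hand-summing for all groups at once with a contrast matrix. The sketch below uses base R only; the object names groups, L and slopes are illustrative. It works as follows:

- It builds one row per group that switches on the slope terms applying to that group.
- It takes L %*% coef(fit) as the estimates.
- It takes the square roots of the diagonal of L V L' as the standard errors, where V = vcov(fit).

The exact numbers may differ slightly from those printed in this answer, because sample() gives different draws for the same seed since R 3.6.0.

```r
# Sketch: Var1 slope and standard error for every Dummy1 x Dummy2 group,
# taken from the single fully interacted model via a contrast matrix.
set.seed(42)
Var1   <- sample(0:10, 100, replace = TRUE)
Dummy1 <- sample(c(0, 1), 100, replace = TRUE)
Dummy2 <- sample(c(0, 1), 100, replace = TRUE)
DV <- 2*Var1 + Var1*Dummy1 + 2*Var1*Dummy2 + 10*Var1*Dummy1*Dummy2 + rnorm(100)
fit <- lm(DV ~ Var1*Dummy1*Dummy2)

b <- coef(fit)

# One row per group; Dummy1 varies fastest: (0,0), (1,0), (0,1), (1,1).
groups <- expand.grid(Dummy1 = c(0, 1), Dummy2 = c(0, 1))

# Each row switches on exactly the slope terms that apply to that group.
L <- matrix(0, nrow = nrow(groups), ncol = length(b),
            dimnames = list(NULL, names(b)))
L[, "Var1"]               <- 1
L[, "Var1:Dummy1"]        <- groups$Dummy1
L[, "Var1:Dummy2"]        <- groups$Dummy2
L[, "Var1:Dummy1:Dummy2"] <- groups$Dummy1 * groups$Dummy2

# Estimates are L b; their covariance matrix is L V L', with V = vcov(fit).
slopes <- data.frame(groups,
                     estimate  = as.vector(L %*% b),
                     std.error = sqrt(diag(L %*% vcov(fit) %*% t(L))))
print(slopes)
```

Unlike the per-group regressions below, these standard errors come from the pooled model's common residual variance.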

Now, let us estimate the effects for each group combination:

library(dplyr)
library(broom)

df %>% group_by(Dummy1, Dummy2) %>% do(tidy(lm(DV ~ Var1, data=.)))

Source: local data frame [8 x 7]
Groups: Dummy1, Dummy2 [4]

  Dummy1 Dummy2        term    estimate  std.error    statistic      p.value
   (dbl)  (dbl)       (chr)       (dbl)      (dbl)        (dbl)        (dbl)
1      0      0 (Intercept) -0.03125589 0.33880599  -0.09225307 9.272958e-01
2      0      0        Var1  2.04967796 0.05534155  37.03687553 5.222878e-22
3      0      1 (Intercept) -0.08877431 0.38932340  -0.22802203 8.223492e-01
4      0      1        Var1  3.97771680 0.07046498  56.44955828 8.756108e-21
5      1      0 (Intercept)  0.02582533 0.28189331   0.09161384 9.275272e-01
6      1      0        Var1  2.99359832 0.04622495  64.76153226 4.902771e-38
7      1      1 (Intercept)  0.16562985 0.55143596   0.30036100 7.675439e-01
8      1      1        Var1 14.95581348 0.07582089 197.25189807 5.275462e-30

Here, the intercept is each group's own intercept, that is, the predicted DV at Var1 = 0 within that group. The fully interacted model instead reports, through its dummy terms, how each intercept is shifted relative to the baseline group (Dummy1 = 0, Dummy2 = 0). Var1 is the slope coefficient in each group, which is the estimated effect of Var1 for each possible combination of Dummy1 and Dummy2.
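Below is a small base-R check of this interpretation, using the group (Dummy1 = 1, Dummy2 = 1) as an example; the object names are illustrative. It shows two things:

- The intercept from that group's own regression equals the sum of all intercept-type terms of the pooled model.
- That intercept is not the group mean of DV.

```r
# Sketch: a group's intercept is the predicted DV at Var1 = 0, not the group mean.
set.seed(42)
Var1   <- sample(0:10, 100, replace = TRUE)
Dummy1 <- sample(c(0, 1), 100, replace = TRUE)
Dummy2 <- sample(c(0, 1), 100, replace = TRUE)
DV <- 2*Var1 + Var1*Dummy1 + 2*Var1*Dummy2 + 10*Var1*Dummy1*Dummy2 + rnorm(100)
df <- data.frame(DV, Var1, Dummy1, Dummy2)
fit <- lm(DV ~ Var1*Dummy1*Dummy2, data = df)
b <- coef(fit)

# Intercept of group (1, 1) implied by the pooled model:
# all four intercept-shifting terms apply to this group.
pooled_int_11 <- unname(b["(Intercept)"] + b["Dummy1"] + b["Dummy2"] + b["Dummy1:Dummy2"])

# Intercept from a regression on that group alone.
sub11 <- subset(df, Dummy1 == 1 & Dummy2 == 1)
fit11 <- lm(DV ~ Var1, data = sub11)
subset_int_11 <- unname(coef(fit11)["(Intercept)"])

# Group mean of DV: the fitted value at mean(Var1), a different quantity.
group_mean_11 <- mean(sub11$DV)

print(c(pooled = pooled_int_11, subset = subset_int_11, group_mean = group_mean_11))
```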

Note that the Var1 coefficient in fit matches the Var1 estimate in row 2. The Var1 estimate in row 6 equals Var1 + Var1:Dummy1. With this approach, you do not need to add up coefficients by hand.
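The same correspondence holds for all four groups, and you can check it without dplyr or broom. The sketch below uses base R; the object names are illustrative. It works as follows:

- It fits one regression per group with split().
- It compares the resulting slopes with the sums of the pooled coefficients.
- It shows that the standard errors are not identical, since each subset regression estimates its own residual variance.

```r
# Sketch: per-group regressions (base R) versus the pooled interaction model.
set.seed(42)
Var1   <- sample(0:10, 100, replace = TRUE)
Dummy1 <- sample(c(0, 1), 100, replace = TRUE)
Dummy2 <- sample(c(0, 1), 100, replace = TRUE)
DV <- 2*Var1 + Var1*Dummy1 + 2*Var1*Dummy2 + 10*Var1*Dummy1*Dummy2 + rnorm(100)
df <- data.frame(DV, Var1, Dummy1, Dummy2)
fit <- lm(DV ~ Var1*Dummy1*Dummy2, data = df)
b <- coef(fit)

# One data frame per group; list names are "Dummy1.Dummy2", e.g. "1.0".
by_group <- split(df, list(df$Dummy1, df$Dummy2))
sub_fits <- lapply(by_group, function(d) lm(DV ~ Var1, data = d))

subset_slopes <- sapply(sub_fits, function(m) coef(m)[["Var1"]])
subset_se     <- sapply(sub_fits, function(m) summary(m)$coefficients["Var1", "Std. Error"])

# Slopes implied by the pooled model: sums of the relevant coefficients.
pooled_slopes <- c(
  "0.0" = b[["Var1"]],
  "1.0" = b[["Var1"]] + b[["Var1:Dummy1"]],
  "0.1" = b[["Var1"]] + b[["Var1:Dummy2"]],
  "1.1" = b[["Var1"]] + b[["Var1:Dummy1"]] + b[["Var1:Dummy2"]] + b[["Var1:Dummy1:Dummy2"]]
)
pooled_se_00 <- summary(fit)$coefficients["Var1", "Std. Error"]

print(rbind(subset = subset_slopes, pooled = pooled_slopes[names(subset_slopes)]))
print(c(subset_se_00 = subset_se[["0.0"]], pooled_se_00 = pooled_se_00))
```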

To test whether the slope coefficient is identical across all groups, your initial regression model is best suited. You can check summary(fit) and see whether the interaction terms are significant. Each of these t-tests looks at one interaction term on its own, given all other terms in the model. A significant interaction indicates that the slopes differ. A non-significant one only means the data show no evidence of a difference, not that the slopes are equal. To test all slope interactions jointly, you can use an F test, as in

library(car)
linearHypothesis(fit, c("Var1:Dummy1", "Var1:Dummy2", "Var1:Dummy1:Dummy2"), 
verbose=T, test="F")
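For part B, comparing two specific groups is a test of a single linear combination of coefficients. For example, the Var1 slope in group (1, 1) minus the slope in group (0, 1) is Var1:Dummy1 + Var1:Dummy1:Dummy2.

The sketch below computes this Wald test in base R. slope_row() and compare_slopes() are hypothetical helpers written for this example. Passing the same contrast row to car's linearHypothesis() as a one-row hypothesis matrix should give an F statistic equal to the square of the t statistic here.

```r
# Sketch: Wald test that the Var1 slopes of two Dummy1 x Dummy2 groups are equal.
set.seed(42)
Var1   <- sample(0:10, 100, replace = TRUE)
Dummy1 <- sample(c(0, 1), 100, replace = TRUE)
Dummy2 <- sample(c(0, 1), 100, replace = TRUE)
DV <- 2*Var1 + Var1*Dummy1 + 2*Var1*Dummy2 + 10*Var1*Dummy1*Dummy2 + rnorm(100)
fit <- lm(DV ~ Var1*Dummy1*Dummy2)

# Hypothetical helper: contrast row selecting the Var1 slope of group (d1, d2).
slope_row <- function(cf, d1, d2) {
  r <- setNames(numeric(length(cf)), names(cf))
  r["Var1"]               <- 1
  r["Var1:Dummy1"]        <- d1
  r["Var1:Dummy2"]        <- d2
  r["Var1:Dummy1:Dummy2"] <- d1 * d2
  r
}

# Hypothetical helper: H0 slope(g1) == slope(g2), using the pooled covariance.
# g1 and g2 are c(Dummy1, Dummy2) pairs.
compare_slopes <- function(fit, g1, g2) {
  cf <- coef(fit)
  h  <- slope_row(cf, g1[1], g1[2]) - slope_row(cf, g2[1], g2[2])
  est <- sum(h * cf)
  se  <- sqrt(drop(t(h) %*% vcov(fit) %*% h))
  t_stat <- est / se
  c(difference = est, std.error = se, t = t_stat,
    p.value = 2 * pt(-abs(t_stat), df = df.residual(fit)))
}

res <- compare_slopes(fit, g1 = c(1, 1), g2 = c(0, 1))
print(res)
```

Any other pair works the same way. For example, compare_slopes(fit, c(1, 0), c(0, 0)) reduces to the t-test of Var1:Dummy1 in summary(fit). With several pairwise comparisons, you may also want to adjust the p-values for multiple testing.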
