How can I force dropping the intercept (or its equivalent) in this linear model?

Question

Consider the following table:

DB <- data.frame(
  Y =rnorm(6),
  X1=c(T, T, F, T, F, F),
  X2=c(T, F, T, F, T, T)
)
           Y    X1    X2
1  1.8376852  TRUE  TRUE
2 -2.1173739  TRUE FALSE
3  1.3054450 FALSE  TRUE
4 -0.3476706  TRUE FALSE
5  1.3219099 FALSE  TRUE
6  0.6781750 FALSE  TRUE

I'd like to explain my quantitative variable Y with two binary variables (TRUE or FALSE) and no intercept.

The rationale for this choice is that, in my study, X1=FALSE and X2=FALSE can never be observed at the same time, so it makes no sense for that level to have a mean other than 0.
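A quick way to confirm that the FALSE/FALSE combination never occurs in the data is to cross-tabulate the two predictors. This is a minimal sketch that rebuilds the example table (Y is random, as in the question, and plays no role here) and uses base R's table(); the name tab is only illustrative.

```r
# Rebuild the example data from the question (Y is irrelevant for this check)
DB <- data.frame(
  Y  = rnorm(6),
  X1 = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  X2 = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)
)

# Cross-tabulate the two binary predictors: rows are X1, columns are X2
tab <- table(X1 = DB$X1, X2 = DB$X2)
print(tab)

# The FALSE/FALSE cell is empty, so any prediction there is an extrapolation
tab["FALSE", "FALSE"]  # 0
```

Because the model is only ever fitted on the three observed cells, the value it assigns to the fourth cell depends entirely on how the model is parameterized.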

m1 <- lm(Y~X1+X2, data=DB)
summary(m1)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -1.9684     1.0590  -1.859   0.1600  
X1TRUE        0.7358     0.9032   0.815   0.4749  
X2TRUE        3.0702     0.9579   3.205   0.0491 *

Without intercept:

m0 <- lm(Y~0+X1+X2, data=DB)
summary(m0)

Coefficients:
        Estimate Std. Error t value Pr(>|t|)  
X1FALSE  -1.9684     1.0590  -1.859   0.1600  
X1TRUE   -1.2325     0.5531  -2.229   0.1122  
X2TRUE    3.0702     0.9579   3.205   0.0491 *

I can't explain why two coefficients are estimated for the variable X1. The X1FALSE estimate seems to be equivalent to the intercept in the model with an intercept (both are -1.9684).
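The two X1 coefficients come from how R builds the design matrix: logical predictors are treated as factors, and when the intercept is removed, the first factor in the formula is coded with one indicator column per level instead of a contrast. A minimal sketch with base R's model.matrix() makes this visible; the object names X_with and X_without are only illustrative.

```r
DB <- data.frame(
  Y  = rnorm(6),
  X1 = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  X2 = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)
)

# Design matrices R builds for the two formulas in the question
X_with    <- model.matrix(Y ~ X1 + X2, data = DB)
X_without <- model.matrix(Y ~ 0 + X1 + X2, data = DB)

colnames(X_with)     # "(Intercept)" "X1TRUE" "X2TRUE"
colnames(X_without)  # "X1FALSE" "X1TRUE" "X2TRUE"

# X1FALSE + X1TRUE is a column of ones: the intercept has simply reappeared
all(X_without[, "X1FALSE"] + X_without[, "X1TRUE"] == 1)  # TRUE
```

So `0 +` does not remove a parameter here; it only moves the baseline into the X1FALSE column, which is why both fits describe the same model.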

When we display the predictions for every combination of the two variables, the two models give the same values.

DisplayLevel <- function(m){
  R <-  outer(
    unique(DB$X1),
    unique(DB$X2),
    function(a, b) predict(m,data.frame(X1=a, X2=b))
  )
  colnames(R) <- paste0('X2:', unique(DB$X2))
  rownames(R) <- paste0('X1:', unique(DB$X1))
  return(R)
}

DisplayLevel(m1)
          X2:TRUE  X2:FALSE
X1:TRUE  1.837685 -1.232522
X1:FALSE 1.101843 -1.968364

DisplayLevel(m0)
          X2:TRUE  X2:FALSE
X1:TRUE  1.837685 -1.232522
X1:FALSE 1.101843 -1.968364

So the two models are equivalent.
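The equivalence can also be checked without building a prediction grid: the two fits should have identical fitted values, and the coefficients of m0 are a re-parameterization of those of m1 (X1FALSE equals the intercept, X1TRUE equals the intercept plus the X1TRUE effect, X2TRUE is unchanged). A minimal sketch, assuming an arbitrary seed chosen only so the example is reproducible:

```r
set.seed(1)  # arbitrary seed, only for reproducibility of this sketch
DB <- data.frame(
  Y  = rnorm(6),
  X1 = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  X2 = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)
)

m1 <- lm(Y ~ X1 + X2, data = DB)      # with intercept
m0 <- lm(Y ~ 0 + X1 + X2, data = DB)  # "without" intercept

b1 <- coef(m1)
b0 <- coef(m0)  # order: X1FALSE, X1TRUE, X2TRUE

# Same column space, so the fitted values agree (up to rounding)
same_fit <- isTRUE(all.equal(fitted(m1), fitted(m0)))

# Coefficient mapping between the two parameterizations
same_coef <- isTRUE(all.equal(
  unname(b0),
  c(b1[["(Intercept)"]],
    b1[["(Intercept)"]] + b1[["X1TRUE"]],
    b1[["X2TRUE"]])
))

c(same_fit = same_fit, same_coef = same_coef)  # both TRUE
```

With the numbers in the question this is -1.9684 for X1FALSE and -1.9684 + 0.7358 ≈ -1.2325 for X1TRUE.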

My question is: can we estimate just one coefficient for the first effect? Can we force R to assign a value of 0 to the combination X1=FALSE and X2=FALSE?

Answer

Yes, we can, by coercing the logical columns to numeric:

DB <- as.data.frame(data.matrix(DB))
## or you can do:
## DB$X1 <- as.integer(DB$X1)
## DB$X2 <- as.integer(DB$X2)

#            Y X1 X2
# 1 -0.5059575  1  1
# 2  1.3430388  1  0
# 3 -0.2145794  0  1
# 4 -0.1795565  1  0
# 5 -0.1001907  0  1
# 6  0.7126663  0  1

## a linear model without intercept
m0 <- lm(Y ~ 0 + X1 + X2, data = DB)

DisplayLevel(m0)
#             X2:1      X2:0
# X1:1  0.15967744 0.2489237
# X1:0 -0.08924625 0.0000000

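I have explicitly coerced your TRUE/FALSE binaries into numeric 1/0, so that lm() treats them as numeric regressors rather than factors and applies no contrasts. With no intercept, the prediction for X1=0, X2=0 is then forced to 0.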
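Once X1 and X2 are numeric, the design matrix has just the two columns X1 and X2, and with no intercept the prediction at X1 = 0, X2 = 0 is 0 by construction. The sketch below checks that. It also shows an alternative (my own suggestion, not part of the answer) that coerces inside the formula with as.integer(), so the original logical columns can be kept; the names DBn and m_alt are only illustrative.

```r
set.seed(1)  # arbitrary seed, only for reproducibility of this sketch
DB <- data.frame(
  Y  = rnorm(6),
  X1 = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  X2 = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)
)

# Approach from the answer: convert the logical columns to 0/1 integers
DBn <- DB
DBn$X1 <- as.integer(DBn$X1)
DBn$X2 <- as.integer(DBn$X2)
m0 <- lm(Y ~ 0 + X1 + X2, data = DBn)

colnames(model.matrix(m0))  # "X1" "X2": one coefficient per variable

# With no intercept, the X1 = 0, X2 = 0 prediction is forced to 0
p00 <- predict(m0, newdata = data.frame(X1 = 0L, X2 = 0L))

# Alternative: coerce inside the formula and keep DB's logical columns
m_alt <- lm(Y ~ 0 + as.integer(X1) + as.integer(X2), data = DB)
isTRUE(all.equal(unname(coef(m0)), unname(coef(m_alt))))  # TRUE
```

In this parameterization the cell means are b1 + b2 for TRUE/TRUE, b1 for TRUE/FALSE and b2 for FALSE/TRUE, so the model has two free parameters for the three observed cells. Unlike the two models in the question, it is therefore a genuine restriction, not a re-parameterization of the fit with an intercept.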

The data in my answer differ from yours because you did not call set.seed(?) before rnorm() to make the example reproducible. But this is not an issue here.
