How can I force dropping the intercept (or its equivalent) in this linear model?


Question

Consider the following data:

DB <- data.frame(
  Y =rnorm(6),
  X1=c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  X2=c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)
)
           Y    X1    X2
1  1.8376852  TRUE  TRUE
2 -2.1173739  TRUE FALSE
3  1.3054450 FALSE  TRUE
4 -0.3476706  TRUE FALSE
5  1.3219099 FALSE  TRUE
6  0.6781750 FALSE  TRUE

I'd like to explain my quantitative variable Y by two binary variables (TRUE or FALSE), without an intercept.

The reason for this choice is that, in my study, X1=FALSE and X2=FALSE can never be observed at the same time, so it makes no sense for this combination to have a mean other than 0.

m1 <- lm(Y~X1+X2, data=DB)
summary(m1)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -1.9684     1.0590  -1.859   0.1600  
X1TRUE        0.7358     0.9032   0.815   0.4749  
X2TRUE        3.0702     0.9579   3.205   0.0491 *

Without intercept:

m0 <- lm(Y~0+X1+X2, data=DB)
summary(m0)

Coefficients:
        Estimate Std. Error t value Pr(>|t|)  
X1FALSE  -1.9684     1.0590  -1.859   0.1600  
X1TRUE   -1.2325     0.5531  -2.229   0.1122  
X2TRUE    3.0702     0.9579   3.205   0.0491 *

I can't explain why two coefficients are estimated for the variable X1. The X1FALSE coefficient seems to be equivalent to the intercept coefficient in the model with intercept.

When we display the predictions for all combinations of the variables, the two models give the same values.

DisplayLevel <- function(m){
  R <-  outer(
    unique(DB$X1),
    unique(DB$X2),
    function(a, b) predict(m,data.frame(X1=a, X2=b))
  )
  colnames(R) <- paste0('X2:', unique(DB$X2))
  rownames(R) <- paste0('X1:', unique(DB$X1))
  return(R)
}

DisplayLevel(m1)
          X2:TRUE  X2:FALSE
X1:TRUE  1.837685 -1.232522
X1:FALSE 1.101843 -1.968364

DisplayLevel(m0)
          X2:TRUE  X2:FALSE
X1:TRUE  1.837685 -1.232522
X1:FALSE 1.101843 -1.968364

So the two models are equivalent.

My question is: can we estimate just one coefficient for the first effect? Can we force R to assign a value of 0 to the combination X1=FALSE and X2=FALSE?

Recommended answer

Yes, we can, by converting the logical columns to numeric:

DB <- as.data.frame(data.matrix(DB))
## or you can do:
## DB$X1 <- as.integer(DB$X1)
## DB$X2 <- as.integer(DB$X2)

#            Y X1 X2
# 1 -0.5059575  1  1
# 2  1.3430388  1  0
# 3 -0.2145794  0  1
# 4 -0.1795565  1  0
# 5 -0.1001907  0  1
# 6  0.7126663  0  1

## a linear model without intercept
m0 <- lm(Y ~ 0 + X1 + X2, data = DB)

DisplayLevel(m0)
#             X2:1      X2:0
# X1:1  0.15967744 0.2489237
# X1:0 -0.08924625 0.0000000

I have explicitly coerced your TRUE/FALSE binary variables into numeric 1/0, so that lm() does not apply any contrast coding to them.
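To see what the coercion changes, it can help to compare the design matrices that R builds in the two cases. The sketch below uses a made-up data frame `toy` with the same structure as the one in the question (the seed and the names `toy`, `toy_num`, `mm_logical`, `mm_numeric` and `fit_num` are illustrative assumptions, not part of the original post).

```r
# Toy data with the same layout as in the question (values are illustrative)
set.seed(1)
toy <- data.frame(
  Y  = rnorm(6),
  X1 = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  X2 = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)
)

# Logical predictors are coded like factors: once the intercept is removed,
# the first term gets one indicator column per level (FALSE and TRUE)
mm_logical <- model.matrix(Y ~ 0 + X1 + X2, data = toy)
colnames(mm_logical)

# Numeric 0/1 predictors enter the design matrix as plain columns
toy_num <- transform(toy, X1 = as.integer(X1), X2 = as.integer(X2))
mm_numeric <- model.matrix(Y ~ 0 + X1 + X2, data = toy_num)
colnames(mm_numeric)

# With no constant column, the prediction at X1 = 0, X2 = 0 is exactly 0
fit_num <- lm(Y ~ 0 + X1 + X2, data = toy_num)
predict(fit_num, data.frame(X1 = 0L, X2 = 0L))
```

In the logical case the X1FALSE column plays the role of the intercept, which is why the two models in the question give identical predictions. In the numeric case only two coefficients are estimated.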

The data in my answer differ from yours because you did not call set.seed(?) before rnorm() for reproducibility. But this is not an issue here.
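As a minimal sketch of the reproducibility point, fixing the seed before each call to rnorm() yields the same draws every time. The seed value 42 and the names `a` and `b` are arbitrary choices for illustration.

```r
# Setting the same seed before each draw makes the random numbers repeatable
set.seed(42)
a <- rnorm(6)

set.seed(42)
b <- rnorm(6)

identical(a, b)  # TRUE: both vectors contain the same six values
```

Placing a single set.seed() call before the data.frame() construction in the question would therefore let anyone regenerate the same Y values.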
