Solving the normal equations gives different coefficients from using `lm`?


Problem description

I wanted to compute a simple regression using both `lm` and plain matrix algebra. However, the regression coefficients I obtain from matrix algebra are only half of those obtained from `lm`, and I have no clue why.

Here is the code:

boot_example <- data.frame(
  x1 = c(1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),
  x2 = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L),
  x3 = c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L),
  x4 = c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L),
  x5 = c(1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L),
  x6 = c(0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L),
  preference_rating = c(9L, 7L, 5L, 6L, 5L, 6L, 5L, 7L, 6L)
  )
dummy_regression <- lm("preference_rating ~ x1+x2+x3+x4+x5+x6", data = boot_example)
dummy_regression

Call:
lm(formula = "preference_rating ~ x1+x2+x3+x4+x5+x6", data = boot_example)

Coefficients:
(Intercept)           x1           x2           x3           x4           x5           x6  
     4.2222       1.0000      -0.3333       1.0000       0.6667       2.3333       1.3333 

###The same by matrix algebra
X <- matrix(c(
c(1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), #upper var
c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L), #upper var
c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L), #country var
c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L), #country var
c(1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L), #price var
c(0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L) #price var
), 
nrow = 9, ncol=6)

Y <- c(9L, 7L, 5L, 6L, 5L, 6L, 5L, 7L, 6L)

#Using standardized (mean=0, std=1) "z" -transformation Z = (X-mean(X))/sd(X) for all predictors
X_std <- apply(X, MARGIN = 2, FUN = function(x){(x-mean(x))/sd(x)})

##If constant shall be computed as well, uncomment next line 
#X_std <- cbind(c(rep(1,9)),X_std)

#using matrix algebra formula
solve(t(X_std) %*% X_std) %*% (t(X_std) %*% Y)

           [,1]
[1,]  0.5000000
[2,] -0.1666667
[3,]  0.5000000
[4,]  0.3333333
[5,]  1.1666667
[6,]  0.6666667

Does anyone see the error in my matrix computation?

Thanks in advance!

Recommended answer

`lm` does not standardize the predictors. If you want to obtain the same result as `lm`, use the raw (unstandardized) predictors and include an intercept column:

X1 <- cbind(1, X)  ## include intercept

solve(crossprod(X1), crossprod(X1,Y))

#           [,1]
#[1,]  4.2222222
#[2,]  1.0000000
#[3,] -0.3333333
#[4,]  1.0000000
#[5,]  0.6666667
#[6,]  2.3333333
#[7,]  1.3333333
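
The factor of exactly one half in the question can be checked directly. The sketch below is an illustration rather than part of the original answer: it assumes the same `X` and `Y` as in the question, and the names `sds`, `b_std` and `b_raw` are introduced here only for the demonstration. Each dummy column contains three ones out of nine values, so its sample standard deviation is 0.5, and regressing on standardized predictors rescales every slope by that standard deviation.

```r
# Same data as in the question; cbind() builds the 9 x 6 predictor matrix column by column.
X <- cbind(
  c(1, 1, 1, 0, 0, 0, 0, 0, 0),
  c(0, 0, 0, 1, 1, 1, 0, 0, 0),
  c(1, 0, 0, 1, 0, 0, 1, 0, 0),
  c(0, 1, 0, 0, 1, 0, 0, 1, 0),
  c(1, 0, 0, 0, 0, 1, 0, 1, 0),
  c(0, 1, 0, 1, 0, 0, 0, 0, 1)
)
Y <- c(9, 7, 5, 6, 5, 6, 5, 7, 6)

# Sample standard deviation of every predictor column (each one is 0.5).
sds <- apply(X, 2, sd)

# Standardized predictors, equivalent to the apply() call in the question.
X_std <- scale(X)

# Normal equations on the standardized predictors, without an intercept.
b_std <- solve(crossprod(X_std), crossprod(X_std, Y))

# Slopes from lm on the raw predictors (intercept dropped).
b_raw <- coef(lm(Y ~ X))[-1]

# The standardized slopes are the raw slopes multiplied by the column sd.
all.equal(as.numeric(b_std), as.numeric(b_raw * sds))
```

Because the standardized columns are centered, leaving out the intercept does not change the slopes; only the rescaling by `sd` does, which is why every coefficient comes out halved.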

I won't repeat here why we should use `crossprod`. See the "follow-up" part of Ridge regression with glmnet gives different coefficients than what I compute by "textbook definition"?
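
As a rough illustration of the `crossprod` point (a sketch under the assumption that `X` and `Y` are the data from the question; the names `b_naive`, `b_cross`, `b_qr` and `b_lm` are made up for this comparison), the following computes the same least-squares coefficients in four ways. `crossprod(A, B)` is formally equivalent to `t(A) %*% B` but avoids forming the transpose explicitly, and `qr.solve` returns a least-squares fit for an overdetermined system via the QR decomposition, which is also the approach `lm` takes internally.

```r
# Same data as in the question.
X <- cbind(
  c(1, 1, 1, 0, 0, 0, 0, 0, 0),
  c(0, 0, 0, 1, 1, 1, 0, 0, 0),
  c(1, 0, 0, 1, 0, 0, 1, 0, 0),
  c(0, 1, 0, 0, 1, 0, 0, 1, 0),
  c(1, 0, 0, 0, 0, 1, 0, 1, 0),
  c(0, 1, 0, 1, 0, 0, 0, 0, 1)
)
Y <- c(9, 7, 5, 6, 5, 6, 5, 7, 6)

# Design matrix with an intercept column, as in the answer.
X1 <- cbind(1, X)

# Textbook formula: explicit transpose and explicit inverse.
b_naive <- solve(t(X1) %*% X1) %*% t(X1) %*% Y

# Same normal equations, using crossprod and solving the system directly.
b_cross <- solve(crossprod(X1), crossprod(X1, Y))

# Least squares via QR decomposition, without forming X'X at all.
b_qr <- qr.solve(X1, Y)

# Reference fit from lm.
b_lm <- coef(lm(Y ~ X))

# All four should agree up to floating-point tolerance.
all.equal(as.numeric(b_naive), as.numeric(b_lm))
all.equal(as.numeric(b_cross), as.numeric(b_lm))
all.equal(as.numeric(b_qr), as.numeric(b_lm))
```

On a small, well-conditioned problem like this one the four results are indistinguishable; the differences in speed and numerical stability only start to matter for larger or nearly collinear design matrices.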
