与"glmnet"一起使用的Ridge回归给出的系数与我通过“教科书定义"计算出的系数不同? [英] Ridge regression with `glmnet` gives different coefficients than what I compute by "textbook definition"?


Problem description

I am running ridge regression with the glmnet R package. I noticed that the coefficients I obtain from the glmnet::glmnet function are different from those I get by computing the coefficients by definition (with the same lambda value). Could somebody explain to me why?

The data (both the response Y and the design matrix X) are scaled.

library(MASS)
library(glmnet)

# Data dimensions
p.tmp <- 100
n.tmp <- 100

# Data objects
set.seed(1)
X <- scale(mvrnorm(n.tmp, mu = rep(0, p.tmp), Sigma = diag(p.tmp)))
beta <- rep(0, p.tmp)
beta[sample(1:p.tmp, 10, replace = FALSE)] <- 10
Y.true <- X %*% beta
Y <- scale(Y.true + matrix(rnorm(n.tmp))) # Y.true + Gaussian noise

# Run glmnet 
ridge.fit.cv <- cv.glmnet(X, Y, alpha = 0)
ridge.fit.lambda <- ridge.fit.cv$lambda.1se

# Extract coefficient values for lambda.1se (without intercept)
ridge.coef <- (coef(ridge.fit.cv, s = ridge.fit.lambda))[2:(p.tmp+1)]

# Get coefficients "by definition"
ridge.coef.DEF <- solve(t(X) %*% X + ridge.fit.lambda * diag(p.tmp)) %*% t(X) %*% Y

# Plot estimates
plot(ridge.coef, type = "l", ylim = range(c(ridge.coef, ridge.coef.DEF)),
     main = "black: Ridge `glmnet`\nred: Ridge by definition")
lines(ridge.coef.DEF, col = "red")

Accepted answer

If you read ?glmnet, you will see that the penalized objective function for a Gaussian response is:

1/2 * RSS / nobs + lambda * penalty

When the ridge penalty 1/2 * ||beta_j||_2^2 is used, we have

1/2 * RSS / nobs + 1/2 * lambda * ||beta_j||_2^2

which, after multiplying through by 2 * nobs, is equivalent to minimizing

RSS + lambda * nobs * ||beta_j||_2^2

This is different from what we usually see in textbooks for ridge regression:

RSS + lambda * ||beta_j||_2^2
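
To make the relation concrete, here is a minimal R sketch of the two objectives (assuming standardized data and no intercept, as in the question); the function names are illustrative, not part of glmnet:

# Minimal sketch: the two objectives evaluated at a candidate coefficient vector b
glmnet_obj <- function(b, X, Y, lambda) {
  rss <- sum((Y - X %*% b)^2)
  rss / (2 * nrow(X)) + lambda / 2 * sum(b^2)   # 1/2 * RSS / nobs + 1/2 * lambda * ||b||_2^2
}
textbook_obj <- function(b, X, Y, lambda) {
  sum((Y - X %*% b)^2) + lambda * sum(b^2)      # RSS + lambda * ||b||_2^2
}
# 2 * nrow(X) * glmnet_obj(b, X, Y, lambda) equals textbook_obj(b, X, Y, nrow(X) * lambda),
# so glmnet's lambda corresponds to the textbook lambda divided by nobs.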

The formula you wrote:

##solve(t(X) %*% X + ridge.fit.lambda * diag(p.tmp)) %*% t(X) %*% Y
drop(solve(crossprod(X) + diag(ridge.fit.lambda, p.tmp), crossprod(X, Y)))

is for the textbook result; for glmnet we should expect:

##solve(t(X) %*% X + n.tmp * ridge.fit.lambda * diag(p.tmp)) %*% t(X) %*% Y
drop(solve(crossprod(X) + diag(n.tmp * ridge.fit.lambda, p.tmp), crossprod(X, Y)))

So, the textbook uses penalized least squares, while glmnet uses penalized mean squared error.

Note that I did not use your original code with t(), "%*%", and solve(A) %*% b; using crossprod and solve(A, b) is more efficient! See the Follow-up section at the end.

Now let's make a new comparison:

library(MASS)
library(glmnet)

# Data dimensions
p.tmp <- 100
n.tmp <- 100

# Data objects
set.seed(1)
X <- scale(mvrnorm(n.tmp, mu = rep(0, p.tmp), Sigma = diag(p.tmp)))
beta <- rep(0, p.tmp)
beta[sample(1:p.tmp, 10, replace = FALSE)] <- 10
Y.true <- X %*% beta
Y <- scale(Y.true + matrix(rnorm(n.tmp)))

# Run glmnet 
ridge.fit.cv <- cv.glmnet(X, Y, alpha = 0, intercept = FALSE)
ridge.fit.lambda <- ridge.fit.cv$lambda.1se

# Extract coefficient values for lambda.1se (without intercept)
ridge.coef <- (coef(ridge.fit.cv, s = ridge.fit.lambda))[-1]

# Get coefficients "by definition"
ridge.coef.DEF <- drop(solve(crossprod(X) + diag(n.tmp * ridge.fit.lambda, p.tmp), crossprod(X, Y)))

# Plot estimates
plot(ridge.coef, type = "l", ylim = range(c(ridge.coef, ridge.coef.DEF)),
     main = "black: Ridge `glmnet`\nred: Ridge by definition")
lines(ridge.coef.DEF, col = "red")

Note that I have set intercept = FALSE when calling cv.glmnet (or glmnet). This is more of a conceptual point than something that matters in practice. Conceptually, our textbook computation has no intercept, so we want to drop the intercept when using glmnet. But in practice, since your X and Y are standardized, the theoretical estimate of the intercept is 0. Even with intercept = TRUE (the glmnet default), you can check that the estimated intercept is ~1e-17 (numerically 0), so the estimates of the other coefficients are not notably affected. The other answer simply demonstrates this.
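
As a quick check, here is a sketch reusing the objects from the code above (refitting at the single lambda.1se value just for illustration):

# With the default intercept = TRUE, the fitted intercept is numerically 0
# because X and Y are standardized
fit.int <- glmnet(X, Y, alpha = 0, lambda = ridge.fit.lambda)
coef(fit.int)[1]  # on the order of 1e-17, i.e. numerically zero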

Follow-up

As for the use of crossprod and solve(A, b) - interesting! Do you by any chance have a reference to a simulation comparison for that?

t(X) %*% Y will first take the transpose X1 <- t(X) and then do X1 %*% Y, while crossprod(X, Y) does no transpose at all. "%*%" is a wrapper for DGEMM for the case op(A) = A, op(B) = B, while crossprod is a wrapper for op(A) = A', op(B) = B. Similarly, tcrossprod is for op(A) = A, op(B) = B'.
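
A quick base-R timing sketch of this difference (numbers will vary with your BLAS and machine):

set.seed(1)
A <- matrix(rnorm(2000 * 2000), 2000)
y <- matrix(rnorm(2000))
system.time(for (i in 1:100) t(A) %*% y)       # copies A into its transpose first
system.time(for (i in 1:100) crossprod(A, y))  # calls DGEMM with op(A) = A', no copy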

A major use of crossprod(X) is for t(X) %*% X; similarly, tcrossprod(X) is for X %*% t(X), in which case DSYRK instead of DGEMM is called. You can read the first section of Why is the built-in lm function so slow in R? for the reasoning and a benchmark.

Be aware that if X is not a square matrix, crossprod(X) and tcrossprod(X) are not equally fast, as they involve different amounts of floating-point operations; for this you may read the side notice of Any faster R function than "tcrossprod" for symmetric dense matrix multiplication?

Regarding solve(A, b) versus solve(A) %*% b, please read the first section of How to compute diag(X %*% solve(A) %*% t(X)) efficiently without taking a matrix inverse?
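
The short version: solve(A, b) does one LU factorization followed by a triangular solve, while solve(A) %*% b first forms the full inverse and then multiplies, which costs more and is numerically less stable. A minimal sketch (timings are machine-dependent):

set.seed(1)
A <- crossprod(matrix(rnorm(1000 * 1000), 1000)) + diag(1000)  # a well-conditioned matrix
b <- rnorm(1000)
system.time(x1 <- solve(A, b))     # factorize once, solve directly
system.time(x2 <- solve(A) %*% b)  # form the inverse, then multiply
max(abs(x1 - x2))                  # tiny difference; the direct solve is preferred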
