Ordinary least squares with glmnet and lm


Question

This question was asked in stackoverflow.com/q/38378118 but there was no satisfactory answer.

LASSO with λ = 0 is equivalent to ordinary least squares, but this does not seem to be the case for glmnet() and lm() in R. Why?

library(glmnet)
options(scipen = 999)  # suppress scientific notation in output

X = model.matrix(mpg ~ 0 + ., data = mtcars)  # design matrix without intercept column
y = as.matrix(mtcars["mpg"])
coef(glmnet(X, y, lambda = 0))  # LASSO with lambda = 0
lm(y ~ X)                       # ordinary least squares

Their regression coefficients agree to at most 2 significant figures, perhaps due to slightly different termination conditions in their optimization algorithms:

                  glmnet        lm
(Intercept)  12.19850081  12.30337
cyl          -0.09882217  -0.11144
disp          0.01307841   0.01334
hp           -0.02142912  -0.02148
drat          0.79812453   0.78711
wt           -3.68926778  -3.71530
qsec          0.81769993   0.82104
vs            0.32109677   0.31776
am            2.51824708   2.52023
gear          0.66755681   0.65541
carb         -0.21040602  -0.19942

The difference is much worse when we add interaction terms.

X = model.matrix(mpg ~ 0 + . + . * disp, data = mtcars)  # add all interactions with disp
y = as.matrix(mtcars["mpg"])
coef(glmnet(X, y, lambda = 0))
lm(y ~ X)

Regression coefficients:

                     glmnet           lm
(Intercept)   36.2518682237  139.9814651
cyl          -11.9551206007  -26.0246050
disp          -0.2871942149   -0.9463428
hp            -0.1974440651   -0.2620506
drat          -4.0209186383  -10.2504428
wt             1.3612184380    5.4853015
qsec           2.3549189212    1.7690334
vs           -25.7384282290  -47.5193122
am           -31.2845893123  -47.4801206
gear          21.1818220135   27.3869365
carb           4.3160891408    7.3669904
cyl:disp       0.0980253873    0.1907523
disp:hp        0.0006066105    0.0006556
disp:drat      0.0040336452    0.0321768
disp:wt       -0.0074546428   -0.0228644
disp:qsec     -0.0077317305   -0.0023756
disp:vs        0.2033046078    0.3636240
disp:am        0.2474491353    0.3762699
disp:gear     -0.1361486900   -0.1963693
disp:carb     -0.0156863933   -0.0188304

Answer

If you check out these two posts, you will get a sense as to why you are not getting the same results.

In essence, glmnet estimates the model by penalized maximum likelihood, computed along a regularization path, while lm solves the least squares problem exactly via QR decomposition. So the estimates will never be exactly the same.
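
For intuition, here is a minimal sketch (base R only, reusing X and y from the question) showing that lm's estimates are exactly the direct QR solution of the least squares problem; the column of ones supplies the intercept:

X1 <- cbind("(Intercept)" = 1, X)  # prepend an intercept column
qr.solve(X1, y)                    # least squares via QR; matches coef(lm(y ~ X))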

However, note in the manual for ?glmnet under "lambda":

WARNING: use with care. Do not supply a single value for lambda (for predictions after CV use predict() instead). Supply instead a decreasing sequence of lambda values. glmnet relies on its warm starts for speed, and it's often faster to fit a whole path than compute a single fit.
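
As the warning suggests, single-lambda results are meant to come from predict() or coef() applied to a fitted path. A hedged sketch of that workflow (reusing X and y from above, with cv.glmnet choosing lambda by cross-validation):

cvfit <- cv.glmnet(X, y)                    # fits a full decreasing lambda path internally
coef(cvfit, s = "lambda.min")               # coefficients at the CV-selected lambda
predict(cvfit, newx = X, s = "lambda.min")  # predictions at that lambda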

You can do (at least) three things to bring the coefficients close enough that the difference is trivial: (1) supply a range of values for lambda, (2) decrease the threshold thres, and (3) increase the maximum number of iterations.

library(glmnet)
options(scipen = 999)

X = model.matrix(mpg ~ 0 + ., data = mtcars)
y = as.matrix(mtcars["mpg"])
lfit <- glmnet(X, y, lambda = rev(0:99), thres = 1E-10)  # decreasing lambda path, tighter threshold
lmfit <- lm(y ~ X)
coef(lfit, s = 0) - coef(lmfit)  # difference at lambda = 0
11 x 1 Matrix of class "dgeMatrix"
                          1
(Intercept)  0.004293053125
cyl         -0.000361655351
disp        -0.000002631747
hp           0.000006447138
drat        -0.000065394578
wt           0.000180943607
qsec        -0.000079480187
vs          -0.000462099248
am          -0.000248796353
gear        -0.000222035415
carb        -0.000071164178


X = model.matrix(mpg ~ 0 + . + . * disp, data = mtcars)
y = as.matrix(mtcars["mpg"])
lfit <- glmnet(X, y, lambda = rev(0:99), thres = 1E-12, maxit = 10^7)  # tighter threshold, more iterations
lmfit <- glm(y ~ X)  # gaussian glm; identical fit to lm(y ~ X)
coef(lfit, s = 0) - coef(lmfit)
 20 x 1 Matrix of class "dgeMatrix"
                           1
(Intercept) -0.3174019115228
cyl          0.0414909318817
disp         0.0020032493403
hp           0.0001834076765
drat         0.0188376047769
wt          -0.0120601219002
qsec         0.0019991131315
vs           0.0636756040430
am           0.0439343002375
gear        -0.0161102501755
carb        -0.0088921918062
cyl:disp    -0.0002714213271
disp:hp     -0.0000001211365
disp:drat   -0.0000859742667
disp:wt      0.0000462418947
disp:qsec   -0.0000175276420
disp:vs     -0.0004657059892
disp:am     -0.0003517289096
disp:gear    0.0001629963377
disp:carb    0.0000085312911

Some of the differences for the interaction model are probably still non-trivial, but they are much closer.
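
To quantify "closer", one can check the worst-case coefficient gap directly (a one-liner, reusing lfit and lmfit from the interaction fit above):

max(abs(coef(lfit, s = 0) - coef(lmfit)))  # largest absolute coefficient difference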

