Is cv.glmnet overfitting the data by using the full lambda sequence?
Question
cv.glmnet has been used by most research papers and companies. While building a similar function to cv.glmnet for glmnet.cr (a similar package that implements the lasso for continuation-ratio ordinal regression), I came across this problem in cv.glmnet.
cv.glmnet first fits the model on the full data:

glmnet.object = glmnet(x, y, weights = weights, offset = offset,
    lambda = lambda, ...)
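When lambda is not supplied, glmnet derives the default sequence from the data itself. A base-R sketch of that construction under my own simplifying assumptions (standardized predictors, centered response, no intercept; the variable names are mine, not glmnet's):

```r
# Sketch of glmnet's default lambda sequence: start at the smallest
# lambda that forces every coefficient to zero, then decrease on a
# log scale.
set.seed(1)
n <- 100; p <- 5
x <- scale(matrix(rnorm(n * p), n, p))  # standardized predictors
yc <- rnorm(n); yc <- yc - mean(yc)     # centered response

alpha <- 1  # lasso penalty
# lambda_max: smallest lambda with an all-zero solution
lambda_max <- max(abs(crossprod(x, yc))) / (n * alpha)

# 100 log-spaced values down to a small fraction of lambda_max
lambda_seq <- exp(seq(log(lambda_max), log(lambda_max * 1e-4),
                      length.out = 100))
```

The key point for the question below is that this sequence is a tuning grid computed once from the full data, which is then reused in every fold.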
After the glmnet object is created with the complete data, the next step goes as follows. The lambda sequence from the full-data fit is extracted:
lambda = glmnet.object$lambda
Now they make sure the number of folds is at least 3:
if (nfolds < 3)
stop("nfolds must be bigger than 3; nfolds=10 recommended")
A list is created to store the cross-validated results:
outlist = as.list(seq(nfolds))
A for loop runs to fit the different data partitions, per the theory of cross-validation:
for (i in seq(nfolds)) {
    which = foldid == i
    if (is.matrix(y))
        y_sub = y[!which, ]
    else y_sub = y[!which]
    if (is.offset)
        offset_sub = as.matrix(offset)[!which, ]
    else offset_sub = NULL
    # using the lambdas computed from the complete data
    outlist[[i]] = glmnet(x[!which, , drop = FALSE],
        y_sub, lambda = lambda, offset = offset_sub,
        weights = weights[!which], ...)
}
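The scheme above can be sketched in self-contained base R. To keep the example runnable without the glmnet package, I use ridge regression (which has a closed-form solution) as a stand-in for the lasso; the structure that matters is the same: one lambda grid fixed up front, each fold fit over that whole grid, and the held-out error averaged per lambda. All names here are mine.

```r
set.seed(2)
n <- 60; p <- 4; nfolds <- 5
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + rnorm(n)

# Closed-form ridge coefficients for one lambda (no intercept, for brevity)
ridge_fit <- function(x, y, lambda) {
  solve(crossprod(x) + lambda * diag(ncol(x)), crossprod(x, y))
}

# Shared lambda grid, fixed once (analogous to glmnet.object$lambda)
lambda <- exp(seq(log(10), log(0.01), length.out = 20))
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Mean held-out MSE per lambda, averaged over folds
cv_err <- sapply(lambda, function(l) {
  mean(sapply(seq_len(nfolds), function(i) {
    held_out <- foldid == i
    b <- ridge_fit(x[!held_out, , drop = FALSE], y[!held_out], l)
    mean((y[held_out] - x[held_out, , drop = FALSE] %*% b)^2)
  }))
})

lambda_min <- lambda[which.min(cv_err)]  # analogue of lambda.min
```

Note that each fold's coefficients are estimated only from that fold's training rows; the full data fixes nothing but the grid of candidate lambdas.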
So what happens? After fitting the model to the complete data, cross-validation is done with the lambdas from the complete data. Can someone tell me how this can possibly not be over-fitting the data? In cross-validation we want the model to have no information about the left-out part of the data. But cv.glmnet cheats on this!
Answer
No, it isn't. cv.glmnet() does build the entire solution path for the lambda sequence, but you never pick the last entry in that path. You typically pick lambda == lambda.1se (or lambda.min), as @Fabians said:

lambda == lambda.min : the lambda value at which the mean cross-validated error (cvm) is minimized
lambda == lambda.1se : the largest lambda whose cvm is within one standard error (cvsd) of that minimum; this more conservative choice is usually taken as the optimal lambda
See the documentation for cv.glmnet() and coef(..., s = 'lambda.1se').
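A short usage sketch of that workflow (assumes the glmnet package is installed; the data here is synthetic, for illustration only):

```r
library(glmnet)

set.seed(3)
x <- matrix(rnorm(100 * 10), 100, 10)
y <- x[, 1] - x[, 2] + rnorm(100)

cvfit <- cv.glmnet(x, y, nfolds = 10)

cvfit$lambda.min  # lambda minimizing the cross-validated error
cvfit$lambda.1se  # largest lambda within one SE of that minimum

# Coefficients and predictions at the chosen lambda --
# never at the end of the full path:
coef(cvfit, s = "lambda.1se")
predict(cvfit, newx = x[1:5, ], s = "lambda.1se")
```

Selecting lambda.1se (or lambda.min) from the cross-validated error curve is exactly the safeguard the answer describes: the full-data path supplies only the candidate grid, while the choice among candidates is driven by held-out error.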