为什么在套索回归中计算MSE会得出不同的输出? [英] Why calculating MSE in lasso regression gives different outputs?

查看:184
本文介绍了为什么在套索回归中计算MSE会得出不同的输出?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试对来自lasso2软件包的前列腺癌数据运行不同的回归模型.使用套索时,我看到了两种不同的方法来计算均方误差.但是它们的确给了我截然不同的结果,所以我想知道我做错了什么,还是仅仅意味着一种方法优于另一种方法?

I am trying to run different regression models on the Prostate cancer data from the lasso2 package. When I use Lasso, I saw two different methods to calculate the mean square error. But they do give me quite different results, so I would want to know if I'm doing anything wrong or if it just means that one method is better than the other ?

# Needs the following R packages.
library(lasso2)
library(glmnet)

# Gets the prostate cancer dataset
data(Prostate)

# Defines the Mean Square Error function 
mse = function(x,y) { mean((x-y)^2)}

# 75% of the sample size.
smp_size = floor(0.75 * nrow(Prostate))

# Sets the seed to make the partition reproductible.
set.seed(907)
train_ind = sample(seq_len(nrow(Prostate)), size = smp_size)

# Training set
train = Prostate[train_ind, ]

# Test set
test = Prostate[-train_ind, ]

# Creates matrices for independent and dependent variables.
xtrain = model.matrix(lpsa~. -1, data = train)
ytrain = train$lpsa
xtest = model.matrix(lpsa~. -1, data = test)
ytest = test$lpsa

# Fitting a linear model by Lasso regression on the "train" data set
pr.lasso = cv.glmnet(xtrain,ytrain,type.measure='mse',alpha=1)
lambda.lasso = pr.lasso$lambda.min

# Getting predictions on the "test" data set and calculating the mean     square error
lasso.pred = predict(pr.lasso, s = lambda.lasso, newx = xtest) 

# Calculating MSE via the mse function defined above
mse.1 = mse(lasso.pred,ytest)
cat("MSE (method 1): ", mse.1, "\n")

# Calculating MSE via the cvm attribute inside the pr.lasso object
mse.2 = pr.lasso$cvm[pr.lasso$lambda == lambda.lasso]
cat("MSE (method 2): ", mse.2, "\n")

这些是我为这两个MSE获得的输出:

So these are the outputs I got for both MSE:

MSE (method 1): 0.4609978 
MSE (method 2): 0.5654089 

它们是完全不同的.有人知道为什么吗? 预先感谢您的帮助!

And they're quite different. Does anyone know why ? Thanks a lot in advance for your help!

塞缪尔

推荐答案

@alistaire指出,在第一种情况下,您正在使用测试数据来计算MSE,在第二种情况下,您是通过交叉验证来计算MSE (训练)褶皱的报道,所以这不是一个苹果对苹果的比较.

As pointed out by @alistaire, in the first case you are using the test data to compute the MSE, in the second case the MSE from the cross-validation (training) folds are reported, so it's not an apples to apples comparison.

我们可以执行以下操作来进行苹果与苹果的比较(通过将拟合值保留在训练倍数上),并且如我们所见,如果在相同的训练倍数上计算,mse.1和mse.2完全相等(尽管该值与您的值几乎没有什么不同,但使用我的桌面R版本3.1.2,x86_64-w64-mingw32,Windows 10):

We can do something like the following to do apples to apples comparison (by keeping the fitted values on the training folds) and as we can see mse.1 and mse.2 are exactly equal if computed on the same training folds (although the value is little different from yours, with my desktop R version 3.1.2, x86_64-w64-mingw32, windows 10):

# Needs the following R packages.
library(lasso2)
library(glmnet)

# Gets the prostate cancer dataset
data(Prostate)

# Defines the Mean Square Error function 
mse = function(x,y) { mean((x-y)^2)}

# 75% of the sample size.
smp_size = floor(0.75 * nrow(Prostate))

# Sets the seed to make the partition reproductible.
set.seed(907)
train_ind = sample(seq_len(nrow(Prostate)), size = smp_size)

# Training set
train = Prostate[train_ind, ]

# Test set
test = Prostate[-train_ind, ]

# Creates matrices for independent and dependent variables.
xtrain = model.matrix(lpsa~. -1, data = train)
ytrain = train$lpsa
xtest = model.matrix(lpsa~. -1, data = test)
ytest = test$lpsa

# Fitting a linear model by Lasso regression on the "train" data set
# keep the fitted values on the training folds
pr.lasso = cv.glmnet(xtrain,ytrain,type.measure='mse', keep=TRUE, alpha=1)
lambda.lasso = pr.lasso$lambda.min
lambda.id <- which(pr.lasso$lambda == pr.lasso$lambda.min)

# get the predicted values on the training folds with lambda.min (not from test data)
mse.1 = mse(pr.lasso$fit[,lambda.id], ytrain) 
cat("MSE (method 1): ", mse.1, "\n")

MSE (method 1):  0.6044496 

# Calculating MSE via the cvm attribute inside the pr.lasso object
mse.2 = pr.lasso$cvm[pr.lasso$lambda == lambda.lasso]
cat("MSE (method 2): ", mse.2, "\n")

MSE (method 2):  0.6044496 

mse.1 == mse.2
[1] TRUE

这篇关于为什么在套索回归中计算MSE会得出不同的输出?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆