Extract the coefficients for the best tuning parameters of a glmnet model in caret

Question

I am running elastic net regularization in caret using glmnet.

I pass a grid of values for alpha and lambda to train() via tuneGrid, and use repeated cross-validation (repeatedcv, set up in trainControl) to find the optimal tuning of alpha and lambda.

Here is an example where the optimal tunings for alpha and lambda turn out to be 0.7 and 0.5, respectively:

age     <- c(4, 8, 7, 12, 6, 9, 10, 14, 7, 6, 8, 11, 11, 6, 2, 10, 14, 7, 12, 6, 9, 10, 14, 7) 
gender  <-  make.names(as.factor(c(1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1)))
bmi_p   <- c(0.86, 0.45, 0.99, 0.84, 0.85, 0.67, 0.91, 0.29, 0.88, 0.83, 0.48, 0.99, 0.80, 0.85,
         0.50, 0.91, 0.29, 0.88, 0.99, 0.84, 0.80, 0.85, 0.88, 0.99) 
m_edu   <- make.names(as.factor(c(0, 1, 1, 2, 2, 3, 2, 0, 1, 1, 0, 1, 2, 2, 1, 2, 0, 1, 1, 2, 2, 0 , 1, 0)))
p_edu   <-  make.names(as.factor(c(0, 2, 2, 2, 2, 3, 2, 0, 0, 0, 1, 2, 2, 1, 3, 2, 3, 0, 0, 2, 0, 1, 0, 1)))
f_color <-  make.names(as.factor(c("blue", "blue", "yellow", "red", "red", "yellow", 
                   "yellow", "red", "yellow","blue", "blue", "yellow", "red", "red", "yellow", 
                   "yellow", "red", "yellow", "yellow", "red", "blue", "yellow", "yellow", "red")))
asthma <-  make.names(as.factor(c(1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1)))
x <- data.frame(age, gender, bmi_p, m_edu, p_edu, f_color, asthma)

library(caret)

tuneGrid   <- expand.grid(alpha = seq(0, 1, 0.05), lambda = seq(0, 0.5, 0.05))
fitControl <- trainControl(method = 'repeatedcv', number = 3, repeats = 5,
                           classProbs = TRUE, summaryFunction = twoClassSummary)

set.seed(1352)
model.test <- caret::train(asthma ~ age + gender + bmi_p + m_edu + p_edu + f_color, data = x, method = "glmnet", 
                       family = "binomial", trControl = fitControl, tuneGrid = tuneGrid, 
                       metric = "ROC")

model.test$bestTune

My question:

When I run as.matrix(coef(model.test$finalModel)), which I would assume gives me the coefficients corresponding to the best model, I get 100 different sets of coefficients.
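Presumably these come from the full regularization path: coef() on a glmnet object without an s argument returns one column of coefficients per fitted lambda. A quick check (sketch):

# one column of coefficients per lambda value on the fitted path
dim(as.matrix(coef(model.test$finalModel)))   # e.g. 12 x 100
length(model.test$finalModel$lambda)          # length of the fitted lambda path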

So how do I get the coefficients corresponding to the best tuning?

I've seen the recommendation coef(model.test$finalModel, model.test$bestTune$lambda) for getting the best model. However, this returns NULL coefficients, and in any case it would only account for the best tuning of lambda, not of alpha as well.

After searching everywhere on the internet, all I can find that points me in the direction of the correct answer is this blog post, which says that model.test$finalModel returns the model corresponding to the best alpha tuning, and that coef(model.test$finalModel, model.test$bestTune$lambda) returns the set of coefficients corresponding to the best value of lambda. If this is true, then this is the answer to my question. However, as it is a single blog post and I can't find anything else to back up the claim, I am still skeptical. Can anyone confirm that model.test$finalModel returns the model corresponding to the best alpha? If so, this question would be solved. Thanks!
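One sanity check, as a sketch (it assumes caret attaches the winning tuning parameters to the final fit as tuneValue, which, as far as I know, train() does):

# tuning values caret used to refit the final model vs. the resampling winner
model.test$finalModel$tuneValue   # alpha and lambda used for finalModel
model.test$bestTune               # best alpha and lambda from repeated CV

If these agree, the final model was indeed refit at the best alpha.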

Answer

After playing with your code a bit, I find it very odd that glmnet, as trained by caret, chooses a different lambda range depending on the seed. Here is an example:

library(caret)
library(glmnet)
set.seed(13)
model.test <- caret::train(asthma ~ age + gender + bmi_p + m_edu + p_edu + f_color, data = x, method = "glmnet", 
                           family = "binomial", trControl = fitControl, tuneGrid = tuneGrid, 
                           metric = "ROC")

c(head(model.test$finalModel$lambda, 5), tail(model.test$finalModel$lambda, 5))
#output
 [1] 3.7796447301 3.4438715094 3.1379274562 2.8591626295 2.6051625017 0.0005483617 0.0004996468 0.0004552595 0.0004148155
[10] 0.0003779645

The optimal lambda is:

model.test$finalModel$lambdaOpt
#output
#[1] 0.05

And this works:

coef(model.test$finalModel, model.test$finalModel$lambdaOpt)
#12 x 1 sparse Matrix of class "dgCMatrix"
                        1
(Intercept)   -0.03158974
age            0.03329806
genderX1      -1.24093677
bmi_p          1.65156913
m_eduX1        0.45314106
m_eduX2       -0.09934991
m_eduX3       -0.72360297
p_eduX1       -0.51949828
p_eduX2       -0.80063642
p_eduX3       -2.18231433
f_colorred     0.87618211
f_coloryellow -1.52699254

This gives the coefficients at the best alpha and lambda.

When using this model to predict, some y are predicted as X1 and some as X0:

predict(model.test, x)
#output
 [1] X1 X1 X0 X1 X1 X0 X0 X1 X1 X1 X0 X1 X1 X1 X0 X0 X0 X1 X1 X1 X1 X0 X1 X1
Levels: X0 X1

Now, with the seed you used:

set.seed(1352)
model.test <- caret::train(asthma ~ age + gender + bmi_p + m_edu + p_edu + f_color, data = x, method = "glmnet", 
                           family = "binomial", trControl = fitControl, tuneGrid = tuneGrid, 
                           metric = "ROC")

c(head(model.test$finalModel$lambda, 5), tail(model.test$finalModel$lambda, 5))
#output
 [1] 2.699746e-01 2.459908e-01 2.241377e-01 2.042259e-01 1.860830e-01 3.916870e-05 3.568906e-05 3.251854e-05 2.962968e-05
[10] 2.699746e-05

The lambda values are ten times smaller, and this gives empty coefficients because lambdaOpt is not within the range of tested lambdas:

coef(model.test$finalModel, model.test$finalModel$lambdaOpt)
#output
12 x 1 sparse Matrix of class "dgCMatrix"
              1
(Intercept)   .
age           .
genderX1      .
bmi_p         .
m_eduX1       .
m_eduX2       .
m_eduX3       .
p_eduX1       .
p_eduX2       .
p_eduX3       .
f_colorred    .
f_coloryellow .

model.test$finalModel$lambdaOpt
#output
0.5
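My reading of why the coefficients come back empty (a sketch, not stated in the answer above): with the default exact = FALSE, coef() interpolates along the fitted path and clamps a requested s above max(lambda) to the top of the path, where every term is fully penalized:

# lambdaOpt (0.5) sits above the largest lambda that was actually fitted
max(model.test$finalModel$lambda)                                   # ~0.27
# so the request falls back to the most heavily penalized end of the path, which
# for this data is all zeros (the intercept is 0 because the classes are balanced)
coef(model.test$finalModel, s = max(model.test$finalModel$lambda))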

Now, when predicting with this model, only X0 (the first level) is predicted:

predict(model.test, x)
#output
 [1] X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0
Levels: X0 X1

Quite odd behavior, probably worth reporting.
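As a practical workaround (my own sketch, not part of the answer above): refit glmnet directly at the alpha and lambda that caret selected, so the requested lambda is guaranteed to be on the path. This assumes the x data frame from the question is still in scope; glmnet discourages fitting a single lambda value, but it is adequate for reading off coefficients here.

library(glmnet)

# same dummy coding that caret builds from the formula; drop the intercept column,
# since glmnet adds its own intercept
X <- model.matrix(asthma ~ age + gender + bmi_p + m_edu + p_edu + f_color,
                  data = x)[, -1]

# refit at the selected tuning values and read off the coefficients
refit <- glmnet(X, factor(x$asthma), family = "binomial",
                alpha  = model.test$bestTune$alpha,
                lambda = model.test$bestTune$lambda)
coef(refit)

The coefficient names should match model.matrix's dummy coding, which is the same coding train() applies when it is given a formula.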
