How to repeat hyperparameter tuning (alpha and/or lambda) of glmnet in mlr3
Problem description
I would like to repeat the hyperparameter tuning (alpha and/or lambda) of glmnet in mlr3 to avoid variability in smaller data sets.
In caret, I could do this with "repeatedcv".
Since I really like the mlr3 family of packages, I would like to use them for my analysis. However, I am not sure about the correct way to do this step in mlr3.
Example data
# libraries
library(caret)
library(mlr3verse)
library(mlbench)
# get example data
data(PimaIndiansDiabetes, package="mlbench")
data <- PimaIndiansDiabetes
# get small training data
train.data <- data[1:60,]
Created on 2021-03-18 by the reprex package (v1.0.0)
caret approach (tuning alpha and lambda) using "cv" and "repeatedcv"
trControlCv <- trainControl("cv",
number = 5,
classProbs = TRUE,
savePredictions = TRUE,
summaryFunction = twoClassSummary)
# use "repeatedcv" to avoid variability in smaller data sets
trControlRCv <- trainControl("repeatedcv",
number = 5,
repeats= 20,
classProbs = TRUE,
savePredictions = TRUE,
summaryFunction = twoClassSummary)
# train and extract coefficients with "cv" and different set.seed
set.seed(2323)
model <- train(
diabetes ~., data = train.data, method = "glmnet",
trControl = trControlCv,
tuneLength = 10,
metric="ROC"
)
coef(model$finalModel, model$finalModel$lambdaOpt) -> coef1
set.seed(23)
model <- train(
diabetes ~., data = train.data, method = "glmnet",
trControl = trControlCv,
tuneLength = 10,
metric="ROC"
)
coef(model$finalModel, model$finalModel$lambdaOpt) -> coef2
# train and extract coefficients with "repeatedcv" and different set.seed
set.seed(13)
model <- train(
diabetes ~., data = train.data, method = "glmnet",
trControl = trControlRCv,
tuneLength = 10,
metric="ROC"
)
coef(model$finalModel, model$finalModel$lambdaOpt) -> coef3
set.seed(55)
model <- train(
diabetes ~., data = train.data, method = "glmnet",
trControl = trControlRCv,
tuneLength = 10,
metric="ROC"
)
coef(model$finalModel, model$finalModel$lambdaOpt) -> coef4
Demonstrate different coefficients with cross-validation and same coefficients with repeated cross-validation
# with "cv" I get different coefficients
identical(coef1, coef2)
#> [1] FALSE
# with "repeatedcv" I get the same coefficients
identical(coef3,coef4)
#> [1] TRUE
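The identical() check can be complemented with a numeric comparison, since identical() would also report FALSE for coefficient vectors that differ only by floating-point noise. A minimal sketch, assuming coef1..coef4 from the chunks above:

```r
# Compare coefficient vectors numerically; coef1..coef4 are the sparse
# matrices extracted in the chunks above.
as_vec <- function(m) as.numeric(as.matrix(m))
max(abs(as_vec(coef1) - as_vec(coef2)))  # nonzero: "cv" results differ
max(abs(as_vec(coef3) - as_vec(coef4)))  # 0: "repeatedcv" results agree
```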
FIRST mlr3 approach using cv.glmnet (which internally tunes lambda)
# create elastic net regression
glmnet_lrn = lrn("classif.cv_glmnet", predict_type = "prob")
# define train task
train.task <- TaskClassif$new("train.data", train.data, target = "diabetes")
# create learner
learner = as_learner(glmnet_lrn)
# train the learner with different set.seed
set.seed(2323)
learner$train(train.task)
coef(learner$model, s = "lambda.min") -> coef1
set.seed(23)
learner$train(train.task)
coef(learner$model, s = "lambda.min") -> coef2
Demonstrate different coefficients with cross-validation
# compare coefficients
coef1
#> 9 x 1 sparse Matrix of class "dgCMatrix"
#> 1
#> (Intercept) -3.323460895
#> age 0.005065928
#> glucose 0.019727881
#> insulin .
#> mass .
#> pedigree .
#> pregnant 0.001290570
#> pressure .
#> triceps 0.020529162
coef2
#> 9 x 1 sparse Matrix of class "dgCMatrix"
#> 1
#> (Intercept) -3.146190752
#> age 0.003840963
#> glucose 0.019015433
#> insulin .
#> mass .
#> pedigree .
#> pregnant .
#> pressure .
#> triceps 0.018841557
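A common workaround for this run-to-run variability, outside of mlr3, is to rerun cv.glmnet several times and average the selected lambda.min. This is only a sketch of that idea; it assumes the train.data defined above and the glmnet package:

```r
library(glmnet)
# Build the design matrix and response from the training data above
x <- as.matrix(train.data[, setdiff(names(train.data), "diabetes")])
y <- train.data$diabetes
# Rerun cross-validation 20 times and average the selected lambda.min
set.seed(1)
lambdas <- replicate(20, cv.glmnet(x, y, family = "binomial")$lambda.min)
lambda_stable <- mean(lambdas)
# Fit once and extract coefficients at the stabilized lambda
fit <- glmnet(x, y, family = "binomial")
coef(fit, s = lambda_stable)
```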
Update 1: the progress I made
According to the comment below and this comment I could use rsmp and AutoTuner.
This answer suggests not to tune cv.glmnet but glmnet (which was not available in mlr3 at that time).
SECOND mlr3 approach using glmnet (repeats the tuning of alpha and lambda)
# define train task
train.task <- TaskClassif$new("train.data", train.data, target = "diabetes")
# create elastic net regression
glmnet_lrn = lrn("classif.glmnet", predict_type = "prob")
# turn to learner
learner = as_learner(glmnet_lrn)
# make search space
search_space = ps(
alpha = p_dbl(lower = 0, upper = 1),
s = p_dbl(lower = 1, upper = 1)
)
# set terminator
terminator = trm("evals", n_evals = 20)
# set tuner
tuner = tnr("grid_search", resolution = 3)
# tune the learner
at = AutoTuner$new(
learner = learner,
rsmp("repeated_cv"),
measure = msr("classif.ce"),
search_space = search_space,
terminator = terminator,
  tuner = tuner)
at
#> <AutoTuner:classif.glmnet.tuned>
#> * Model: -
#> * Parameters: list()
#> * Packages: glmnet
#> * Predict Type: prob
#> * Feature types: logical, integer, numeric
#> * Properties: multiclass, twoclass, weights
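To mirror the caret settings above (5 folds, 20 repeats) rather than relying on the repeated_cv defaults, the resampling can be parameterized explicitly. A sketch of the same AutoTuner with these settings, reusing the learner, search_space, terminator, and tuner objects defined above:

```r
# Repeated CV with the same folds/repeats as trControlRCv in the caret chunk
resampling = rsmp("repeated_cv", folds = 5, repeats = 20)
at = AutoTuner$new(
  learner = learner,
  resampling = resampling,
  measure = msr("classif.ce"),
  search_space = search_space,
  terminator = terminator,
  tuner = tuner)
```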
Open Question
How can I demonstrate that my second approach is valid and that I get the same or similar coefficients with different seeds? I.e., how can I extract the coefficients for the final model of the AutoTuner?
set.seed(23)
at$train(train.task) -> tune1
set.seed(2323)
at$train(train.task) -> tune2
Repeated hyperparameter tuning (alpha and lambda) of glmnet can be done using the SECOND mlr3 approach as stated above. The coefficients can be extracted with stats::coef and the stored values in the AutoTuner:
coef(tune1$model$learner$model, alpha=tune1$tuning_result$alpha,s=tune1$tuning_result$s)
# 9 x 1 sparse Matrix of class "dgCMatrix"
# 1
# (Intercept) -1.6359082102
# age 0.0075541841
# glucose 0.0044351365
# insulin 0.0005821515
# mass 0.0077104934
# pedigree 0.0911233031
# pregnant 0.0164721202
# pressure 0.0007055435
# triceps 0.0056942014
coef(tune2$model$learner$model, alpha=tune2$tuning_result$alpha,s=tune2$tuning_result$s)
# 9 x 1 sparse Matrix of class "dgCMatrix"
# 1
# (Intercept) -1.6359082102
# age 0.0075541841
# glucose 0.0044351365
# insulin 0.0005821515
# mass 0.0077104934
# pedigree 0.0911233031
# pregnant 0.0164721202
# pressure 0.0007055435
# triceps 0.0056942014
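To make the comparison explicit, the selected hyperparameters and the extracted coefficients can be checked programmatically. A sketch assuming the tune1 and tune2 objects from the chunks above:

```r
# Selected hyperparameters for each seed
tune1$tuning_result[, c("alpha", "s")]
tune2$tuning_result[, c("alpha", "s")]
# Coefficients at the tuned lambda, compared as dense matrices
c1 <- as.matrix(coef(tune1$model$learner$model, s = tune1$tuning_result$s))
c2 <- as.matrix(coef(tune2$model$learner$model, s = tune2$tuning_result$s))
identical(c1, c2)
```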