Discrepancies in variable importance calculation for glmnet model in R
Question
I want to calculate variable importance for a glmnet model in R. I am using the glmnet package to fit an elastic net model like this:
library(glmnet)
library(caret)
library(vip)
data_y <- as.vector(mtcars$mpg)
data_x <- as.matrix(mtcars[-1])
# Note: alpha defaults to 1 (lasso); pass alpha in (0, 1) for an elastic net mix
fit.glmnet <- glmnet(data_x, data_y, family = "gaussian")
set.seed(123)
cvfit.glmnet <- cv.glmnet(data_x, data_y, standardize = TRUE)
cvfit.glmnet$lambda.min
coef(cvfit.glmnet, s = "lambda.min")
Then I used the vip package to compute variable importance:
#Using vip package
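# cv.glmnet objects have no `fit` element (the fit is in `glmnet.fit`), so pass lambda.min directly
vip::vi_model(cvfit.glmnet, s = cvfit.glmnet$lambda.min)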
which returns:
# A tibble: 10 x 3
   Variable Importance Sign
   <chr>         <dbl> <chr>
 1 cyl         -0.886  NEG
 2 disp         0      NEG
 3 hp          -0.0117 NEG
 4 drat         0      NEG
 5 wt          -2.71   NEG
 6 qsec         0      NEG
 7 vs           0      NEG
 8 am           0      NEG
 9 gear         0      NEG
10 carb         0      NEG
The variable importance values are signed (they can be positive or negative), and they are not scaled to the range 0-1 or 0-100%.
Then I tried the function from this answer:
#Using function provided in this example
varImp <- function(object, lambda = NULL, ...) {
## skipping a few lines
beta <- predict(object, s = lambda, type = "coef")
if(is.list(beta)) {
out <- do.call("cbind", lapply(beta, function(x) x[,1]))
out <- as.data.frame(out)
} else out <- data.frame(Overall = beta[,1])
out <- abs(out[rownames(out) != "(Intercept)",,drop = FALSE])
out
}
varImp(cvfit.glmnet, lambda = cvfit.glmnet$lambda.min)
which returns the following output:
Overall
cyl 0.88608541
disp 0.00000000
hp 0.01168438
drat 0.00000000
wt 2.70814703
qsec 0.00000000
vs 0.00000000
am 0.00000000
gear 0.00000000
carb 0.00000000
Although the output from the custom function contains no negative values, it is still not scaled to 0-1 or 0-100%.
I know that the caret package has a varImp function that gives variable importance between 0 and 100%. But I want to do the same thing for a cv.glmnet object instead of a caret::train object. How can I get caret-style variable importance for a cv.glmnet object?
Recommended answer
The question asks how to obtain glmnet variable importance on a 0-100% scale.
If the goal is to assign importance based on coefficient magnitude at a certain (usually optimal) penalty, and the coefficients are derived from standardized variables (the default in glmnet), then the coefficients can simply be scaled to the 0-1 range:
This gives a slightly modified function:
varImp <- function(object, lambda = NULL, ...) {
beta <- predict(object, s = lambda, type = "coef")
if(is.list(beta)) {
out <- do.call("cbind", lapply(beta, function(x) x[,1]))
out <- as.data.frame(out)
} else out <- data.frame(Overall = beta[,1])
out <- abs(out[rownames(out) != "(Intercept)",,drop = FALSE])
out <- out/max(out)
out[order(out$Overall, decreasing = TRUE),,drop=FALSE]
}
Using the example from the question:
varImp(cvfit.glmnet, lambda = cvfit.glmnet$lambda.min)
#output
Overall
wt 1.000000000
cyl 0.320796270
am 0.004840186
hp 0.004605913
disp 0.000000000
drat 0.000000000
qsec 0.000000000
vs 0.000000000
gear 0.000000000
carb 0.000000000
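One caveat worth checking: the glmnet documentation states that coefficients are always returned on the original scale of the predictors, even when standardize = TRUE is used for fitting. Raw magnitudes therefore also reflect each predictor's units. The following is a minimal sketch, not part of the original answer, that rescales the nonzero coefficients from the question's output by each predictor's standard deviation before scaling to 0-1 and 0-100. The names coefs, sds, std_coefs, importance_01 and importance_100 are chosen here for illustration only.

```r
# Coefficients at lambda.min, copied from the question's output
# (zero coefficients are omitted for brevity).
coefs <- c(cyl = -0.88608541, hp = -0.01168438, wt = -2.70814703)

# Standard deviation of each predictor in the original data (mtcars ships with R).
# Any constant factor shared by all standard deviations cancels in the final ratio.
sds <- sapply(mtcars[names(coefs)], sd)

# Express each coefficient per standard deviation of its predictor,
# then take magnitudes so the sign no longer matters.
std_coefs <- abs(coefs * sds)

# Scale so that the largest value is 1; multiply by 100 for a 0-100 scale.
importance_01  <- std_coefs / max(std_coefs)
importance_100 <- 100 * importance_01

print(round(sort(importance_100, decreasing = TRUE), 1))
```

With these numbers wt still ranks first, but the gap between the remaining predictors changes because hp is measured in much larger units than cyl or wt.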
Another approach to assigning variable importance in glmnet models is to score the variables by the penalty at which they are included: a variable is more important if it stays in the model at higher penalties, that is, if it is only excluded once the penalty becomes large. This approach will be implemented in the mlr3 package at some point: https://github.com/mlr-org/mlr3learners/issues/28
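The penalty-based idea can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the mlr3 implementation: penalty_importance is a hypothetical helper name, and the coefficient path below is a made-up toy matrix rather than output from a real fit.

```r
# Hypothetical helper: for each variable (row of `beta`), find the largest
# penalty at which its coefficient is nonzero (0 if it never enters the model),
# then scale the scores so the largest is 1.
penalty_importance <- function(beta, lambda) {
  stopifnot(ncol(beta) == length(lambda))
  entry <- apply(beta != 0, 1, function(active) {
    if (any(active)) max(lambda[active]) else 0
  })
  sort(entry / max(entry), decreasing = TRUE)
}

# Toy coefficient path (made-up numbers): rows are variables,
# columns correspond to the penalties in `lambda`.
lambda <- c(5, 2, 1, 0.5)
beta <- rbind(
  wt  = c(0, -1.2, -2.0, -2.70),
  cyl = c(0,  0.0, -0.4, -0.90),
  hp  = c(0,  0.0,  0.0, -0.01),
  am  = c(0,  0.0,  0.0,  0.00)
)

scores <- penalty_importance(beta, lambda)
print(scores)
```

For a fitted model from the question, the same helper could presumably be called as penalty_importance(as.matrix(fit.glmnet$beta), fit.glmnet$lambda), since a glmnet fit stores its coefficient path in beta and its penalty sequence in lambda.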