Computing importance measure using VIP package on a parsnip model
Question
I am trying to compute feature importance using vi_firm() on a logistic regression model made in parsnip. As a reproducible example, I will use the iris dataset and try to predict whether an observation is setosa or not.
library(dplyr)
library(parsnip)

# Binary outcome: setosa vs. other; drop Species so it is not used as a predictor
iris1 <- iris %>%
  mutate(class = case_when(Species == 'setosa' ~ 'setosa',
                           TRUE ~ 'other')) %>%
  select(-Species)
iris1$class = as.factor(iris1$class)

# Set up logistic regression model
iris.lr = logistic_reg(
  mode = "classification",
  penalty = NULL,
  mixture = NULL
) %>%
  set_engine("glmnet")

iris_fit = iris.lr %>%
  fit(class ~ ., data = iris1)

library(vip)
features <- setdiff(names(iris1), "class")
vip::vi_firm(iris_fit, feature_names = features, train = iris1, type = 'classification')
This gives
Error: Did you mean to use `new_data` instead of `newdata`?
I am also trying to produce partial dependence plots using partial from the related pdp package. I get the same error.
Recommended answer
For "glmnet" objects, the correct argument should be s, rather than lambda, for consistency with coef.glmnet (however, calling this with vi() currently produces an error due to partial matching with the scale argument; I'll push a fix this weekend: https://github.com/koalaverse/vip/issues/103). Also, as of version 0.2.2, vi_model should work directly with model_fit objects. So the correct call here should be:
> vi_model(iris_fit, s = iris_fit$fit$lambda[10])
# A tibble: 4 x 3
  Variable     Importance Sign
  <chr>             <dbl> <chr>
1 Sepal.Length      0     NEG
2 Sepal.Width       0     NEG
3 Petal.Length     -0.721 NEG
4 Petal.Width       0     NEG
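To see where the s argument comes from, it may help to call coef() on a plain glmnet fit, outside of parsnip and vip. The following is only a minimal sketch under that assumption; the names X, y, fit, and s_val are illustrative and do not appear in the original answer.

```r
library(glmnet)

# Features as a numeric matrix; binary outcome (setosa vs. other)
X <- data.matrix(iris[, 1:4])
y <- factor(ifelse(iris$Species == "setosa", "setosa", "other"))

# Fit the full regularization path for a binomial model
fit <- glmnet(X, y, family = "binomial")

# Pick one penalty value from the fitted path (illustrative choice)
s_val <- fit$lambda[10]

# coef() for glmnet objects takes the penalty through `s`, not `lambda`
beta <- coef(fit, s = s_val)
print(beta)
```

The result is a sparse one-column matrix holding the intercept and one coefficient per feature at the chosen penalty, which is the quantity vi_model() reports on for "glmnet" fits.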
As far as vi_firm() and pdp::partial() are concerned, the easiest thing to do is to create your own prediction wrapper. There should be plenty of details in the docs for each function, and there are more examples in our upcoming paper (https://github.com/koalaverse/vip/blob/master/rjournal/RJwrapper.pdf), but here's a basic example:
> # Data matrix (features only)
> X <- data.matrix(subset(iris1, select = -class))
>
> # Prediction wrapper for partial dependence
> pfun <- function(object, newdata) {
+   # Return averaged prediction for class of interest
+   mean(predict(object, newx = newdata, s = iris_fit$fit$lambda[10],
+                type = "link")[, 1L])
+ }
>
> # PDP-based VI
> features <- setdiff(names(iris1), "class")
> vip::vi_firm(
+ object = iris_fit$fit,
+ feature_names = features,
+ train = X,
+ pred.fun = pfun
+ )
# A tibble: 4 x 2
  Variable     Importance
  <chr>             <dbl>
1 Sepal.Length       0
2 Sepal.Width        0
3 Petal.Length       1.27
4 Petal.Width        0
>
> # PDP
> pd <- pdp::partial(iris_fit$fit, "Petal.Length", pred.fun = pfun,
+ train = X)
> head(pd)
  Petal.Length      yhat
1     1.000000 1.0644756
2     1.140476 0.9632228
3     1.280952 0.8619700
4     1.421429 0.7607172
5     1.561905 0.6594644
6     1.702381 0.5582116
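The error in the question arises because predict() for parsnip model_fit objects expects new_data, while wrappers for vip and pdp receive their data as newdata. As a rough sketch of the same wrapper idea applied to the parsnip object itself, the function below forwards newdata to new_data and averages the predicted probability of the setosa class. The fixed penalty of 0.01 and the names fit_parsnip and pfun_prob are assumptions made for illustration; they are not part of the original answer.

```r
library(dplyr)
library(parsnip)

# Binary outcome: setosa vs. other, with Species dropped from the predictors
iris1 <- iris %>%
  mutate(class = factor(if_else(Species == "setosa", "setosa", "other"))) %>%
  select(-Species)

# Assumption: a single fixed penalty, chosen only for illustration
fit_parsnip <- logistic_reg(penalty = 0.01) %>%
  set_engine("glmnet") %>%
  fit(class ~ ., data = iris1)

# Wrapper with the (object, newdata) signature that forwards to `new_data`
pfun_prob <- function(object, newdata) {
  probs <- predict(object, new_data = as.data.frame(newdata), type = "prob")
  # Average predicted probability of the class of interest
  mean(probs$.pred_setosa)
}

features <- setdiff(names(iris1), "class")
pfun_prob(fit_parsnip, iris1[, features])
```

A wrapper of this shape works on the probability scale rather than the link scale used above. Whether vi_firm() and partial() accept the parsnip object directly together with such a wrapper may depend on the installed package versions, so passing the underlying iris_fit$fit object, as in the answer, remains the safer route.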