Computing importance measure using VIP package on a parsnip model

Question

I am trying to compute feature importance using vi_firm() on a logistic regression model made in parsnip. For a reprex, I will use the iris dataset and try to predict whether an observation is setosa or not.

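library(dplyr)
library(parsnip)

# Binary target: setosa vs. other; drop Species so it cannot leak the label
iris1 <- iris %>%
  mutate(class = case_when(Species == 'setosa' ~ 'setosa',
                           TRUE ~ 'other')) %>%
  select(-Species)
iris1$class <- as.factor(iris1$class)

# set up logistic regression model
iris.lr <- logistic_reg(
  mode = "classification",
  penalty = NULL,
  mixture = NULL
) %>%
  set_engine("glmnet")

iris.fit <- iris.lr %>%
  fit(class ~ ., data = iris1)

library(vip)
# every column except the target is a feature
features <- setdiff(names(iris1), "class")
vip::vi_firm(iris.fit, feature_names = features, train = iris1, type = 'classification')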

This gives

Error: Did you mean to use `new_data` instead of `newdata`?

I am also trying to produce partial dependence plots using partial() from the related pdp package. I get the same error.

Recommended answer

For "glmnet" objects, the correct argument is s, rather than lambda, for consistency with coef.glmnet. (However, calling this through vi() currently produces an error due to partial matching with the scale argument; I'll push a fix this weekend, see https://github.com/koalaverse/vip/issues/103.) Also, as of version 0.2.2, vi_model should work directly with model_fit objects. So the correct call here should be:

> # iris_fit is the fitted parsnip model (called iris.fit in the question)
> vi_model(iris_fit, s = iris_fit$fit$lambda[10])
# A tibble: 4 x 3
  Variable     Importance Sign 
  <chr>             <dbl> <chr>
1 Sepal.Length      0     NEG  
2 Sepal.Width       0     NEG  
3 Petal.Length     -0.721 NEG  
4 Petal.Width       0     NEG 
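
To see where these numbers come from, it can help to pull the coefficients at the same penalty value straight from a glmnet fit. The following is a minimal sketch, not the answerer's code: it assumes the same setosa-vs-other target as the question, but fits glmnet directly (without parsnip) so the block is self-contained. The names `X`, `y`, `fit`, `s10`, and `beta` are illustrative.

```r
library(glmnet)

# Binary target as in the question: setosa vs. everything else
y <- factor(ifelse(iris$Species == "setosa", "setosa", "other"))
X <- data.matrix(iris[, 1:4])

# Fit the whole lasso path for a binomial model
fit <- glmnet(X, y, family = "binomial")

# Pick one penalty value from the path; coef() and predict() take it as `s`
s10 <- fit$lambda[10]
beta <- coef(fit, s = s10)

# Sparse column: an intercept plus one coefficient per feature
print(beta)
```

At a penalty this far up the path most coefficients are shrunk to exactly zero, which is why only one feature carries a nonzero importance in the table above.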

As far as vi_firm() and pdp::partial() are concerned, the easiest thing to do is to create your own prediction wrapper. There should be plenty of details in the docs for each function, and there are more examples in our upcoming paper (https://github.com/koalaverse/vip/blob/master/rjournal/RJwrapper.pdf), but here's a basic example:

> # Data matrix (features only)
> X <- data.matrix(subset(iris1, select = -class))
> 
> # Prediction wrapper for partial dependence
> pfun <- function(object, newdata) {
+   # Return averaged prediction for class of interest
+   mean(predict(object, newx = newdata, s = iris_fit$fit$lambda[10], 
+        type = "link")[, 1L])
+ }
> 
> # PDP-based VI
> features <- setdiff(names(iris1), "class")
> vip::vi_firm(
+   object = iris_fit$fit, 
+   feature_names = features, 
+   train = X, 
+   pred.fun = pfun
+ )
# A tibble: 4 x 2
  Variable     Importance
  <chr>             <dbl>
1 Sepal.Length       0   
2 Sepal.Width        0   
3 Petal.Length       1.27
4 Petal.Width        0   
> 
> # PDP
> pd <- pdp::partial(iris_fit$fit, "Petal.Length", pred.fun = pfun, 
+                    train = X)
> head(pd)
  Petal.Length      yhat
1     1.000000 1.0644756
2     1.140476 0.9632228
3     1.280952 0.8619700
4     1.421429 0.7607172
5     1.561905 0.6594644
6     1.702381 0.5582116
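
The prediction wrapper can be understood by computing a partial dependence curve by hand: fix one feature at a grid value for every row, average the wrapper's predictions, and repeat across the grid. The sketch below does this with glmnet alone. It rests on a few assumptions: the model is fitted directly with glmnet rather than through parsnip, `manual_pd` is a hypothetical helper (not part of vip or pdp), and the 43-point grid is chosen only to mirror the spacing in the output above. As far as I understand the vip defaults, vi_firm() scores a numeric feature by the standard deviation of this curve, so the last line should land in the same ballpark as the importance reported above; treat that as an assumption to check against the vip documentation.

```r
library(glmnet)

y <- factor(ifelse(iris$Species == "setosa", "setosa", "other"))
X <- data.matrix(iris[, 1:4])
fit <- glmnet(X, y, family = "binomial")
s10 <- fit$lambda[10]

# Same shape as the wrapper above: one averaged link-scale prediction
pfun <- function(object, newdata) {
  mean(predict(object, newx = newdata, s = s10, type = "link")[, 1L])
}

# Hypothetical helper: partial dependence of one feature over a grid
manual_pd <- function(object, X, feature, grid) {
  vapply(grid, function(v) {
    X_mod <- X
    X_mod[, feature] <- v  # hold the feature at v for every row
    pfun(object, X_mod)
  }, numeric(1))
}

# Evenly spaced grid over the observed range of Petal.Length
grid <- seq(min(X[, "Petal.Length"]), max(X[, "Petal.Length"]),
            length.out = 43)
pd_manual <- manual_pd(fit, X, "Petal.Length", grid)

# Spread of the curve, used here as a variance-based importance score
sd(pd_manual)
```

Returning a single averaged number from the wrapper is what lets the same function serve both the partial dependence plot and the importance score.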
