MLR: How to compute permuted feature importance for sequential MBO parametrized models?

Views: 221 | Published: 2020/6/30 22:59:52 | Tags: r, mlr


Problem description

I am doing nested cross-validation using the packages mlr and mlrMBO. The inner CV is used for parametrization (i.e. to find the optimal hyperparameters). Since I want to compare the performance of different learners, I conduct a benchmark experiment using mlr's benchmark function. My question is the following: Is it possible to compute permutation feature importance on the parametrized model/learner? When I call generateFeatureImportanceData on the learner I use in the benchmark experiment, the model is estimated again (ignoring the parametrization learned by sequential optimization). Here is some code on the iris dataset to illustrate my question (no preprocessing and only for illustration).
```
library(dplyr)
library(mlr)
library(mlrMBO)
library(e1071)

nr_inner_cv <- 3L
nr_outer_cv <- 2L

# Inner resampling: folds used in tuning/Bayesian optimization
inner <- makeResampleDesc("CV", iters = nr_inner_cv)

learner_knn_base <- makeLearner(id = "knn", "classif.knn")

# Search space: number of neighbours k
par.set <- makeParamSet(
  makeIntegerParam("k", lower = 2L, upper = 10L)
)

# MBO control: one proposed point per iteration, 10 iterations,
# expected improvement as infill criterion
ctrl <- makeMBOControl(propose.points = 1L)
ctrl <- setMBOControlTermination(ctrl, iters = 10L)
ctrl <- setMBOControlInfill(ctrl, crit = crit.ei, filter.proposed.points = TRUE)
set.seed(500)
tune.ctrl <- makeTuneControlMBO(
  mbo.control = ctrl,
  mbo.design = generateDesign(n = 10L, par.set = par.set)
)

# Wrap the kNN learner so that k is tuned in the inner CV
learner_knn <- makeTuneWrapper(
  learner = learner_knn_base,
  resampling = inner,
  par.set = par.set,
  control = tune.ctrl,
  show.info = TRUE
)

learner_nb <- makeLearner(id = "naiveBayes", "classif.naiveBayes")

lrns <- list(learner_knn, learner_nb)

# Outer resampling: folds used for performance estimation
rdesc <- makeResampleDesc("CV", iters = nr_outer_cv)

set.seed(12345)
bmr <- mlr::benchmark(lrns, tasks = iris.task, show.info = FALSE,
                      resamplings = rdesc, models = TRUE, keep.extract = TRUE)
```
Solution
I think this is a general question that we get quite often: Can I do XY on models fitted in the CV? Short answer: Yes, you can, but do you really want to?

Detailed answer

Similar Q's:
- mlr: retrieve output of generateFilterValuesData within CV loop
- R - mlr: Is there a easy way to get the variable importance of tuned support vector machine models in nested resampling (spatial)?

As @jakob-r's comment indicates, there are two options:
1. Either you recreate the model outside the CV and call your desired function on it
2. You do it within the CV on each fitted model of the respective fold via the extract argument in resample(). See also Q2 linked above.

Regarding 1): If you want to do this on all models, see 2) below. If you want to do it only on the models of certain folds: which criteria did you use to select those folds?

Regarding 2): This is highly computationally intensive, and you might want to question why you want to do this, i.e. what do you want to do with all the information from each fold's model?
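
The mechanism behind option 2 can be sketched roughly as follows. This is a minimal illustration, not the asker's exact setup: grid search (makeTuneControlGrid) is swapped in for the MBO control to keep the example short, and the variable names are made up for this sketch. It only shows how the extract argument of resample() exposes the tuning result of each outer fold.

```r
library(mlr)

set.seed(1)

# Tuning wrapper around kNN. Assumption: grid search stands in for the
# MBO control from the question to keep the sketch small.
par.set <- makeParamSet(makeIntegerParam("k", lower = 2L, upper = 10L))
learner_knn <- makeTuneWrapper(
  learner = makeLearner("classif.knn"),
  resampling = makeResampleDesc("CV", iters = 3L),
  par.set = par.set,
  control = makeTuneControlGrid(),
  show.info = FALSE
)

# Outer CV: `extract` is applied to the fitted model of every outer fold
outer <- makeResampleDesc("CV", iters = 2L)
r <- resample(learner_knn, iris.task, outer,
              extract = getTuneResult, show.info = FALSE)

# One row per outer fold with the k selected in that fold
tuned_per_fold <- getNestedTuneResultsX(r)
print(tuned_per_fold)
```

Any other function of the fitted model could be passed as extract in the same way. For a benchmark object created with keep.extract = TRUE, as in the question, getBMRTuneResults() should give access to the same per-fold tuning results.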

In general, I've never seen a study/use case where this has been applied. Everything you do in the CV contributes to estimating a performance value for each fold. You do not want to interact with these models afterwards.

You would rather want to estimate the feature importance once on the non-partitioned dataset (for which you have optimized the hyperparameters once beforehand). This applies in the same way to other diagnostic methods for ML models: Apply them to your "full dataset", not to each model within the CV.
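
A minimal sketch of that recommendation, under some assumptions: grid search again stands in for the MBO control (the tune.ctrl object from the question could presumably be passed as control instead), the measure (mmce) and the number of permutations (nmc = 10L) are arbitrary choices for illustration, and the variable names are hypothetical. The tuned value is set on a plain learner, so generateFeatureImportanceData does not trigger another tuning run.

```r
library(mlr)

set.seed(1)

# Step 1: tune k once on the full, non-partitioned task.
# Assumption: grid search is used here instead of MBO for brevity.
par.set <- makeParamSet(makeIntegerParam("k", lower = 2L, upper = 10L))
tune.res <- tuneParams(
  learner = makeLearner("classif.knn"),
  task = iris.task,
  resampling = makeResampleDesc("CV", iters = 3L),
  par.set = par.set,
  control = makeTuneControlGrid(),
  show.info = FALSE
)

# Step 2: fix the tuned value on a plain (unwrapped) learner
learner_tuned <- setHyperPars(makeLearner("classif.knn"), par.vals = tune.res$x)

# Step 3: permutation feature importance on the full task
imp <- generateFeatureImportanceData(
  task = iris.task,
  method = "permutation.importance",
  learner = learner_tuned,
  measure = mmce,
  nmc = 10L
)
print(imp$res)
```

Because learner_tuned is no longer a tuning wrapper, the importance values refer to a kNN model with the selected k rather than to a freshly tuned model.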
