Get predictions on test sets in MLR

Question

I'm fitting classification models for binary problems using the mlr package in R. For each model, I perform cross-validation with embedded feature selection using the "selectFeatures" function and retrieve the mean AUC over the test sets. Next, I would like to retrieve the predictions on the test set of each fold, but this function does not seem to support that. I already tried plugging the selected predictors into the "resample" function to get them. It works, but the performance metrics are different, which is not suitable for my analysis. I also checked whether this is possible in the caret package, but I did not find a solution at first glance. Any idea how to do it?

Here is my code with synthetic data, including my attempt with the "resample" function (again: not suitable in its current version because the performance metrics are different).

# 1. Find a synthetic dataset for supervised learning (two classes)
###################################################################

# install.packages("mlbench")  # run once if the package is not installed
library(mlbench)

# generate 1000 rows, 21 quantitative candidate predictors and 1 target variable 
p<-mlbench.waveform(1000) 

# convert list into dataframe
dataset<-as.data.frame(p)

# drop the third class to get 2 classes (droplevels removes the now-empty factor level)
dataset2 = droplevels(subset(dataset, classes != 3))

# 2. Perform cross validation with embedded feature selection
#############################################################

library(BBmisc)
library(nnet)
library(mlr)

# Choice of algorithm i.e. neural network
mL <- makeLearner("classif.nnet", predict.type = "prob")

# Choice of sampling plan: 10 fold cross validation with stratification of target classes 
mRD = makeResampleDesc("CV", iters = 10,stratify = TRUE)

# Choice of feature selection strategy: sequential floating forward search (stepwise family);
# alpha is the minimal performance improvement required to add a feature, not a p-value
ctrl = makeFeatSelControlSequential(method = "sffs", maxit = NA, alpha = 0.001)

# Choice of seed 
set.seed(12)

# Choice of data 
mCT <- makeClassifTask(data =dataset2, target = "classes")

# Perform the method
result = selectFeatures(mL,mCT, mRD, control = ctrl, measures = list(mlr::auc,mlr::acc,mlr::brier))

# Retrieve AUC and selected variables
analyzeFeatSelResult(result)
# Result: auc.test.mean=0.9614525 Variables selected: x.10, x.11, x.15, x.17, x.18    

# 3. Retrieve predictions on test sets (to later perform DeLong tests on AUCs derived from multiple sets of candidate variables)
#################################################################################################################################

# create new dataset with selected predictors
keep <- c("x.10","x.11","x.15","x.17","x.18","classes")
dataset3 <- dataset2[ , names(dataset2) %in% keep]

# Perform the same tasks with the resample function instead of selectFeatures to get predictions on the test sets
mL <- makeLearner("classif.nnet", predict.type = "prob")   
ctrl = makeFeatSelControlSequential(method = "sffs", maxit = NA,alpha = 0.001)
mRD = makeResampleDesc("CV", iters = 10,stratify = TRUE)
set.seed(12)
mCT <- makeClassifTask(data =dataset3, target = "classes")
r1r = resample(mL, mCT, mRD, measures = list(mlr::auc,mlr::acc,mlr::brier))
# Result: auc.test.mean=0.9673023

Recommended answer

ctrl is missing in your code.

To get the predictions from your resample object, just use getRRPredictions(r1r). The per-fold test performance values are available in r1r$measures.test.
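As a minimal sketch of what that looks like in practice, the snippet below rebuilds a two-class waveform dataset like the one in the question, runs resample, and pulls out the test-set predictions. The object names (task, learner, rdesc, r, pred_df, pred_by_fold) are placeholders chosen for this illustration, all 21 predictors are used instead of the selected subset, and trace = FALSE is only there to silence nnet's fitting output.

```r
library(mlbench)
library(mlr)

set.seed(12)

# Synthetic two-class data as in the question (third class dropped)
dataset  <- as.data.frame(mlbench.waveform(1000))
dataset2 <- droplevels(subset(dataset, classes != 3))

task    <- makeClassifTask(data = dataset2, target = "classes")
learner <- makeLearner("classif.nnet", predict.type = "prob", trace = FALSE)
rdesc   <- makeResampleDesc("CV", iters = 10, stratify = TRUE)

r <- resample(learner, task, rdesc,
              measures = list(mlr::auc, mlr::acc, mlr::brier))

# Pooled test-set predictions of all folds (a ResamplePrediction object)
pred <- getRRPredictions(r)

# As a data frame: one row per test observation, with the row id, the true
# class, the predicted class probabilities, and the fold number in "iter"
pred_df <- as.data.frame(pred)
head(pred_df)

# Predictions of each fold separately
pred_by_fold <- split(pred_df, pred_df$iter)

# Per-fold test performance (one row per fold, one column per measure)
r$measures.test
```

The id column refers to row positions in the task data, so predictions from several candidate variable sets can be aligned by id before comparing their AUCs. One possible reason for the differing metrics in the question, stated here as an assumption rather than a confirmed diagnosis, is that each call draws its own folds from the resample description; creating the folds once with makeResampleInstance(rdesc, task) and passing that object to both selectFeatures and resample may reduce the gap, although the random initial weights of nnet can still cause small differences.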
