How can a blocking factor be included in makeClassifTask() from the mlr package?
Question
In some classification tasks using the mlr package, I need to deal with a data.frame similar to this one:
set.seed(pi)
# Dummy data frame
df <- data.frame(
  # Repeated values ID
  ID = sort(sample(c(0:20), 100, replace = TRUE)),
  # Some variables
  X1 = runif(10, 1, 10),
  # Some Label
  Label = sample(c(0,1), 100, replace = TRUE)
)
df
I need to cross-validate the model while keeping observations with the same ID together. I know from the tutorial that:
https://mlr-org.github.io/mlr-tutorial/release/html/task/index.html#further-settings
We could include a blocking factor in the task. This would indicate that some observations "belong together" and should not be separated when splitting the data into training and test sets for resampling.
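To make "belong together" concrete, here is a minimal base-R sketch of the idea, independent of mlr. It is not how mlr implements blocking internally; the names id, k, fold_of_id and fold are made up for illustration. Folds are assigned per distinct ID rather than per row, so all rows sharing an ID land in the same fold.

```r
set.seed(42)
# Hypothetical grouping variable: 100 observations spread over up to 21 IDs
id <- sort(sample(0:20, 100, replace = TRUE))

# Assign each distinct ID (not each row) to one of k folds
k <- 5
ids <- unique(id)
fold_of_id <- sample(rep_len(seq_len(k), length(ids)))

# Map the per-ID fold back onto the rows
fold <- fold_of_id[match(id, ids)]

# Every ID should appear in exactly one fold
folds_per_id <- tapply(fold, id, function(f) length(unique(f)))
all(folds_per_id == 1)
```

Using rows with fold == j as the test set and the remaining rows as the training set then never splits an ID across the two.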
The question is: how can I include this blocking factor in makeClassifTask?
Unfortunately, I couldn't find any example.
Recommended answer
Which version of mlr do you have? Blocking has been part of the package for a while. It is available directly as the blocking argument of makeClassifTask.
Here is an example with your data:
library(mlr)
# The blocking factor must be a factor with one entry per observation
df$ID = as.factor(df$ID)
# Drop ID from the features and make the target a factor
df2 = df
df2$ID = NULL
df2$Label = as.factor(df$Label)
tsk = makeClassifTask(data = df2, target = "Label", blocking = df$ID)
res = resample("classif.rpart", tsk, resampling = cv10)
# Check that blocking worked: no ID may appear in both the training and the test set of a fold
lapply(1:10, function(i) {
  blocks.training = df$ID[res$pred$instance$train.inds[[i]]]
  blocks.testing = df$ID[res$pred$instance$test.inds[[i]]]
  intersect(blocks.testing, blocks.training)
})
# All entries are empty, so blocking indeed works.
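A version caveat, stated as an assumption rather than something confirmed by this thread: as far as I recall, later mlr releases (around 2.13) added a blocking.cv argument to makeResampleDesc(), and the blocking factor is only honoured in cross-validation when it is set to TRUE. The sketch below checks whether the installed version exposes that argument and otherwise falls back to cv10. The toy data and the names grp, dat, rdesc and shared are hypothetical, and the rpart package is assumed to be installed.

```r
library(mlr)

set.seed(1)
# Hypothetical toy data: 'grp' plays the role of the ID column
grp <- factor(sort(sample(0:20, 100, replace = TRUE)))
dat <- data.frame(
  X1 = runif(100, 1, 10),
  Label = factor(sample(c(0, 1), 100, replace = TRUE))
)
tsk <- makeClassifTask(data = dat, target = "Label", blocking = grp)

# Assumption: newer mlr versions expose 'blocking.cv' in makeResampleDesc();
# fall back to the predefined cv10 description when the argument is absent.
if ("blocking.cv" %in% names(formals(makeResampleDesc))) {
  rdesc <- makeResampleDesc("CV", iters = 10, blocking.cv = TRUE)
} else {
  rdesc <- cv10
}
res <- resample("classif.rpart", tsk, resampling = rdesc)

# Number of blocks shared between training and test set, per fold
inst <- res$pred$instance
shared <- vapply(seq_along(inst$test.inds), function(i) {
  length(intersect(as.character(grp[inst$train.inds[[i]]]),
                   as.character(grp[inst$test.inds[[i]]])))
}, integer(1))
shared
```

If blocking is in effect, every entry of shared should be zero. A non-zero entry would suggest that the resampling description in use ignores the blocking factor on your version.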