Specifying the positive class of an outcome variable in caret train()


Problem description

I'm wondering if there is a way to specify which class of the outcome variable is the positive one in caret's train() function. A minimal example:

# Packages (caret for train(), dplyr for %>% and mutate())
library(caret)
library(dplyr)

# Settings
ctrl <- trainControl(method = "repeatedcv", number = 10, savePredictions = TRUE, summaryFunction = twoClassSummary, classProbs = TRUE)

# Data
data <- mtcars %>% mutate(am = factor(am, levels = c(0,1), labels = c("automatic", "manual"), ordered = TRUE))

# Train
set.seed(123)
model1 <- train(am ~ disp + wt, data = data, method = "glm", family = "binomial", trControl = ctrl, tuneLength = 5)

# Data (factor ordering switched)
data <- mtcars %>% mutate(am = factor(am, levels = c(1,0), labels = c("manual", "automatic"), ordered = TRUE))

# Train
set.seed(123)
model2 <- train(am ~ disp + wt, data = data, method = "glm", family = "binomial", trControl = ctrl, tuneLength = 5)

# Specificity and Sensitivity are switched between the two models
model1
model2

If you run the code, you'll notice that the Specificity and Sensitivity metrics are "switched" between the two models. It looks like the train() function takes the first level of a factor outcome variable as the positive class. Is there a way to specify the positive class in the function itself, so that I get the same results regardless of the ordering of the outcome factor? I tried adding positive = "manual", but this results in an error.

Recommended answer

The issue lies not in the function train() but in the function twoClassSummary(), which looks like this:

function (data, lev = NULL, model = NULL) 
{
  lvls <- levels(data$obs)

  [...]    

  out <- c(rocAUC, 
           sensitivity(data[, "pred"], data[, "obs"], 
             lev[1]),  # Hard coded positive class
           specificity(data[, "pred"], data[, "obs"], 
             lev[2])) # Hard coded negative class
  names(out) <- c("ROC", "Sens", "Spec")
  out
}

The order in which the levels are passed to sensitivity() and specificity() is hard-coded here: the first level is always treated as the positive class and the second as the negative class.
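To see why this hard-coding makes the two metrics swap, here is a minimal base R sketch. The helpers sens_first_level() and spec_second_level() are hypothetical stand-ins written for this illustration, not caret's implementation, and the observations and predictions are made-up toy values.

```r
# Hypothetical helpers (not caret code): the positive class is always the
# first factor level and the negative class is always the second one.
sens_first_level <- function(pred, obs) {
  pos <- levels(obs)[1]
  sum(pred == pos & obs == pos) / sum(obs == pos)
}
spec_second_level <- function(pred, obs) {
  neg <- levels(obs)[2]
  sum(pred == neg & obs == neg) / sum(obs == neg)
}

# Toy observations and predictions (illustrative values only)
obs_chr  <- c("manual", "manual", "manual", "automatic", "automatic")
pred_chr <- c("manual", "manual", "automatic", "automatic", "automatic")

lv_a <- c("automatic", "manual")  # "automatic" is the first level
lv_b <- rev(lv_a)                 # "manual" is the first level

sens_a <- sens_first_level(factor(pred_chr, levels = lv_a), factor(obs_chr, levels = lv_a))
spec_a <- spec_second_level(factor(pred_chr, levels = lv_a), factor(obs_chr, levels = lv_a))
sens_b <- sens_first_level(factor(pred_chr, levels = lv_b), factor(obs_chr, levels = lv_b))
spec_b <- spec_second_level(factor(pred_chr, levels = lv_b), factor(obs_chr, levels = lv_b))

# Same data, reversed level order: the two numbers trade places
print(c(sens_a = sens_a, spec_a = spec_a))
print(c(sens_b = sens_b, spec_b = spec_b))
```

The underlying predictions are identical in both cases; only the definition of "positive" changes with the level order.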

As @Seymour correctly points out, reversing the order of the levels of the outcome variable fixes the issue.

df$target <- factor(df$target, levels=rev(levels(df$target)))
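As a small check of what this reordering does, the following base R sketch applies it to the am column of mtcars. It assumes an unordered factor; relevel() is shown as an alternative, but it only works on unordered factors, so it would not apply to the ordered factor used in the question.

```r
# Outcome as an unordered factor; "automatic" is the first level
am <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))
print(levels(am))

# Option 1: reverse the level order, as in the line above
am_rev <- factor(am, levels = rev(levels(am)))

# Option 2: move one named level to the front (unordered factors only)
am_rel <- relevel(am, ref = "manual")

print(levels(am_rev))
print(levels(am_rel))

# The observations themselves are untouched; only the level order changes
print(table(am))
print(table(am_rev))
```

Either way, "manual" becomes the first level, which is the level twoClassSummary() treats as the positive class.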

If you are not willing to change the order of the levels, there is a less intrusive option: write a modified version of the twoClassSummary() function.

sensitivity() takes the name of the positive level and specificity() takes the name of the negative level (a suboptimal design choice). So we add these two arguments, positive and negative, to our custom function. Further down, we pass them on to the respective functions to fix the problem.

customTwoClassSummary <- function(data, lev = NULL, model = NULL, positive = NULL, negative=NULL) 
{
  lvls <- levels(data$obs)
  if (length(lvls) > 2) 
    stop(paste("Your outcome has", length(lvls), "levels. The twoClassSummary() function isn't appropriate."))
  caret:::requireNamespaceQuietStop("ModelMetrics")
  if (!all(levels(data[, "pred"]) == lvls)) 
    stop("levels of observed and predicted data do not match")
  rocAUC <- ModelMetrics::auc(ifelse(data$obs == lev[2], 0, 
                                     1), data[, lvls[1]])
  out <- c(rocAUC, 
           # Only change happens here!
           sensitivity(data[, "pred"], data[, "obs"], positive=positive), 
           specificity(data[, "pred"], data[, "obs"], negative=negative))
  names(out) <- c("ROC", "Sens", "Spec")
  out
}

But how do we specify these options without changing more code within the package? By default, caret doesn't pass extra options to the summary function. Instead, we wrap the function in an anonymous function in the call to trainControl():

ctrl <- trainControl(method = "repeatedcv", number = 10, savePredictions = TRUE, 
                     # Trick: an anonymous wrapper fixes the extra arguments of the call
                     summaryFunction = function(...) customTwoClassSummary(..., 
                                       positive = "manual", negative="automatic"), 
                     classProbs = TRUE)

The ... argument makes sure that all other arguments that caret passes to the anonymous function are passed on to customTwoClassSummary().
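The forwarding mechanism can be tried out in isolation with base R. In this sketch, describe_classes() and call_summary() are hypothetical stand-ins for the custom summary function and for caret's internal caller. It assumes the caller supplies only data, lev and model, matching the signature of twoClassSummary() shown earlier.

```r
# Hypothetical summary function with two extra arguments
describe_classes <- function(data, lev = NULL, model = NULL,
                             positive = NULL, negative = NULL) {
  list(n = nrow(data), lev = lev, model = model,
       positive = positive, negative = negative)
}

# Anonymous-style wrapper: forwards everything it receives via `...`
# and fixes the two extra arguments
wrapped <- function(...) {
  describe_classes(..., positive = "manual", negative = "automatic")
}

# Hypothetical caller that knows nothing about `positive` or `negative`
call_summary <- function(f, data, lev) {
  f(data, lev = lev, model = "glm")
}

res <- call_summary(wrapped, data.frame(x = 1:3), c("automatic", "manual"))
str(res)
```

The caller's arguments arrive unchanged, and the fixed values for positive and negative are added on top, so the caller never needs to know about them.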
