Optimising caret for sensitivity still seems to optimise for ROC
Question
I'm trying to maximise sensitivity in my model selection in caret using rpart. To this end, I tried to replicate the method given on caret's GitHub page (scroll down to the example with the user-defined function FourStat).
# create own function so we can use "sensitivity" as our metric to maximise:
Sensitivity.fc <- function (data, lev = levels(data$obs), model = NULL) {
  out <- c(twoClassSummary(data, lev = levels(data$obs), model = NULL))
  c(out, Sensitivity = out["Sens"])
}
rpart_caret_fit <- train(outcome~pred1+pred2+pred3+pred4,
                         na.action = na.pass,
                         method = "rpart",
                         control=rpart.control(maxdepth = 6),
                         tuneLength = 20,
                         # maximise sensitivity
                         metric = "Sensitivity",
                         maximize = TRUE,
                         trControl = trainControl(classProbs = TRUE,
                                                  summaryFunction = Sensitivity.fc))
However, when I print the fitted object using
rpart_caret_fit
it indicates that it still used the ROC criterion to select the final model:
CART
678282 samples
4 predictor
2 classes: 'yes', 'no'
No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 678282, 678282, 678282, 678282, 678282, 678282, ...
Resampling results across tuning parameters:
cp ROC Sens Spec Sensitivity.Sens
0.000001909738 0.7259486 0.4123547 0.8227382 0.4123547
0.000002864607 0.7259486 0.4123547 0.8227382 0.4123547
0.000005729214 0.7259489 0.4123622 0.8227353 0.4123622
0.000006684083 0.7258036 0.4123614 0.8227379 0.4123614
0.000007638953 0.7258031 0.4123576 0.8227398 0.4123576
0.000009548691 0.7258028 0.4123539 0.8227416 0.4123539
0.000010694534 0.7257553 0.4123589 0.8227332 0.4123589
0.000015277905 0.7257313 0.4123614 0.8227290 0.4123614
0.000032465548 0.7253456 0.4112838 0.8234272 0.4112838
0.000038194763 0.7252966 0.4112912 0.8234196 0.4112912
0.000076389525 0.7248774 0.4102792 0.8240339 0.4102792
0.000164237480 0.7244847 0.4093688 0.8246372 0.4093688
0.000194793290 0.7241532 0.4086596 0.8250930 0.4086596
0.000310650737 0.7237546 0.4087379 0.8250393 0.4087379
0.001625187154 0.7233805 0.4006570 0.8295729 0.4006570
0.001726403276 0.7233225 0.3983850 0.8308874 0.3983850
0.002173282000 0.7230906 0.3915758 0.8348320 0.3915758
0.002237258227 0.7230906 0.3915758 0.8348320 0.3915758
0.006140444689 0.7173854 0.4897494 0.7695558 0.4897494
0.055330843035 0.5730987 0.2710906 0.8545549 0.2710906
ROC was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.000005729214.
How can I override the ROC selection method?
Recommended answer
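You are overcomplicating things.
twoClassSummary already reports sensitivity in its output, under the column name "Sens". In your output the extra column is named Sensitivity.Sens rather than Sensitivity, so train cannot find the requested metric and falls back to the first column, ROC. It is enough to specify
metric = "Sens"
in train and
summaryFunction = twoClassSummary
in trainControl.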
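The column name Sensitivity.Sens in the question's output comes from how base R's c() names its result. The sketch below reproduces this with made-up numbers standing in for the output of twoClassSummary. The `[[` variant is one possible way to get the intended name and is not taken from the original answer.

```r
# Made-up values standing in for the named vector returned by twoClassSummary
out <- c(ROC = 0.72, Sens = 0.41, Spec = 0.82)

# As in the question: out["Sens"] keeps its own name, so c() pastes the two
# names together and the new element is called "Sensitivity.Sens"
combined <- c(out, Sensitivity = out["Sens"])
print(names(combined))

# Dropping the inner name with [[ gives the element the intended name
fixed <- c(out, Sensitivity = out[["Sens"]])
print(names(fixed))
```

With a name that matches, metric = "Sensitivity" would refer to an existing column, although simply using metric = "Sens" avoids the extra column altogether.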
Full example:
library(caret)
library(mlbench)
data(Sonar)
rpart_caret_fit <- train(Class~.,
                         data = Sonar,
                         method = "rpart",
                         tuneLength = 20,
                         metric = "Sens",
                         maximize = TRUE,
                         trControl = trainControl(classProbs = TRUE,
                                                  method = "cv",
                                                  number = 5,
                                                  summaryFunction = twoClassSummary))
rpart_caret_fit
CART
208 samples
60 predictor
2 classes: 'M', 'R'
No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 167, 166, 166, 166, 167
Resampling results across tuning parameters:
cp ROC Sens Spec
0.0000000 0.7088298 0.7023715 0.7210526
0.0255019 0.7075400 0.7292490 0.6684211
0.0510038 0.7105388 0.7758893 0.6405263
0.0765057 0.6904202 0.7841897 0.6294737
0.1020076 0.7104681 0.8114625 0.6094737
0.1275095 0.7104681 0.8114625 0.6094737
0.1530114 0.7104681 0.8114625 0.6094737
0.1785133 0.7104681 0.8114625 0.6094737
0.2040152 0.7104681 0.8114625 0.6094737
0.2295171 0.7104681 0.8114625 0.6094737
0.2550190 0.7104681 0.8114625 0.6094737
0.2805209 0.7104681 0.8114625 0.6094737
0.3060228 0.7104681 0.8114625 0.6094737
0.3315247 0.7104681 0.8114625 0.6094737
0.3570266 0.7104681 0.8114625 0.6094737
0.3825285 0.7104681 0.8114625 0.6094737
0.4080304 0.7104681 0.8114625 0.6094737
0.4335323 0.7104681 0.8114625 0.6094737
0.4590342 0.6500135 0.8205534 0.4794737
0.4845361 0.6500135 0.8205534 0.4794737
Sens was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.4845361.
Correction: an earlier version of this answer said that you cannot specify control = rpart.control(maxdepth = 6) in caret's train. That is not correct. caret passes any additional arguments forward to the model function using ..., so you can pass pretty much any argument.
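To illustrate the forwarding mechanism, here is a minimal sketch that does not use caret. The wrapper fit_tree is hypothetical and only stands in for the idea of handing `...` on to rpart. It uses the built-in iris data rather than the data from the question.

```r
library(rpart)  # rpart ships with standard R installations as a recommended package

# Hypothetical wrapper (not caret code): every extra argument is forwarded
# to rpart() through `...`
fit_tree <- function(formula, data, ...) {
  rpart(formula, data = data, ...)
}

# control is not a named argument of fit_tree, so it travels through `...`
shallow <- fit_tree(Species ~ ., data = iris,
                    control = rpart.control(maxdepth = 1))

# rpart numbers its nodes so that a node at depth d has a number between
# 2^d and 2^(d + 1) - 1; the depth of the tree follows from the largest number
node_ids <- as.numeric(rownames(shallow$frame))
max_depth <- max(floor(log2(node_ids)))
print(max_depth)  # no deeper than the requested maxdepth of 1
```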
If you want to write your own summary function, here is an example that computes "Sens":
Sensitivity.fc <- function(data, lev = NULL, model = NULL) {  # every summary function takes these three arguments
  obs <- data[, "obs"]     # the observed values: always in the column named "obs"
  cls <- levels(obs)       # the class levels; train also passes these through the lev argument
  probs <- data[, cls[2]]  # predicted probabilities for the 2nd class; available only if classProbs = TRUE
  # assign classes using a 0.5 probability threshold; fixing the levels keeps both classes even if only one is predicted
  class <- factor(ifelse(probs > 0.5, cls[2], cls[1]), levels = cls)
  Sensitivity <- caret::sensitivity(class, obs)  # built-in helper; the first level is the positive class
  names(Sensitivity) <- "Sens"  # the name of the output, which metric must match
  Sensitivity
}
Now:
rpart_caret_fit <- train(Class~.,
                         data = Sonar,
                         method = "rpart",
                         tuneLength = 20,
                         metric = "Sens",  # because of this line: names(Sensitivity) <- "Sens"
                         maximize = TRUE,
                         trControl = trainControl(classProbs = TRUE,
                                                  method = "cv",
                                                  number = 5,
                                                  summaryFunction = Sensitivity.fc))
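For reference, the quantity being maximised is the share of observed positives that are predicted as positive, where caret::sensitivity takes the first factor level as the positive class by default. A minimal base R sketch with made-up labels (not the Sonar data) shows the arithmetic:

```r
# Toy observed and predicted labels, made up for illustration
obs  <- factor(c("M", "M", "M", "R", "R", "M"), levels = c("M", "R"))
pred <- factor(c("M", "R", "M", "R", "M", "M"), levels = c("M", "R"))

positive <- levels(obs)[1]  # "M": the first level is the positive class

true_pos  <- sum(pred == positive & obs == positive)  # observed M predicted as M
false_neg <- sum(pred != positive & obs == positive)  # observed M predicted as R

# 3 of the 4 observed M cases are recovered, so the sensitivity is 0.75
sens <- true_pos / (true_pos + false_neg)
print(sens)
```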
Let's check that both produce the same results:
set.seed(1)
fit_sens <- train(Class~.,
                  data = Sonar,
                  method = "rpart",
                  tuneLength = 20,
                  metric = "Sens",
                  maximize = TRUE,
                  trControl = trainControl(classProbs = TRUE,
                                           method = "cv",
                                           number = 5,
                                           summaryFunction = Sensitivity.fc))
set.seed(1)
fit_sens2 <- train(Class~.,
                   data = Sonar,
                   method = "rpart",
                   tuneLength = 20,
                   metric = "Sens",
                   maximize = TRUE,
                   trControl = trainControl(classProbs = TRUE,
                                            method = "cv",
                                            number = 5,
                                            summaryFunction = twoClassSummary))
all.equal(fit_sens$results[c("cp", "Sens")],
          fit_sens2$results[c("cp", "Sens")])
TRUE
all.equal(fit_sens$bestTune,
          fit_sens2$bestTune)
TRUE