Zero-R model calculation of Sensitivity and Specificity using Confusion Matrix and Statistics with Caret
Problem description
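Below are the results I got from the confusionMatrix() function in R for a Zero-R model. I may have set the function up incorrectly. The sensitivity I calculated by hand does not match the 1.0000 that confusionMatrix() reports, and my manual values also change with the random seed: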
> sensitivity1 = 213/(213+128)
> sensitivity2 = 211/(211+130)
> sensitivity3 = 215/(215+126)
> #specificity = 0/(0+0) there were no other predictions
> specificity = 0
> specificity
[1] 0
> sensitivity1
[1] 0.6246334
> sensitivity2
[1] 0.6187683
> sensitivity3
[1] 0.6304985
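There is a warning message, but the function still seems to run. It refactors the data to match because the levels are in a different order, which I thought might come from how the training and test sets were ordered and randomized. I went back to check that the train and test sets were not reverse-ordered by a negative index and did not have different numbers of rows. Here is the output of caret's confusionMatrix() function: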
> confusionMatrix(as.factor(testDiagnosisPred), as.factor(testDiagnosis), positive="B")
Confusion Matrix and Statistics
          Reference
Prediction   B   M
         B 211 130
         M   0   0
Accuracy : 0.6188
95% CI : (0.5649, 0.6706)
No Information Rate : 0.6188
P-Value [Acc > NIR] : 0.524
Kappa : 0
Mcnemar's Test P-Value : <2e-16
Sensitivity : 1.0000
Specificity : 0.0000
Pos Pred Value : 0.6188
Neg Pred Value : NaN
Prevalence : 0.6188
Detection Rate : 0.6188
Detection Prevalence : 1.0000
Balanced Accuracy : 0.5000
'Positive' Class : B
Warning message:
In confusionMatrix.default(as.factor(testDiagnosisPred), as.factor(testDiagnosis), :
Levels are not in the same order for reference and data. Refactoring data to match.
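A likely source of this warning, offered as an assumption rather than a confirmed diagnosis: because every prediction is "B", `as.factor(testDiagnosisPred)` has the single level "B", while `as.factor(testDiagnosis)` has the levels "B" and "M". The level sets therefore differ and caret refactors the predictions. The base-R sketch below uses made-up stand-in vectors (not the question's data) and builds both factors with the same explicit levels.

```r
# Made-up stand-ins for the question's vectors (not the real data)
testDiagnosis     <- c("B", "B", "M", "M", "M", "B")
testDiagnosisPred <- rep("B", length(testDiagnosis))  # ZeroR predicts "B" for every case

levels(as.factor(testDiagnosisPred))  # "B" only: "M" never occurs in the predictions
levels(as.factor(testDiagnosis))      # "B" "M"

# Declare the same levels, in the same order, for both factors
lvls <- c("B", "M")
pred <- factor(testDiagnosisPred, levels = lvls)
ref  <- factor(testDiagnosis,     levels = lvls)

# Prediction in rows, Reference in columns, matching caret's printed layout
table(Prediction = pred, Reference = ref)
```

With factors built this way, a call such as `confusionMatrix(pred, ref, positive = "B")` should no longer need to refactor the predictions, because both factors already share identical levels.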
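testDiagnosisPred simply shows that the model predicts benign (B) for every cancer test in the dataset. The counts vary with the seed because the actual benign (B) and malignant (M) cases that end up in the test set are drawn at random each time.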
testDiagnosisPred
B
341
> ## testDiagnosisPred
> ## B
> ## 228
>
> majorityClass # confusion matrix
B M
211 130
> ##
> ## B M
> ## 213 128
>
> # another seed's confusion matrix
> ## B M
> ## 211 130
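For reference, the zero rule fits in a few lines of base R. This is a minimal sketch, not the question's actual code: `zero_r_fit()` and `zero_r_predict()` are hypothetical helper names, and the training labels are invented. The model ignores every predictor, learns only the most frequent training class, and predicts that class for every test case, which is why `testDiagnosisPred` contains nothing but "B".

```r
# Hypothetical helpers illustrating ZeroR (zero rule): no attributes are used,
# the model only remembers the most frequent class in the training labels.
zero_r_fit <- function(train_labels) {
  counts <- table(train_labels)
  names(counts)[which.max(counts)]  # on a tie, the class table() lists first wins
}

# Predict the stored majority class for every one of n test cases
zero_r_predict <- function(majority, n) {
  rep(majority, n)
}

train_labels <- c("B", "B", "M", "M", "M", "B", "B")  # invented example labels
majority <- zero_r_fit(train_labels)                  # "B" (4 B vs 3 M)
testDiagnosisPred <- zero_r_predict(majority, 5)
table(testDiagnosisPred)                              # B: 5
```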
Here is some of the data, as printed by head() and str():
> head(testDiagnosisPred)
[1] "B" "B" "B" "B" "B" "B"
> head(cancerdata.train$Diagnosis)
[1] "B" "B" "M" "M" "M" "B"
> head(testDiagnosis)
[1] "B" "B" "M" "M" "M" "B"
>
> str(testDiagnosisPred)
chr [1:341] "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" ...
> str(cancerdata.train$Diagnosis)
chr [1:341] "B" "B" "M" "M" "M" "B" "B" "B" "M" "M" "M" "B" "M" "M" "B" "M" "B" "B" "B" "M" "B" "B" "B" "B" ...
> str(testDiagnosis)
chr [1:341] "B" "B" "M" "M" "M" "B" "B" "B" "M" "M" "M" "B" "M" "M" "B" "M" "B" "B" "B" "M" "B" "B" "B" "B" ...
>
Answer
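The mismatch between my manual sensitivity and specificity and caret's values came from reading the confusion matrix horizontally (across a Prediction row) instead of vertically (down a Reference column). The confusionMatrix() function in caret gives the correct answer. Another way to recognize a ZeroR model is that it always reports a sensitivity of 1.00 and a specificity of 0.00 when, as here, the positive class is the majority class that it always predicts. ZeroR uses the zero rule with zero attributes, so it simply predicts the majority class for every case.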
> confusionMatrix(as.factor(testDiagnosisPred), as.factor(testDiagnosis), positive="B")
Confusion Matrix and Statistics
          Reference
Prediction   B   M
         B 211 130
         M   0   0
Accuracy : 0.6188
Sensitivity : 1.0000
Specificity : 0.0000
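To see why caret reports 1.0000 and 0.0000, the statistics can be recomputed by hand from the matrix above by reading down each Reference column. The sketch below hard-codes the counts from the output (211, 130, 0, 0) and assumes, as in the call, that "B" is the positive class. Sensitivity is TP / (TP + FN) within the "B" column, and specificity is TN / (TN + FP) within the "M" column.

```r
# Confusion matrix from the output above: Prediction in rows, Reference in columns.
# matrix() fills column by column, so the Reference "B" column is c(211, 0).
cm <- matrix(c(211, 0, 130, 0), nrow = 2,
             dimnames = list(Prediction = c("B", "M"), Reference = c("B", "M")))

# Positive class is "B", so read down the columns (the actual classes)
sensitivity <- cm["B", "B"] / sum(cm[, "B"])  # 211 / (211 + 0)
specificity <- cm["M", "M"] / sum(cm[, "M"])  # 0 / (0 + 130)
balanced_accuracy <- (sensitivity + specificity) / 2

c(Sensitivity = sensitivity, Specificity = specificity,
  `Balanced Accuracy` = balanced_accuracy)
```

This reproduces caret's Sensitivity (1.0000), Specificity (0.0000) and Balanced Accuracy (0.5000).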
When I made these manual sensitivity and specificity calculations, I read the confusion matrix horizontally instead of vertically:
> sensitivity1 = 213/(213+128)
> sensitivity2 = 211/(211+130)
> sensitivity3 = 215/(215+126)
> #specificity = 0/(0+0) there were no other predictions
> specificity = 0
> specificity
[1] 0
> sensitivity1
[1] 0.6246334
> sensitivity2
[1] 0.6187683
> sensitivity3
[1] 0.6304985
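The hand-calculated values are not sensitivities at all. Dividing along a Prediction row, for example 211 / (211 + 130), gives the positive predictive value, which matches caret's "Pos Pred Value : 0.6188". Likewise, the commented-out 0/(0+0) is the row-wise negative predictive value, which caret reports as NaN. The sketch below reuses the three seed-dependent (B, M) counts from the question and contrasts the two reading directions. The row-wise ratio changes with the seed, while the column-wise sensitivity stays at 1 because a ZeroR model that always predicts the positive class never produces a false negative. `zero_r_matrix()` is a hypothetical helper.

```r
# Hypothetical helper: 2x2 matrix for a ZeroR model that predicts "B" for everyone.
# n_b and n_m are the actual counts of benign and malignant cases in the test set.
zero_r_matrix <- function(n_b, n_m) {
  matrix(c(n_b, 0, n_m, 0), nrow = 2,
         dimnames = list(Prediction = c("B", "M"), Reference = c("B", "M")))
}

# Actual (B, M) counts for the three seeds in the question
seeds <- list(c(213, 128), c(211, 130), c(215, 126))

results <- t(sapply(seeds, function(counts) {
  cm <- zero_r_matrix(counts[1], counts[2])
  c(row_wise_ppv = cm["B", "B"] / sum(cm["B", ]),  # what the manual calculation computed
    sensitivity  = cm["B", "B"] / sum(cm[, "B"]),  # column-wise: always 1 here
    specificity  = cm["M", "M"] / sum(cm[, "M"]))  # column-wise: always 0 here
}))
round(results, 4)
```

Each row corresponds to one seed: the first column reproduces 0.6246, 0.6188 and 0.6305, while sensitivity and specificity stay at 1 and 0.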