Difference between predict(model) and predict(model$finalModel) using caret for classification in R
What's the difference between
predict(rf, newdata=testSet)
and
predict(rf$finalModel, newdata=testSet)
I train the model with preProcess=c("center", "scale"):
tc <- trainControl("repeatedcv", number=10, repeats=10, classProbs=TRUE, savePred=T)
rf <- train(y~., data=trainingSet, method="rf", trControl=tc, preProc=c("center", "scale"))
and I receive 0 true positives when I run it on a centered and scaled testSet:
testSetCS <- testSet
xTrans <- preProcess(testSetCS)
testSetCS<- predict(xTrans, testSet)
testSet$Prediction <- predict(rf, newdata=testSet)
testSetCS$Prediction <- predict(rf, newdata=testSetCS)
but I receive some true positives when I run it on the unscaled testSet. I have to use rf$finalModel to get some true positives on the centered and scaled testSet, and the rf object on the unscaled one... what am I missing?
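A likely explanation: predict() on a caret train object re-applies the center/scale transformation learned from the training set, so data that has already been centered and scaled gets transformed twice. A minimal base-R sketch of what double-scaling does (hypothetical numbers, no caret):

```r
# Double-scaling sketch (hypothetical data): caret re-applies the
# training-set center/scale before predicting.
set.seed(1)
train_x <- rnorm(100, mean = 50, sd = 10)   # raw training feature
test_x  <- c(45, 55, 65)                    # raw test values

mu <- mean(train_x)                         # statistics caret would store
s  <- sd(train_x)

once  <- (test_x - mu) / s   # what the model saw during training
twice <- (once  - mu) / s    # already-scaled data scaled again

once    # within the usual standardized range
twice   # pushed far outside that range
```

If the model never saw values like `twice` during training, a random forest will route them all down the same extreme branches, which matches the 0-true-positives behaviour described above.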
Edit
tests:
tc <- trainControl("repeatedcv", number=10, repeats=10, classProbs=TRUE, savePred=T)
RF <- train(Y~., data= trainingSet, method="rf", trControl=tc) #normal trainingData
RF.CS <- train(Y~., data= trainingSet, method="rf", trControl=tc, preProc=c("center", "scale")) #scaled and centered trainingData
on normal testSet:
RF predicts reasonably (Sensitivity = 0.33, Specificity = 0.97)
RF$finalModel predicts badly (Sensitivity = 0.74, Specificity = 0.36)
RF.CS predicts reasonably (Sensitivity = 0.31, Specificity = 0.97)
RF.CS$finalModel gives the same results as RF.CS (Sensitivity = 0.31, Specificity = 0.97)
on centered and scaled testSetCS:
RF predicts very badly (Sensitivity = 0.00, Specificity = 1.00)
RF$finalModel predicts reasonably (Sensitivity = 0.33, Specificity = 0.98)
RF.CS predicts like RF (Sensitivity = 0.00, Specificity = 1.00)
RF.CS$finalModel predicts like RF (Sensitivity = 0.00, Specificity = 1.00)
So it seems the $finalModel expects the testSet in the same format as the data it was actually fit on, whereas the trained object accepts only uncentered and unscaled data, regardless of the selected preProcess parameter?
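A second issue in the earlier snippet: preProcess(testSetCS) computes its centering and scaling statistics from the test set itself, but they should come from the training set (or, with caret, be left to predict() entirely). A base-R sketch with hypothetical numbers showing that the two choices transform the same test values differently:

```r
# Hypothetical data: scale a test set with training-set statistics
# versus with its own statistics.
train_x <- c(10, 20, 30, 40)
test_x  <- c(35, 45)

with_train_stats <- (test_x - mean(train_x)) / sd(train_x)
with_test_stats  <- (test_x - mean(test_x))  / sd(test_x)

with_train_stats  # roughly c(0.77, 1.55): values sit above the training mean
with_test_stats   # c(-0.71, 0.71): the shift above the training mean is lost
```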
Prediction code (where testSet is normal data and testSetCS is centered and scaled):
testSet$Prediction <- predict(RF, newdata=testSet)
testSet$PredictionFM <- predict(RF$finalModel, newdata=testSet)
testSet$PredictionCS <- predict(RF.CS, newdata=testSet)
testSet$PredictionCSFM <- predict(RF.CS$finalModel, newdata=testSet)
testSetCS$Prediction <- predict(RF, newdata=testSetCS)
testSetCS$PredictionFM <- predict(RF$finalModel, newdata=testSetCS)
testSetCS$PredictionCS <- predict(RF.CS, newdata=testSetCS)
testSetCS$PredictionCSFM <- predict(RF.CS$finalModel, newdata=testSetCS)
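For reference, sensitivity and specificity figures like those quoted above come from a confusion table; a base-R sketch with made-up predictions (caret's confusionMatrix() does the same bookkeeping):

```r
# Hypothetical predicted and observed classes
pred <- factor(c("pos", "neg", "neg", "pos", "neg", "neg"),
               levels = c("pos", "neg"))
obs  <- factor(c("pos", "pos", "neg", "neg", "neg", "neg"),
               levels = c("pos", "neg"))

tab <- table(pred, obs)   # rows = predicted, columns = observed
sensitivity <- tab["pos", "pos"] / sum(tab[, "pos"])  # TP / (TP + FN)
specificity <- tab["neg", "neg"] / sum(tab[, "neg"])  # TN / (TN + FP)
```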
Frank,
This is really similar to your other question on Cross Validated.
You really need to:
1) show your exact prediction code for each result
2) give us a reproducible example.
With the normal testSet, RF.CS and RF.CS$finalModel should not be giving you the same results, and we should be able to reproduce that. Plus, there are syntax errors in your code, so it can't be exactly what you executed.
Finally, I'm not really sure why you would use the finalModel object at all. The point of train is to handle the details, and doing things this way (which is your option) circumvents the complete set of code that would normally be applied.
Here is a reproducible example:
library(caret)   # for createDataPartition, preProcess, train
library(mlbench)
data(Sonar)
set.seed(1)
inTrain <- createDataPartition(Sonar$Class)
training <- Sonar[inTrain[[1]], ]
testing <- Sonar[-inTrain[[1]], ]
pp <- preProcess(training[,-ncol(Sonar)])
training2 <- predict(pp, training[,-ncol(Sonar)])
training2$Class <- training$Class
testing2 <- predict(pp, testing[,-ncol(Sonar)])
testing2$Class <- testing$Class
tc <- trainControl("repeatedcv",
number=10,
repeats=10,
classProbs=TRUE,
savePred=T)
set.seed(2)
RF <- train(Class~., data= training,
method="rf",
trControl=tc)
#normal trainingData
set.seed(2)
RF.CS <- train(Class~., data= training,
method="rf",
trControl=tc,
preProc=c("center", "scale"))
#scaled and centered trainingData
Here are some results:
> ## These should not be the same
> all.equal(predict(RF, testing, type = "prob")[,1],
+ predict(RF, testing2, type = "prob")[,1])
[1] "Mean relative difference: 0.4067554"
>
> ## Nor should these
> all.equal(predict(RF.CS, testing, type = "prob")[,1],
+ predict(RF.CS, testing2, type = "prob")[,1])
[1] "Mean relative difference: 0.3924037"
>
> all.equal(predict(RF.CS, testing, type = "prob")[,1],
+ predict(RF.CS$finalModel, testing, type = "prob")[,1])
[1] "names for current but not for target"
[2] "Mean relative difference: 0.7452435"
>
> ## These should be and are close (just based on the
> ## random sampling used in the final RF fits)
> all.equal(predict(RF, testing, type = "prob")[,1],
+ predict(RF.CS, testing, type = "prob")[,1])
[1] "Mean relative difference: 0.04198887"
Max