Difference between predict(model) and predict(model$finalModel) using caret for classification in R


Question

What's the difference between

predict(rf, newdata=testSet)

and

predict(rf$finalModel, newdata=testSet) 

I train the model with preProcess=c("center", "scale"):

tc <- trainControl("repeatedcv", number=10, repeats=10, classProbs=TRUE, savePred=T)
rf <- train(y~., data=trainingSet, method="rf", trControl=tc, preProc=c("center", "scale"))

and I receive 0 true positives when I run it on a centered and scaled testSet:

testSetCS <- testSet
xTrans <- preProcess(testSetCS)
testSetCS<- predict(xTrans, testSet)
testSet$Prediction <- predict(rf, newdata=testSet)
testSetCS$Prediction <- predict(rf, newdata=testSetCS)

but I receive some true positives when I run it on an unscaled testSet. I have to use rf$finalModel to receive some true positives on the centered and scaled testSet, and the rf object on the unscaled one... what am I missing?


Edit

Tests:

tc <- trainControl("repeatedcv", number=10, repeats=10, classProbs=TRUE, savePred=T)
RF <-  train(Y~., data= trainingSet, method="rf", trControl=tc) #normal trainingData
RF.CS <- train(Y~., data= trainingSet, method="rf", trControl=tc, preProc=c("center", "scale")) #scaled and centered trainingData

On the normal testSet:

RF predicts reasonably                 (Sensitivity = 0.33, Specificity = 0.97)
RF$finalModel predicts badly           (Sensitivity = 0.74, Specificity = 0.36)
RF.CS predicts reasonably              (Sensitivity = 0.31, Specificity = 0.97)
RF.CS$finalModel same results as RF.CS (Sensitivity = 0.31, Specificity = 0.97)

On the centered and scaled testSetCS:

RF predicts very badly                 (Sensitivity = 0.00, Specificity = 1.00)
RF$finalModel predicts reasonably      (Sensitivity = 0.33, Specificity = 0.98)
RF.CS predicts like RF                 (Sensitivity = 0.00, Specificity = 1.00)
RF.CS$finalModel predicts like RF      (Sensitivity = 0.00, Specificity = 1.00)

So it seems as if $finalModel needs the testSet in the same format as the trainingSet, whereas the trained object accepts only uncentered and unscaled data, regardless of the selected preProcess parameter?

Prediction code (where testSet is the normal data and testSetCS is centered and scaled):

testSet$Prediction <- predict(RF, newdata=testSet)
testSet$PredictionFM <- predict(RF$finalModel, newdata=testSet)
testSet$PredictionCS <- predict(RF.CS, newdata=testSet)
testSet$PredictionCSFM <- predict(RF.CS$finalModel, newdata=testSet)

testSetCS$Prediction <- predict(RF, newdata=testSetCS)
testSetCS$PredictionFM <- predict(RF$finalModel, newdata=testSetCS)
testSetCS$PredictionCS <- predict(RF.CS, newdata=testSetCS)
testSetCS$PredictionCSFM <- predict(RF.CS$finalModel, newdata=testSetCS)

Solution

Frank,

This is really similar to your other question on Cross Validated.

You really need to

1) show your exact prediction code for each result

2) give us a reproducible example.

With the normal testSet, RF.CS and RF.CS$finalModel should not be giving you the same results and we should be able to reproduce that. Plus, there are syntax errors in your code so it can't be exactly what you executed.

Finally, I'm not really sure why you would use the finalModel object at all. The point of train is to handle the details, and doing things this way (which is your option) circumvents the complete set of code that would normally be applied.
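
For what it's worth, the usual workflow is simply to hand the raw, unprocessed test data to predict() on the train object and let caret re-apply the centering and scaling it estimated from the training data. A minimal sketch, using the object names from your tests above:

 # predict() on a train object applies the stored preProcess to newdata first,
 # so the test data should be passed in un-centered and un-scaled
 testSet$Prediction <- predict(RF.CS, newdata = testSet)
 # class probabilities, if you need them
 probs <- predict(RF.CS, newdata = testSet, type = "prob")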

Here is a reproducible example:

 library(caret)     # createDataPartition, preProcess, trainControl, train
 library(mlbench)
 data(Sonar)

 set.seed(1)
 inTrain <- createDataPartition(Sonar$Class)
 training <- Sonar[inTrain[[1]], ]
 testing  <- Sonar[-inTrain[[1]], ]

 pp <- preProcess(training[, -ncol(Sonar)])
 training2 <- predict(pp, training[, -ncol(Sonar)])
 training2$Class <- training$Class
 testing2 <- predict(pp, testing[, -ncol(Sonar)])
 testing2$Class <- testing$Class

 tc <- trainControl("repeatedcv", 
                    number=10, 
                    repeats=10, 
                    classProbs=TRUE, 
                    savePred=T)
 set.seed(2)
 RF <-  train(Class~., data= training, 
              method="rf", 
              trControl=tc)
 #normal trainingData
 set.seed(2)
 RF.CS <- train(Class~., data= training, 
                method="rf", 
                trControl=tc, 
                preProc=c("center", "scale")) 
 #scaled and centered trainingData

Here are some results:

 > ## These should not be the same
 > all.equal(predict(RF, testing,  type = "prob")[,1],
 +           predict(RF, testing2, type = "prob")[,1])
 [1] "Mean relative difference: 0.4067554"
 > 
 > ## Nor should these
 > all.equal(predict(RF.CS, testing,  type = "prob")[,1],
 +           predict(RF.CS, testing2, type = "prob")[,1])
 [1] "Mean relative difference: 0.3924037"
 > 
 > all.equal(predict(RF.CS,            testing, type = "prob")[,1],
 +           predict(RF.CS$finalModel, testing, type = "prob")[,1])
 [1] "names for current but not for target"
 [2] "Mean relative difference: 0.7452435" 
 >
 > ## These should be and are close (just based on the 
 > ## random sampling used in the final RF fits)
 > all.equal(predict(RF,    testing, type = "prob")[,1],
 +           predict(RF.CS, testing, type = "prob")[,1])
 [1] "Mean relative difference: 0.04198887"

Max
