Best function to compare caret model objects

Question

I have a number of caret model objects trained on the same data with the same tuning parameters. As a sanity check, I want to see whether each method gives me the same model object. (This is part of a broader plan to run parallel processing and make sure my models are the same.)

For example, below I train two models and want to compare them.

When I compare the caret objects, identical() returns FALSE and all.equal() reports differences.

> library(caret)
> 
> set.seed(0)
> myControl <- trainControl(method='cv', index=createFolds(iris$Species))
> 
> set.seed(0)
> model1 <- train(Species~., iris, method='rf', trControl=myControl)
> 
> set.seed(0)
> model2 <- train(Species~., iris, method='rf', trControl=myControl)
> 
> identical(model1,model2)
[1] FALSE
> all.equal(model1,model2)
[1] "Component "times": Component "everything": Mean relative difference: 0.09036145"
[2] "Component "times": Component "final": Mean relative difference: 0.75"           
> compare_models(model1, model2)

    One Sample t-test

data:  x
t = NaN, df = 9, p-value = NA
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 NaN NaN
sample estimates:
mean of x 
        0 

If I compare the final models instead of the caret objects, both comparisons return TRUE.

> identical(model1$finalModel,model2$finalModel)
[1] TRUE
> all.equal(model1$finalModel,model2$finalModel)
[1] TRUE

So I am trying to determine why the caret objects differ. Or am I using the wrong function?

I have also set the seeds (as in this example: https://stackoverflow.com/a/21988897/8799325) and still have the same issue.

UPDATE: When I swap in different models (e.g. rpart, lm) and compare the finalModel components, I get FALSE from identical() but TRUE from all.equal(). Is there something about the use of different models that causes this?

> set.seed(0)
> myControl <- trainControl(method='cv', index=createFolds(iris$Species))
> 
> set.seed(0)
> model3 <- train(Species~., iris, method='rpart', trControl=myControl)
> 
> set.seed(0)
> model4 <- train(Species~., iris, method='rpart', trControl=myControl)
> 
> identical(model3,model4)
[1] FALSE
> all.equal(model3,model4)
[1] "Component "times": Component "everything": Mean relative difference: 0.05063291"
[2] "Component "times": Component "final": Mean relative difference: 1"              
> compare_models(model3, model4)

    One Sample t-test

data:  x
t = NaN, df = 9, p-value = NA
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 NaN NaN
sample estimates:
mean of x 
        0 

> 
> identical(model3$finalModel,model4$finalModel)
[1] FALSE
> all.equal(model3$finalModel,model4$finalModel)
[1] TRUE

Recommended answer

train() stores the execution time it took to run the function; see model1$times and ?train. I think these times are irrelevant for your purpose, so you can safely ignore them:

all.equal(model1[!names(model1) %in% "times"], model2[!names(model2) %in% "times"])
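If this comparison is needed for many model pairs, the one-liner above could be wrapped in a small helper. The sketch below is only an illustration: equal_ignoring() is a hypothetical name, not a caret function, and m1 and m2 are toy lists standing in for train objects so that the example runs without caret installed.

```r
# Hypothetical helper (not part of caret): compare two named lists,
# such as caret train objects, while skipping the listed components.
equal_ignoring <- function(a, b, ignore = "times") {
  all.equal(a[!names(a) %in% ignore], b[!names(b) %in% ignore])
}

# Toy stand-ins for two train objects: same results, different timings.
m1 <- list(results = data.frame(mtry = 2, Accuracy = 0.95),
           times   = list(everything = 1.66, final = 0.04))
m2 <- list(results = data.frame(mtry = 2, Accuracy = 0.95),
           times   = list(everything = 1.81, final = 0.07))

isTRUE(all.equal(m1, m2))       # FALSE: the timings differ
isTRUE(equal_ignoring(m1, m2))  # TRUE: the timings are skipped
```

With real models the call would be equal_ignoring(model1, model2). Wrapping the result in isTRUE() matters because all.equal() returns a character vector describing the differences, rather than FALSE, when the objects differ.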
