How does Caret generate an OLS model with K-fold cross validation?

Question

Let's say I have some generic dataset for which an OLS regression is the best choice. So, I generate a model with some first-order terms and decide to use Caret in R for my regression coefficient estimates and error estimates.

In caret, this ends up being:

k10_cv = trainControl(method="cv", number=10)
ols_model = train(Y ~ X1 + X2 + X3, data = my_data, trControl = k10_cv, method = "lm")

From there, I can pull out regression information with summary(ols_model), and I can get more information just by printing ols_model.

When I just print ols_model, are the RMSE/R-squared/MAE values calculated with the typical k-fold CV approach? Also, is the model I see in summary(ols_model) trained on the entire dataset, or is it an average of the models fitted on each of the folds?

If not, in the interest of trading variance for bias, is there a way to acquire an OLS model within Caret that is trained on one fold at a time?

Answer

Here's reproducible data for your example.

library("caret")
my_data <- iris

k10_cv <- trainControl(method="cv", number=10)

set.seed(100)
ols_model <- train(Sepal.Length ~  Sepal.Width + Petal.Length + Petal.Width,
                  data = my_data, trControl = k10_cv, method = "lm")


> ols_model$results
  intercept      RMSE  Rsquared       MAE     RMSESD RsquaredSD      MAESD
1      TRUE 0.3173942 0.8610242 0.2582343 0.03881222 0.04784331 0.02960042

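1) The RMSE, Rsquared and MAE values in ols_model$results above are the means of the per-fold metrics stored in ols_model$resample (the ...SD columns are their standard deviations across the folds):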

> (ols_model$resample)
        RMSE  Rsquared       MAE Resample
1  0.3386472 0.8954600 0.2503482   Fold01
2  0.3154519 0.8831588 0.2815940   Fold02
3  0.3167943 0.8904550 0.2441537   Fold03
4  0.2644717 0.9085548 0.2145686   Fold04
5  0.3769947 0.8269794 0.3070733   Fold05
6  0.3720051 0.7792611 0.2746565   Fold06
7  0.3258501 0.8095141 0.2647466   Fold07
8  0.2962375 0.8530810 0.2731445   Fold08
9  0.3059100 0.8351535 0.2611982   Fold09
10 0.2615792 0.9286246 0.2108592   Fold10

> mean(ols_model$resample$RMSE)==ols_model$results$RMSE
[1] TRUE
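To make point 1 concrete, here is a minimal base-R sketch (no caret) that runs 10-fold CV by hand for the same formula and averages the per-fold metrics. It assumes simple random fold assignment with sample(); as I understand it, caret builds its folds with createFolds(), which balances them on the outcome, so the numbers will not reproduce the tables above. It also assumes caret's default summary, where R-squared is the squared correlation between predictions and observations. The names fold_id, per_fold and cv_summary are hypothetical.

```r
# Minimal base-R sketch of 10-fold CV for an OLS model (no caret needed).
# Assumption: folds are assigned at random with sample(); caret's
# createFolds() balances folds on the outcome, so numbers will differ.
set.seed(100)
k   <- 10
dat <- iris
fml <- Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width

# Give every row a fold label 1..k (15 rows per fold for iris's 150 rows)
fold_id <- sample(rep(seq_len(k), length.out = nrow(dat)))

# For each fold i: fit on the other k-1 folds, score on held-out fold i
per_fold <- t(sapply(seq_len(k), function(i) {
  train_set <- dat[fold_id != i, ]
  test_set  <- dat[fold_id == i, ]
  fit  <- lm(fml, data = train_set)
  pred <- predict(fit, newdata = test_set)
  obs  <- test_set$Sepal.Length
  c(RMSE     = sqrt(mean((pred - obs)^2)),
    Rsquared = cor(pred, obs)^2,  # squared correlation (assumed caret default)
    MAE      = mean(abs(pred - obs)))
}))

# Analogue of ols_model$results: mean and SD of each metric across folds
cv_summary <- c(colMeans(per_fold), apply(per_fold, 2, sd))
names(cv_summary) <- c(colnames(per_fold), paste0(colnames(per_fold), "SD"))

print(per_fold)
print(cv_summary)
```

Each row of per_fold plays the role of one row of ols_model$resample, and cv_summary plays the role of ols_model$results.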

2) The final model is trained on the whole training set; it is not an average of the per-fold models. You can check this either by fitting the same formula directly with lm or by setting method = "none" in trainControl.

> coef(lm(Sepal.Length ~  Sepal.Width + Petal.Length + Petal.Width, data = my_data))
 (Intercept)  Sepal.Width Petal.Length  Petal.Width 
   1.8559975    0.6508372    0.7091320   -0.5564827 

These coefficients are identical to those in ols_model$finalModel.
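On the last part of the question: as far as I know, caret does not keep the per-fold fits by default, only the full-data finalModel. If you want to look at them, one option is to refit them yourself in base R, as in the sketch below. Note that in k-fold CV each of these models is trained on the k-1 folds that exclude fold i, not on a single fold. The fold assignment via sample() is an assumption (it will not match caret's folds), and the names fold_coefs, full_coef and avg_coef are hypothetical.

```r
# Minimal base-R sketch: per-fold OLS fits vs. the full-data fit.
# Assumption: random fold assignment with sample(), not caret's folds.
set.seed(100)
k   <- 10
dat <- iris
fml <- Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width
fold_id <- sample(rep(seq_len(k), length.out = nrow(dat)))

# Full-data fit: matches the coefficients shown above for ols_model$finalModel
full_coef <- coef(lm(fml, data = dat))

# One OLS fit per training partition (all rows except fold i), one row per fold
fold_coefs <- t(sapply(seq_len(k), function(i) {
  coef(lm(fml, data = dat[fold_id != i, ]))
}))
rownames(fold_coefs) <- sprintf("Fold%02d", seq_len(k))

# Averaging the fold fits gives a different (though usually close) estimate
avg_coef <- colMeans(fold_coefs)

print(fold_coefs)
print(rbind(full_data = full_coef, fold_average = avg_coef))
```

Looking at the spread of each column of fold_coefs gives a rough sense of how much the coefficient estimates move from one training partition to the next.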
