caret: combine createResample and groupKFold


Question

I want to set up a custom resampling scheme with caret. My specifications are the following: I have 1 observation per day, and my grouping factor is the month (12 values). In a first step, I create 12 resamples, each with 11 months in the training set (11*30 points) and 1 month in the test set (30 points). This gives me 12 resamples in total.

But that's not enough for me: I would like to make it a little more complex by bootstrapping the training points of each partition. So, instead of having a single set of 11*30 points in Resample01, I would have several bootstrapped resamples of these 330 points. In the end, I want a lot of resamples, each of them with one month that is NEVER in its training set.

How do I specify this in a call to train? What I tried:

library(caret)
x = rep(1:12, each=30)
folds = groupKFold(x, k=12)
folds2 = lapply(folds, createResample, times=10)

But this is wrong because 1/ I get a nested list, and 2/ the initial indices are lost at the second step.
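To make the second point concrete, here is a minimal base R sketch of how the indices get lost. It assumes, as far as I understand it, that createResample treats its first argument as an outcome vector and returns positions into that vector rather than the values stored in it. The names `month`, `fold`, `positions` and `rows` are illustrative, and `fold` is a hand-built stand-in for one element of the groupKFold output.

```r
# Hypothetical stand-in for one fold: the training rows when month 1 is held out.
month <- rep(1:12, each = 30)
fold  <- which(month != 1)            # row indices 31..360

set.seed(1)
# Bootstrapping *positions* within the fold, which is effectively what
# resampling the fold as if it were an outcome vector hands back.
positions <- sample(seq_along(fold), size = length(fold), replace = TRUE)

# Positions run over 1..330. Read as row numbers, they can point into month 1.
range(positions)

# Mapping the positions back through the fold recovers valid row indices.
rows <- fold[positions]
all(month[rows] != 1)                 # TRUE: month 1 never enters the training rows
```

Under the same assumption, the original attempt could probably be repaired by mapping each resample back through its fold, for example `lapply(createResample(f, times = 10), function(i) f[i])` for a fold `f`.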

Thanks for your help (and don't hesitate to tell me if you think it's an XY problem).

Recommended answer

I trust this will solve your problem:

library(caret)
x <- rep(1:12, each = 30)
folds <- groupKFold(x, k = 12)

Provide 10 bootstrap replicates in a nested list for each of the groups in folds. This solves the lost-indices problem, because sample draws from the row indices themselves rather than from positions within the fold.

folds2 <- lapply(folds, function(x) lapply(1:10, function(i) sample(x, size = length(x), replace = TRUE)))

Convert the nested list to a one-dimensional list. This solves the nested-list problem.

folds2 <- unlist(folds2 , recursive = FALSE, use.names = TRUE)
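As a sanity check on the two steps above, the following base R sketch rebuilds the flattened list and confirms that no bootstrap replicate contains its held-out month. It is a sketch under stated assumptions: the folds are built by hand with `which()` as a stand-in for `groupKFold(x, k = 12)`, and the `Rep01`-style names and the `n_rep`, `held_out` and `leaks` variables are additions for illustration, not part of the original answer. Naming the inner list makes `unlist` produce readable names of the form `Fold01.Rep01`.

```r
month <- rep(1:12, each = 30)

# Assumption: leave-one-month-out training indices, built by hand.
folds <- lapply(1:12, function(m) which(month != m))
names(folds) <- sprintf("Fold%02d", 1:12)

set.seed(42)
n_rep <- 10
folds2 <- lapply(folds, function(rows) {
  # Bootstrap the row indices themselves, so the original indices are kept.
  reps <- lapply(seq_len(n_rep), function(i) {
    sample(rows, size = length(rows), replace = TRUE)
  })
  names(reps) <- sprintf("Rep%02d", seq_len(n_rep))
  reps
})

# Flatten to a one-dimensional list; outer and inner names are joined with ".".
folds2 <- unlist(folds2, recursive = FALSE, use.names = TRUE)
head(names(folds2), 3)

# For each replicate, check whether its held-out month leaked into training.
held_out <- rep(1:12, each = n_rep)
leaks <- mapply(function(rows, m) any(month[rows] == m), folds2, held_out)
any(leaks)   # FALSE: the held-out month is never in the training rows
```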

Does it work?

df <- data.frame(y = rnorm(360), x = rnorm(360))

lm_formula <- train(
  y ~ ., df,
  method = "lm",
  trControl = trainControl(method = "boot" , index = folds2)
)

It appears so.

The only remaining issue is perhaps the intended indexOut for each resample. In the example above, all rows not present in a bootstrap sample were used as the test set, that is, the held-out month plus the out-of-bag training rows. If I understood correctly, you would like to test on the held-out month only and not on all the held-out samples. To solve this:

folds_out <- lapply(folds, function(x) setdiff(1:360, x))
folds_out <- rep(folds_out, each = 10)
names(folds_out) <- names(folds2)

lm_formula <- train(
  y ~ ., df,
  method = "lm",
  trControl = trainControl(method = "boot" , index = folds2, indexOut = folds_out)
)
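Because index and indexOut are matched element by element, it may be worth checking that the two lists line up before calling train. The base R sketch below builds both lists in one loop and verifies that each test set is a single month and is disjoint from its training rows. The loop, the `Fold%02d.Rep%02d` naming scheme and the variable names are assumptions made for illustration, not the answer's own code.

```r
month <- rep(1:12, each = 30)
n_rep <- 10
set.seed(7)

index     <- list()   # bootstrapped training rows, one element per resample
index_out <- list()   # held-out month rows, same order and names as `index`

for (m in 1:12) {
  train_rows <- which(month != m)
  test_rows  <- which(month == m)
  for (r in seq_len(n_rep)) {
    key <- sprintf("Fold%02d.Rep%02d", m, r)
    index[[key]]     <- sample(train_rows, size = length(train_rows), replace = TRUE)
    index_out[[key]] <- test_rows
  }
}

# The two lists must align element by element.
stopifnot(identical(names(index), names(index_out)))

# Each test set is disjoint from its training rows and covers exactly one month.
disjoint  <- mapply(function(tr, te) length(intersect(tr, te)) == 0, index, index_out)
one_month <- vapply(index_out, function(te) length(unique(month[te])) == 1, logical(1))
all(disjoint)
all(one_month)

# The lists could then be passed on as
# trainControl(method = "boot", index = index, indexOut = index_out).
```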
