Creating folds for k-fold CV in R using Caret
Problem description
I'm trying to set up k-fold CV for several classification methods/hyperparameters using the data available at
This set is made of 208 rows, each with 60 attributes. I'm reading it into a data.frame using the read.table function.
The next step is to split my data into k folds, let's say k = 5. My first attempt was to use
test <- createFolds(t, k=5)
I had two issues with this. The first one is that the lengths of the folds are not close to each other:
      Length Class  Mode
Fold1 29     -none- numeric
Fold2 14     -none- numeric
Fold3  7     -none- numeric
Fold4  5     -none- numeric
Fold5  5     -none- numeric
The other one is that this apparently split my data according to the attribute indexes, but I want to split the data itself. I thought I could fix this by transposing my data.frame, using:
test <- t(myDataNumericValues)
But when I call the createFolds function, it gives me something like this:
      Length Class  Mode
Fold1 2496   -none- numeric
Fold2 2496   -none- numeric
Fold3 2495   -none- numeric
Fold4 2496   -none- numeric
Fold5 2497   -none- numeric
The length issue was solved, but it's still not splitting my 208 rows accordingly.
Any thoughts about what I can do? Do you think the caret package is not the most appropriate one?
Thanks in advance
Solution
Please read
?createFolds
to understand what the function does. It creates the indices that define which data are held out in the separate folds (see the options to return the converse):
> library(caret)
> library(mlbench)
> data(Sonar)
>
> folds <- createFolds(Sonar$Class)
> str(folds)
List of 10
 $ Fold01: int [1:21] 25 39 58 63 69 73 80 85 90 95 ...
 $ Fold02: int [1:21] 19 21 42 48 52 66 72 81 88 89 ...
 $ Fold03: int [1:21] 4 5 17 34 35 47 54 68 86 100 ...
 $ Fold04: int [1:21] 2 6 22 29 32 40 60 65 67 92 ...
 $ Fold05: int [1:20] 3 14 36 41 45 75 78 84 94 104 ...
 $ Fold06: int [1:21] 10 11 24 33 43 46 50 55 56 97 ...
 $ Fold07: int [1:21] 1 7 8 20 23 28 31 44 71 76 ...
 $ Fold08: int [1:20] 16 18 26 27 38 57 77 79 91 99 ...
 $ Fold09: int [1:21] 13 15 30 37 49 53 74 83 93 96 ...
 $ Fold10: int [1:21] 9 12 51 59 61 62 64 70 82 87 ...
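As a rough sketch of how this applies to the k = 5 case in the question: createFolds expects one outcome value per row (here the class label), not the whole data.frame or a transposed matrix. The seed value is an arbitrary choice for reproducibility, and returnTrain = TRUE is the option that returns the converse (the training indices).

```r
library(caret)    # createFolds
library(mlbench)  # Sonar data set

data(Sonar)
set.seed(42)  # arbitrary seed, only to make the folds reproducible

# Pass one outcome value per row (the class label), not the data.frame itself.
held_out <- createFolds(Sonar$Class, k = 5)  # row indices held out in each fold

# returnTrain = TRUE returns the rows kept for training instead.
# Note: this second call draws a fresh random partition, so it is not
# the exact complement of `held_out`.
training <- createFolds(Sonar$Class, k = 5, returnTrain = TRUE)

sapply(held_out, length)  # fold sizes, each roughly 208 / 5
sapply(training, length)  # training sizes, each roughly 208 * 4 / 5
```

Because the outcome is a factor, the folds are built within each class, so the class proportions stay roughly the same across folds.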
To use these to split the data:
> split_up <- lapply(folds, function(ind, dat) dat[ind,], dat = Sonar)
> dim(Sonar)
[1] 208  61
> unlist(lapply(split_up, nrow))
Fold01 Fold02 Fold03 Fold04 Fold05 Fold06 Fold07 Fold08 Fold09 Fold10
    21     21     21     21     20     21     21     20     21     21
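If you do want to run the cross-validation loop by hand, the held-out indices can be used roughly as follows. This is only a sketch: the classifier (caret's knn3 with k = 5 neighbours) and the seed are arbitrary choices made for illustration, not something the answer prescribes.

```r
library(caret)    # createFolds, knn3
library(mlbench)  # Sonar data set

data(Sonar)
set.seed(1)  # arbitrary seed for reproducibility
folds <- createFolds(Sonar$Class, k = 5)

# For each fold: fit on all other rows, then score on the held-out rows.
accuracy <- sapply(folds, function(ind) {
  train_set <- Sonar[-ind, ]
  test_set  <- Sonar[ind, ]
  # knn3 is caret's k-nearest-neighbour classifier; k = 5 is an assumption
  fit  <- knn3(Class ~ ., data = train_set, k = 5)
  pred <- predict(fit, newdata = test_set, type = "class")
  mean(pred == test_set$Class)  # fraction of held-out rows classified correctly
})

accuracy        # one accuracy value per fold
mean(accuracy)  # cross-validated accuracy estimate
```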
The function train is used in this package to do the actual modeling (you don't usually need to do the splitting yourself; see this page).
Max
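As a minimal sketch of the train route mentioned in the answer: trainControl describes the resampling scheme and train handles the fold creation, model fitting, and hyperparameter comparison. The model ("knn"), tuneLength = 3, and the seed are assumptions chosen for illustration.

```r
library(caret)    # train, trainControl
library(mlbench)  # Sonar data set

data(Sonar)
set.seed(1)  # arbitrary seed for reproducibility

# 5-fold cross-validation; train() creates the folds internally.
ctrl <- trainControl(method = "cv", number = 5)

fit <- train(Class ~ ., data = Sonar,
             method = "knn",    # example model, chosen only for illustration
             tuneLength = 3,    # evaluate 3 candidate values of k
             trControl = ctrl)

fit$results   # resampled accuracy for each candidate value of k
fit$bestTune  # the value of k selected by cross-validation
```

Swapping the method argument lets the same resampling setup be reused to compare several classifiers.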