Creating folds for k-fold CV in R using Caret


Problem description

I'm trying to run k-fold CV for several classification methods/hyperparameters using the data available at

http://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data.

This set is made of 208 rows, each with 60 attributes. I'm reading it into a data.frame using the read.table function.

The next step is to split my data into k folds, let's say k = 5. My first attempt was to use

test <- createFolds(t, k=5)

I had two issues with this. The first one is that the fold sizes are far from equal:

        Length Class  Mode
  Fold1     29 -none- numeric
  Fold2     14 -none- numeric
  Fold3      7 -none- numeric
  Fold4      5 -none- numeric
  Fold5      5 -none- numeric

The other one is that this apparently split my data according to the attribute indexes, but I want to split the rows themselves. I thought I could fix this by transposing my data.frame:

test <- t(myDataNumericValues)

But when I call the createFolds function on the result, it gives me something like this:

        Length Class  Mode
  Fold1   2496 -none- numeric
  Fold2   2496 -none- numeric
  Fold3   2495 -none- numeric
  Fold4   2496 -none- numeric
  Fold5   2497 -none- numeric

The length issue was solved, but it's still not splitting my 208 rows accordingly.

Any thoughts about what I can do? Do you think that the caret package is not the most appropriate choice?

Thanks in advance

Solution

Please read ?createFolds to understand what the function does. It creates the indices that define which data are held out in the separate folds (see the options to return the converse):

  > library(caret)
  > library(mlbench)
  > data(Sonar)
  > 
  > folds <- createFolds(Sonar$Class)
  > str(folds)
  List of 10
   $ Fold01: int [1:21] 25 39 58 63 69 73 80 85 90 95 ...
   $ Fold02: int [1:21] 19 21 42 48 52 66 72 81 88 89 ...
   $ Fold03: int [1:21] 4 5 17 34 35 47 54 68 86 100 ...
   $ Fold04: int [1:21] 2 6 22 29 32 40 60 65 67 92 ...
   $ Fold05: int [1:20] 3 14 36 41 45 75 78 84 94 104 ...
   $ Fold06: int [1:21] 10 11 24 33 43 46 50 55 56 97 ...
   $ Fold07: int [1:21] 1 7 8 20 23 28 31 44 71 76 ...
   $ Fold08: int [1:20] 16 18 26 27 38 57 77 79 91 99 ...
   $ Fold09: int [1:21] 13 15 30 37 49 53 74 83 93 96 ...
   $ Fold10: int [1:21] 9 12 51 59 61 62 64 70 82 87 ...
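As a minimal sketch of the "converse" option mentioned above: createFolds takes the outcome vector (here Sonar$Class), not the whole data frame, and its returnTrain argument switches between held-out indices and training indices. The seed value and the variable names test_idx and train_idx are arbitrary choices for illustration, and k = 5 is taken from the question.

```r
library(caret)
library(mlbench)
data(Sonar)

set.seed(42)  # arbitrary seed, only so the folds are reproducible

# Default (returnTrain = FALSE): row indices HELD OUT in each fold,
# stratified on the class label
test_idx <- createFolds(Sonar$Class, k = 5)

# returnTrain = TRUE: row indices USED FOR TRAINING in each fold
train_idx <- createFolds(Sonar$Class, k = 5, returnTrain = TRUE)

# Fold sizes: held-out folds are roughly 208 / 5 rows each,
# training folds are roughly 208 * 4 / 5 rows each
sapply(test_idx, length)
sapply(train_idx, length)

# Every row index appears in exactly one held-out fold
all_held_out <- sort(unlist(test_idx, use.names = FALSE))
identical(all_held_out, seq_len(nrow(Sonar)))
```

The two calls draw independent random folds here, so test_idx and train_idx should not be paired with each other; pick one form and derive the other with negative indexing if both are needed.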

To use these to split the data:

   > split_up <- lapply(folds, function(ind, dat) dat[ind,], dat = Sonar)
   > dim(Sonar)
   [1] 208  61
   > unlist(lapply(split_up, nrow))
   Fold01 Fold02 Fold03 Fold04 Fold05 Fold06 Fold07 Fold08 Fold09 Fold10 
       21     21     21     21     20     21     21     20     21     21 
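If you do want to run the cross-validation loop by hand, the held-out indices can be used directly with negative indexing. The sketch below is only an illustration under assumptions not in the answer above: it uses knn() from the class package with k = 3 neighbours as a stand-in classifier, 5 folds as in the question, and an arbitrary seed.

```r
library(caret)
library(mlbench)
library(class)  # knn(); a stand-in classifier chosen only for illustration
data(Sonar)

set.seed(1)  # arbitrary seed for reproducible folds
folds <- createFolds(Sonar$Class, k = 5)

x <- Sonar[, names(Sonar) != "Class"]  # the 60 numeric predictors
y <- Sonar$Class                       # the class label (factor)

# For each fold: fit on all other rows, predict the held-out rows
acc <- sapply(folds, function(test_rows) {
  pred <- knn(train = x[-test_rows, ],
              test  = x[test_rows, ],
              cl    = y[-test_rows],
              k     = 3)
  mean(pred == y[test_rows])  # accuracy on the held-out fold
})

acc        # per-fold accuracy
mean(acc)  # cross-validated accuracy estimate
```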

The function train is used in this package to do the actual modeling, so you don't usually need to do the splitting yourself (see this page).
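A minimal sketch of letting train handle the resampling, assuming 5-fold CV as in the question. The model (method = "knn") and the candidate values of k in tuneGrid are illustrative choices, not something the answer prescribes.

```r
library(caret)
library(mlbench)
data(Sonar)

set.seed(1)  # arbitrary seed so the resampling is reproducible

# 5-fold cross-validation, with the folds created internally by caret
ctrl <- trainControl(method = "cv", number = 5)

# Compare a few hyperparameter values; each is evaluated on the same folds
fit <- train(Class ~ ., data = Sonar,
             method    = "knn",
             trControl = ctrl,
             tuneGrid  = data.frame(k = c(3, 5, 7)))

fit$results   # one row per candidate k, with resampled Accuracy and Kappa
fit$bestTune  # the k value selected by cross-validated accuracy
```

To reuse folds you built yourself, trainControl also accepts a list of training-row indices through its index argument, for example the output of createFolds(..., returnTrain = TRUE).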

Max
