train() in the caret package returns an error about names & gsub

Problem description

I am using the caret package to predict the improvementNoticed variable:

library(caret)
head(trainData)

     improvementNoticed                            V1               V2
681                   0                    0.06451613       0.006060769
1484                  0                    0.77924586       0.331009145
1356                  0                    0.22222222       0.017538684
541                   0                    0.21505376       0.011102470
2214                  1                    0.59195217       0.064764408
1111                  0                    0.97979798       0.036445064
               V3                                          V4       V5
681   0.008182531                                  0.05263158        0
1484  0.316603794                                  0.88825188        0
1356  0.016182822                                  0.20000000        0
541   0.012665610                                  0.10000000        0
2214  0.051008693                                  0.55000000        0
1111  0.034643632                                  0.93333333        0

Then I run:

myControl = trainControl(method='cv',number=5,repeats=2,returnResamp='none')
model1 = train(improvementNoticed~., data=trainData, method = 'glm', trControl=myControl)

I get the following error:

Error in names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))),  : 
  'names' attribute [1] must be the same length as the vector [0]
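
The "[0]" in this message is the telling part. The failing line builds fold names with paste(), and paste("Fold", character(0)) still returns one string when there are no folds at all. The sketch below reproduces the message in base R. It assumes the outcome vector reaching createFolds() has length zero. The list `out` is a stand-in for the folds, not caret's actual code.

```r
# Stand-in for the list of folds that createFolds() would build.
# Assumption: the outcome vector was empty, so no folds were created.
out <- list()

# seq(along = out) is integer(0), so format() and gsub() give character(0) ...
fold_ids <- gsub(" ", "0", format(seq(along = out)))

# ... but paste() recycles "Fold" against the empty vector and still
# returns a single string, "Fold ".
fold_names <- paste("Fold", fold_ids)

# Assigning one name to a zero-length list raises the same error.
msg <- tryCatch({
  names(out) <- fold_names
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```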

This is the error I get with trainData[,1] as a factor (the remaining columns are numeric). Previously, when trainData[,1] was numeric, I got a different error:

Error in cut.default(y, unique(quantile(y, probs = seq(0, 1, length = cuts))),  : 
  invalid number of intervals
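
For a numeric outcome, the fold assignment bins `y` by its quantiles instead. That step fails in a similar way when `y` is empty. quantile() of a zero-length vector returns only NAs, unique() collapses them to a single NA, and cut() rejects a single NA break. The base-R sketch below shows this. It assumes an empty outcome vector, and `cuts = 5` is chosen purely for illustration.

```r
# Assumption: the numeric outcome reaching the fold code is empty.
y <- numeric(0)
cuts <- 5  # illustrative number of quantile groups

# quantile() of an empty vector gives NA for every probability;
# unique() then leaves a single NA.
breaks <- unique(quantile(y, probs = seq(0, 1, length = cuts)))
print(breaks)

# cut() treats a single NA break as an invalid number of intervals.
msg <- tryCatch(cut(y, breaks, include.lowest = TRUE),
                error = function(e) conditionMessage(e))
print(msg)
```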

Please note that improvementNoticed is a binary variable.

If I convert trainData[,1] to integer, I get the same error as with numeric.

Two last things:

traceback()
5: createFolds(y, trControl$number, returnTrain = TRUE)
4: train.default(x, y, weights = w, ...)
3: train(x, y, weights = w, ...)
2: train.formula(improvementNoticed ~ ., data = trainData, method = "glm", 
       trControl = myControl)
1: train(improvementNoticed ~ ., data = trainData, method = "glm", 
       trControl = myControl)
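
The traceback ends in createFolds(y, ...), so a quick diagnostic is to count how many rows survive the formula interface. The sketch below uses a hypothetical toy data frame `toy`, not the poster's data, in which one predictor is entirely NaN. It assumes missing values are dropped (na.omit) rather than rejected. The post does not show which na.action was in effect, but an empty outcome would match both errors above.

```r
# Hypothetical toy data: V2 is entirely NaN, as a 0/0 normalization would produce.
toy <- data.frame(
  improvementNoticed = factor(c(0, 1, 0, 1)),
  V1 = c(0.1, 0.5, 0.3, 0.9),
  V2 = rep(NaN, 4)
)

# Number of missing (NA or NaN) values per column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Assumption: rows with missing values are dropped via na.omit.
mf <- model.frame(improvementNoticed ~ ., data = toy, na.action = na.omit)
print(nrow(mf))                     # no rows left to split into folds
print(length(model.response(mf)))   # the outcome handed on is empty
```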

And sessionInfo():

R version 3.0.1 (2013-05-16)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] elasticnet_1.1     lars_1.2           klaR_0.6-9         MASS_7.3-26       
 [5] kernlab_0.9-18     nnet_7.3-6         randomForest_4.6-7 doMC_1.3.0        
 [9] iterators_1.0.6    caret_5.17-7       reshape2_1.2.2     plyr_1.8          
[13] lattice_0.20-15    foreach_1.4.1      cluster_1.14.4    

loaded via a namespace (and not attached):
[1] codetools_0.2-8 grid_3.0.1      stringr_0.6.2   tools_3.0.1   

Recommended answer

As it happens, the error was a really basic one.

I was normalizing the data (which I did not suspect would cause the issue), but it turned out that one of the variables contained only 0s. Normalizing it produced nothing but NaNs, which caused the model to fail.
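
This failure is easy to reproduce with min-max scaling. The post does not name the normalization method; min-max is assumed here because the sample values all lie in [0, 1]. For an all-zero column, max - min is 0, so every entry becomes 0/0 = NaN.

The sketch below uses a hypothetical helper, `minmax_scale()`, that drops zero-range columns before scaling. It is meant for numeric predictor columns without missing values, not the outcome. caret users may prefer caret's nearZeroVar() for spotting such columns.

```r
# Hypothetical helper: min-max scale each numeric column to [0, 1].
# Columns whose range is zero (e.g. all zeros) would give 0/0 = NaN,
# so they are reported and dropped instead of being scaled.
# Assumes numeric columns with no missing values.
minmax_scale <- function(df) {
  rng <- vapply(df, function(col) diff(range(col)), numeric(1))
  constant <- names(df)[rng == 0]
  if (length(constant) > 0) {
    message("Dropping zero-range columns: ", paste(constant, collapse = ", "))
  }
  keep <- df[, rng > 0, drop = FALSE]
  as.data.frame(lapply(keep, function(col) (col - min(col)) / (max(col) - min(col))))
}

# Naive scaling of an all-zero column yields NaN everywhere.
v5 <- c(0, 0, 0)
naive <- (v5 - min(v5)) / (max(v5) - min(v5))
print(naive)

# Hypothetical predictors with one constant column.
x <- data.frame(V1 = c(2, 4, 6), V5 = c(0, 0, 0))
scaled <- minmax_scale(x)
print(scaled)
```

Dropping the column is one choice. Leaving constant columns unscaled would also avoid the NaNs, but a predictor with no variation carries no information for the model anyway.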
