train() in caret package returns an error about names & gsub
Problem description
I am using the caret package to predict the improvementNoticed variable:
library(caret)
head(trainData)
     improvementNoticed         V1          V2          V3         V4 V5
681                   0 0.06451613 0.006060769 0.008182531 0.05263158  0
1484                  0 0.77924586 0.331009145 0.316603794 0.88825188  0
1356                  0 0.22222222 0.017538684 0.016182822 0.20000000  0
541                   0 0.21505376 0.011102470 0.012665610 0.10000000  0
2214                  1 0.59195217 0.064764408 0.051008693 0.55000000  0
1111                  0 0.97979798 0.036445064 0.034643632 0.93333333  0
Then I run:
myControl = trainControl(method='cv',number=5,repeats=2,returnResamp='none')
model1 = train(improvementNoticed~., data=trainData, method = 'glm', trControl=myControl)
I get the following error:
Error in names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), :
'names' attribute [1] must be the same length as the vector [0]
This happens when trainData[,1] is a factor (the other columns are numeric). Previously, when trainData[,1] was numeric, I got a different error:
Error in cut.default(y, unique(quantile(y, probs = seq(0, 1, length = cuts))), :
invalid number of intervals
Please note that improvementNoticed is a binary variable.
If I convert trainData[,1] to integer, I get the same error as with a numeric.
Two last things:
traceback()
5: createFolds(y, trControl$number, returnTrain = TRUE)
4: train.default(x, y, weights = w, ...)
3: train(x, y, weights = w, ...)
2: train.formula(improvementNoticed ~ ., data = trainData, method = "glm",
trControl = myControl)
1: train(improvementNoticed ~ ., data = trainData, method = "glm",
trControl = myControl)
and sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] elasticnet_1.1 lars_1.2 klaR_0.6-9 MASS_7.3-26
[5] kernlab_0.9-18 nnet_7.3-6 randomForest_4.6-7 doMC_1.3.0
[9] iterators_1.0.6 caret_5.17-7 reshape2_1.2.2 plyr_1.8
[13] lattice_0.20-15 foreach_1.4.1 cluster_1.14.4
loaded via a namespace (and not attached):
[1] codetools_0.2-8 grid_3.0.1 stringr_0.6.2 tools_3.0.1
Recommended answer
As it happens, the error was a really basic one.
I was normalizing the data (a step I did not suspect of causing the issue), but it turned out that one of the variables contained only 0s. Normalizing that variable produced all NaNs, which caused the model to fail.
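The failure mode described above can be reproduced with a small sketch in base R. The answer does not say which normalization was used, so min-max scaling is an assumption here, and the toy data frame and the helper name min_max are hypothetical. For a constant column the scaling computes 0/0, which R evaluates to NaN. An all-NaN predictor would presumably leave no complete rows for the model, which is consistent with the traceback ending in createFolds() on an empty vector.

```r
# Hypothetical toy data: V5 is all zeros, like the column in the question.
toy <- data.frame(V1 = c(0.1, 0.5, 0.9),
                  V5 = c(0, 0, 0))

# Assumed normalization (min-max scaling); the original does not say
# which method was actually used.
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

scaled <- as.data.frame(lapply(toy, min_max))

# Columns that became entirely NaN after scaling.
nan_cols <- names(scaled)[sapply(scaled, function(col) all(is.nan(col)))]

# The same problem can be caught before scaling by looking for
# constant columns in the raw data.
constant_cols <- names(toy)[sapply(toy, function(col) length(unique(col)) == 1)]

print(nan_cols)
print(constant_cols)
```

Dropping or skipping such columns before normalizing avoids the NaNs. caret also provides nearZeroVar() for flagging predictors with zero or near-zero variance, which may be a convenient check to run before calling train().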