R Caret's rfe [Error in { : task 1 failed - "rfe is expecting 184 importance values but only has 2"]
Problem description
I am using Caret's rfe for a regression application. My data (in a data.table) has 176 predictors (including 49 factor predictors). When I run the function, I get this error:
Error in { : task 1 failed - "rfe is expecting 176 importance values but only has 2"
Then I used model.matrix( ~ . - 1, data = as.data.frame(train_model_sell_single_bid)) to convert the factor predictors to dummy variables. However, I got a similar error:
Error in { : task 1 failed - "rfe is expecting 184 importance values but only has 2"
I'm using R version 3.1.1 on Windows 7 (64-bit) with Caret version 6.0-41. I also have Revolution R Enterprise version 7.3 (64-bit) installed. The same error was also reproduced on an Amazon EC2 (c3.8xlarge) Linux instance with R version 3.0.1 and Caret version 6.0-24.
Datasets used (to reproduce my error):
https://www.dropbox.com/s/utuk9bpxl2996dy/train_model_sell_single_bid.RData?dl=0
https://www.dropbox.com/s/s9xcgfit3iqjffp/train_model_bid_outcomes_sell_s ?dl = 0
My code:
library(caret)
library(data.table)
library(bit64)
library(doMC)

load("train_model_sell_single_bid.RData")
load("train_model_bid_outcomes_sell_single.RData")

subsets <- seq(from = 4, to = 184, by = 4)
registerDoMC(cores = 32)

set.seed(1015498)
ctrl <- rfeControl(functions = lmFuncs,
                   method = "repeatedcv",
                   repeats = 1,
                   #saveDetails = TRUE,
                   verbose = FALSE)

x <- as.data.frame(train_model_sell_single_bid[, !"security_id", with = FALSE])
y <- train_model_bid_outcomes_sell_single[, bid100]

lmProfile_single_bid100 <- rfe(x, y,
                               sizes = subsets,
                               preProc = c("center", "scale"),
                               rfeControl = ctrl)
Recommended answer
It seems that you might have highly correlated predictors.
Prior to feature selection you should run:
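# Correlation matrix of the predictors (assumes x is all numeric, e.g. after dummy encoding)
correlations <- cor(x)
crrltn <- findCorrelation(correlations, cutoff = .90)
if (length(crrltn) != 0)
  x <- x[, -crrltn, drop = FALSE]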
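As an illustration of how collinear predictors can leave a linear model with fewer usable coefficients than predictors, here is a minimal base R sketch. The toy data and the names d and fit are made up for this example, and it is only an assumption that this mechanism is what produces the shortfall in importance values for the dataset in the question.

```r
set.seed(1)
n <- 50
d <- data.frame(a = rnorm(n), b = rnorm(n))
d$c <- d$a + d$b                  # c is an exact linear combination of a and b
d$y <- d$a - d$b + rnorm(n)

fit <- lm(y ~ ., data = d)

coef(fit)                         # the coefficient for c is NA (aliased)
nrow(coef(summary(fit)))          # 3 rows: intercept, a, b; the aliased term is dropped
```

Note that in this toy case no pair of predictors is correlated above 0.90, so a pairwise correlation filter would not necessarily catch it. caret also provides findLinearCombos(), which is aimed at exact linear dependencies and may be worth trying alongside findCorrelation().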
If the problem persists after this, it might be caused by high correlation among the predictors within the automatically generated folds. You can try to control the generated folds with:
set.seed(12213)
index <- createFolds(y, k = 10, returnTrain = TRUE)
and then give these as arguments to the rfeControl function:
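lmctrl <- rfeControl(functions = lmFuncs,
                     method = "repeatedcv",
                     index = index,
                     verbose = TRUE)

set.seed(111333)
# rfe() takes the predictors first and the outcome second;
# subsets is the vector of subset sizes defined in the question
lrprofile <- rfe(x, y,
                 sizes = subsets,
                 rfeControl = lmctrl)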
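For reference, a small sketch of what createFolds() returns when returnTrain = TRUE, using a made-up outcome vector (y_demo and index_demo are hypothetical names, not objects from the question). Each list element holds the row numbers used for training in that fold, which is why the elements can be used directly to subset the rows of the predictor data.

```r
library(caret)

set.seed(12213)
y_demo <- rnorm(100)                                 # toy outcome, for illustration only
index_demo <- createFolds(y_demo, k = 10, returnTrain = TRUE)

length(index_demo)     # one element per fold
lengths(index_demo)    # number of training rows in each fold
head(index_demo[[1]])  # row numbers kept for training in the first fold
```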
If you keep having the same problem, check whether there are highly correlated predictors within each fold:
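for (i in seq_along(index)) {
  # Correlation matrix of the predictors on the training rows of fold i
  crrltn <- cor(x[index[[i]], ])
  # print() is needed because values are not auto-printed inside a for loop
  print(findCorrelation(crrltn, cutoff = .90, names = TRUE, verbose = TRUE))
}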
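If it helps to see which predictors are correlated with which, rather than only the columns flagged for removal, the per-fold check can be complemented with a small base R helper. This is a sketch: high_cor_pairs is a hypothetical function written for this example, not part of caret, and the demo data are made up.

```r
# Hypothetical helper: list predictor pairs whose absolute correlation exceeds a cutoff
high_cor_pairs <- function(x, cutoff = 0.90) {
  cm <- cor(x)
  # Look only above the diagonal so each pair is reported once
  hits <- which(abs(cm) > cutoff & upper.tri(cm), arr.ind = TRUE)
  data.frame(var1 = rownames(cm)[hits[, "row"]],
             var2 = colnames(cm)[hits[, "col"]],
             r    = cm[hits],
             stringsAsFactors = FALSE)
}

set.seed(42)
demo <- data.frame(a = rnorm(200), b = rnorm(200))
demo$a_copy <- demo$a + rnorm(200, sd = 0.05)   # nearly a duplicate of a

pairs_found <- high_cor_pairs(demo)
pairs_found
```

Inside the fold loop, the same helper could be called as high_cor_pairs(x[index[[i]], ]), assuming x contains only numeric columns.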