"missing value where TRUE/FALSE needed" with caret
Problem description
I have a data frame that contains date variables. (The test data and code are available here.)
However, when I use "functions = caretFuncs" in rfeControl(), rfe() fails with this error message:
Error in { : task 1 failed - "missing value where TRUE/FALSE needed"
In addition: Warning messages:
1: In FUN(newX[, i], ...) : NAs introduced by coercion
2: In FUN(newX[, i], ...) : NAs introduced by coercion
3: In FUN(newX[, i], ...) : NAs introduced by coercion
4: In FUN(newX[, i], ...) : NAs introduced by coercion
5: In FUN(newX[, i], ...) : NAs introduced by coercion
6: In FUN(newX[, i], ...) : NAs introduced by coercion
7: In FUN(newX[, i], ...) : NAs introduced by coercion
8: In FUN(newX[, i], ...) : NAs introduced by coercion
9: In FUN(newX[, i], ...) : NAs introduced by coercion
10: In FUN(newX[, i], ...) : NAs introduced by coercion
What can I do?
Code to reproduce the error:
library(mlbench)
library(caret)
library(maps)
library(rgdal)
library(raster)
library(sp)
library(spdep)
library(GWmodel)
library(e1071)
library(plyr)
library(kernlab)
library(zoo)
mydata <- read.csv("Realestatedata_all_delete_date.csv", header=TRUE)
mydata$estate_TransDate <- as.Date(paste(mydata$estate_TransDate,1,sep="-"),format="%Y-%m-%d")
mydata$estate_HouseDate <- as.Date(mydata$estate_HouseDate,format="%Y-%m-%d")
rfectrl <- rfeControl(functions=caretFuncs, method="cv",number=10,verbose=TRUE,returnResamp = "final")
results <- rfe(mydata[,1:48],mydata[,49],sizes = c(1:48),rfeControl=rfectrl,method = "svmRadial")
print(results)
predictors(results)
plot(results, type=c("g", "o"))
Recommended answer
You have NAs (missing values) in mydata in the following input variables (which you feed to the classifier):
colnames(mydata)[unique(which(is.na(mydata[,1:48]), arr.ind = TRUE)[,2])]
which gives:
[1] "Aport_Distance" "Univ_Distance" "ParkR_Distance"
[4] "TRA_StationDistance" "THSR_StationDistance" "Schools_Distance"
[7] "Lib_Distance" "Sport_Distance" "ParkS_Distance"
[10] "Hyper_Distance" "Shop_Distance" "Post_Distance"
[13] "Hosp_Distance" "Gas_Distance" "Incin_Distance"
[16] "Mort_Distance"
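A shorter way to get the same list is to count the NAs per column. The following is a minimal sketch on a made-up data frame: the `toy` data, its values, and the `estate_Floor` column are invented for illustration, and only the two distance column names are borrowed from the output above.

```r
# Toy data frame; all values are invented for illustration only.
toy <- data.frame(
  Aport_Distance = c(1.2, NA, 3.4),
  Univ_Distance  = c(0.5, 0.7, NA),
  estate_Floor   = c(2, 5, 9)   # hypothetical column without NAs
)

# Approach used above: locate the NA cells, then map column indices to names.
na_cols_which <- colnames(toy)[unique(which(is.na(toy), arr.ind = TRUE)[, 2])]

# Alternative: count NAs per column and keep the columns with at least one.
na_cols_sums <- names(which(colSums(is.na(toy)) > 0))

print(na_cols_which)
print(na_cols_sums)
```

Both expressions should name the same columns; the second one also makes it easy to see how many values are missing in each column by printing `colSums(is.na(toy))` directly.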
In addition, it looks like your date variables (transaction date and house date) are converted to NAs inside rfe(..).
The SVM regressor does not seem to be able to deal with NAs as is.
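The "NAs introduced by coercion" warnings are consistent with the Date columns being pushed through a character matrix and then coerced to numeric. The sketch below reproduces that effect on a toy data frame. Whether rfe takes exactly this path internally is an assumption, not something the thread confirms. It also shows that using NA as an `if` condition produces the error text from the question.

```r
# Toy data frame with one numeric and one Date column (values invented).
toy <- data.frame(
  price = c(10.5, 20.0),
  date  = as.Date(c("2015-01-01", "2015-06-01"))
)

# as.matrix() on a data frame with a Date column yields a character matrix.
m <- as.matrix(toy)
print(is.character(m))

# Coercing each column back to numeric turns the date strings into NA.
# The "NAs introduced by coercion" warning is suppressed here on purpose.
coerced <- suppressWarnings(apply(m, 2, as.numeric))
print(coerced)

# An NA used as an `if` condition raises the error seen in the question.
msg <- tryCatch(
  if (NA) "unreachable",
  error = function(e) conditionMessage(e)
)
print(msg)
```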
I would convert the dates to something like "years since a given reference":
mydata$estate_TransAge <- as.numeric(as.Date("2015-11-01") - mydata$estate_TransDate) / 365.25
mydata$estate_HouseAge <- as.numeric(as.Date("2015-11-01") - mydata$estate_HouseDate) / 365.25
# define the set of input variables
inputVars <- setdiff(
  colnames(mydata),
  # exclude these
  c("estate_TransDate", "estate_HouseDate", "estate_TotalPrice")
)
Also remove the rows that have an NA in any of the columns you use as input to the regressor:
traindata <- mydata[complete.cases(mydata[,inputVars]),]
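To see what these two steps do, here is a minimal sketch on a made-up three-row data frame. The column names follow the answer, `Aport_Distance` stands in for all of the distance columns, and every value is invented.

```r
# Toy stand-in for mydata; all values are invented for illustration.
toy <- data.frame(
  estate_TransDate  = as.Date(c("2014-11-01", "2013-11-01", "2015-05-01")),
  estate_HouseDate  = as.Date(c("2000-11-01", "1995-11-01", "2010-05-01")),
  Aport_Distance    = c(1.2, NA, 3.4),
  estate_TotalPrice = c(100, 200, 300)
)

# Replace each Date column with a numeric "years before the reference date".
ref <- as.Date("2015-11-01")
toy$estate_TransAge <- as.numeric(ref - toy$estate_TransDate) / 365.25
toy$estate_HouseAge <- as.numeric(ref - toy$estate_HouseDate) / 365.25

# Inputs are all columns except the raw dates and the target.
inputVars <- setdiff(
  colnames(toy),
  c("estate_TransDate", "estate_HouseDate", "estate_TotalPrice")
)

# Keep only the rows where every input column is present.
traindata <- toy[complete.cases(toy[, inputVars]), ]

print(inputVars)
print(traindata[, c(inputVars, "estate_TotalPrice")])
```

After this, every input column is numeric and free of NAs, which is the shape the SVM model needs.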
Then run rfe with:
rfectrl <- rfeControl(functions=caretFuncs, method="cv",number=10,verbose=TRUE,returnResamp = "final")
results <- rfe(
  traindata[, inputVars],
  traindata[, "estate_TotalPrice"],
  rfeControl = rfectrl,
  method = "svmRadial"
)
In my case, this would have taken a long time to complete, so I tested it on only one percent of the data using:
# sample_frac() comes from dplyr, which the question's code does not load
traindata <- dplyr::sample_frac(traindata, 0.01)
The question remains what to do if you are given data to predict the price for where some of the input variables are NA.
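One common option, which is not part of the original answer, is to fill in the missing inputs with a statistic learned from the training data, such as the per-column median. The sketch below does this in base R on invented data; the helper name `impute_median` is hypothetical. caret's preProcess() also offers a "medianImpute" method that could serve the same purpose.

```r
# Invented training data and new data to predict on.
train   <- data.frame(Aport_Distance = c(1, 2, 9), Univ_Distance = c(4, 6, 8))
newdata <- data.frame(Aport_Distance = c(NA, 3),   Univ_Distance = c(5, NA))

# Medians are computed on the training data only.
train_medians <- vapply(train, median, numeric(1), na.rm = TRUE)

# Hypothetical helper: replace NAs in each column with the stored median.
impute_median <- function(df, medians) {
  for (col in names(medians)) {
    missing <- is.na(df[[col]])
    df[[col]][missing] <- medians[[col]]
  }
  df
}

imputed <- impute_median(newdata, train_medians)
print(imputed)
```

Whether median imputation is appropriate depends on why the values are missing; dropping such rows or refusing to predict for them are equally valid choices.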