R random forest error: type of predictors in new data do not match

Question

I am trying to use the quantile regression forest function in R (quantregForest), which is built on the randomForest package. I am getting a type mismatch error and can't figure out why.

I train the model with

qrf <- quantregForest(x = xtrain, y = ytrain)

which works without a problem, but when I try to predict on new data with

quant.newdata <- predict(qrf, newdata= xtest)

it gives the following error:

Error in predict.quantregForest(qrf, newdata = xtest) : 
Type of predictors in new data do not match types of the training data.

My training and testing data come from separate files (hence separate data frames) but have the same format. I have checked the classes of the predictors with

sapply(xtrain, class)
sapply(xtest, class)

The output is as follows:

> sapply(xtrain, class)
pred1     pred2     pred3     pred4     pred5     pred6     pred7     pred8 
"factor" "integer" "integer" "integer"  "factor"  "factor" "integer"  "factor" 
pred9    pred10    pred11    pred12 
"factor"  "factor"  "factor"  "factor" 


> sapply(xtest, class)
pred1     pred2     pred3     pred4     pred5     pred6     pred7     pred8 
"factor" "integer" "integer" "integer"  "factor"  "factor" "integer"  "factor" 
pred9    pred10    pred11    pred12 
"factor"  "factor"  "factor"  "factor" 

They are exactly the same. I also checked for NA values: neither xtrain nor xtest contains any. Am I missing something trivial here?

Update I: running the prediction on the training data itself still gives an error:

> quant.newdata <- predict(qrf, newdata = xtrain)
Error in predict.quantregForest(qrf, newdata = xtrain) : 
names of predictor variables do not match

Update II: I combined my training and test sets into one file, so that rows 1 to 101 are the training data and the rest are the test data. I modified the example provided with quantregForest as follows:

data <- read.table("toy.txt", header = TRUE)
n <- nrow(data)
indextrain <- 1:101
xtrain <- data[indextrain, 3:14]
xtest <- data[-indextrain, 3:14]
ytrain <- data[indextrain, 15]
ytest <- data[-indextrain, 15]

qrf <- quantregForest(x=xtrain, y=ytrain)
quant.newdata <- predict(qrf, newdata= xtest)

And it works! I'd appreciate it if anyone could explain why it works this way but not the other way.

Recommended answer

I had the same problem. You can try a small trick to equalize the classes of the training and test sets: bind the first row of the training set to the test set and then delete it. For your example it should look like this:

    xtest <- rbind(xtrain[1, ] , xtest)
    xtest <- xtest[-1,]
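
A likely explanation, which the question itself does not confirm, is that the factor columns in the two files have different level sets. sapply(x, class) reports only "factor" and says nothing about levels, so the class check can pass while the levels differ. rbind() on data frames merges factor levels, which is why the trick above (and reading everything from one file, as in Update II) makes them line up. The sketch below uses small made-up data frames with the same column names as the question's first two predictors; everything in it is illustrative, not the asker's real data.

```r
# Hypothetical toy data: identical column classes, different factor levels.
xtrain <- data.frame(pred1 = factor(c("a", "b", "c")), pred2 = 1:3)
xtest  <- data.frame(pred1 = factor(c("a", "b")),      pred2 = 4:5)

# The class check from the question passes...
same_classes <- identical(sapply(xtrain, class), sapply(xtest, class))

# ...but the level sets of the factor column are not the same.
same_levels_before <- identical(levels(xtrain$pred1), levels(xtest$pred1))

# The trick from the answer: rbind() merges the factor levels,
# and dropping the borrowed row afterwards keeps the merged levels.
xtest_fixed <- rbind(xtrain[1, ], xtest)
xtest_fixed <- xtest_fixed[-1, ]
same_levels_after <- identical(levels(xtrain$pred1), levels(xtest_fixed$pred1))

# An alternative sketch: re-declare every factor column in the test set
# with the training levels. Values unseen in training would become NA.
xtest_alt <- xtest
factor_cols <- names(xtrain)[sapply(xtrain, is.factor)]
for (col in factor_cols) {
  xtest_alt[[col]] <- factor(xtest_alt[[col]], levels = levels(xtrain[[col]]))
}

print(c(same_classes, same_levels_before, same_levels_after))
```

Comparing levels() column by column on the real xtrain and xtest would show whether this is actually what happened in the original case.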
