Type Mismatch Error using randomForest in R

Question

I am trying to use random forest in R to classify some Kaggle data, but I keep getting the following error whenever I try to use the model I have created.

Error in predict.randomForest(fit, newdata = test, type = "class") : 
  Type of predictors in new data do not match that of the training data

I am totally lost as to the reason for this error, and Google has not been much help. Any help or insight would be appreciated. The simple code snippet below is my attempt at one of the Kaggle problems.

fit = randomForest(as.factor(IsBadBuy) ~ VehicleAge + WheelTypeID + Transmission + WarrantyCost + VehOdo + Auction, 
                   data=training, importance=TRUE, do.trace=100, keep.forest=TRUE)

prediction = predict(fit, newdata=test, type='class')

t = table(observed=test[, 'IsBadBuy'], predict=prediction)

Recommended answer

For an R newbie like me... They are right when they say: "The error message means exactly what it says: there is at least one variable in your training data whose type does not match the equivalent variable in your test data."

Run the following to confirm nothing is obviously different:

str(training)
str(NewData)
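Reading two long str() listings side by side is error-prone, so the same comparison can be done programmatically. The following is a minimal base-R sketch, not part of the original answer; the toy data frames and their values are hypothetical stand-ins for the real training and new data.

```r
# Hypothetical toy data standing in for the real training and new data
training <- data.frame(
  VehicleAge   = c(1, 3, 5, 7),
  Transmission = factor(c("AUTO", "MANUAL", "AUTO", "MANUAL"))
)
NewData <- data.frame(
  VehicleAge   = c(2, 4),
  Transmission = factor(c("AUTO", "AUTO"))
)

# Columns present in both data frames
shared <- intersect(names(training), names(NewData))

# Does each shared column have the same class in both data frames?
class_match <- sapply(shared, function(col) {
  identical(class(training[[col]]), class(NewData[[col]]))
})

# Do the factor levels agree? Non-factor columns have NULL levels,
# so they compare as identical here.
levels_match <- sapply(shared, function(col) {
  identical(levels(training[[col]]), levels(NewData[[col]]))
})

print(data.frame(column = shared, class_match, levels_match))
```

In this toy case both columns have matching classes, but Transmission is flagged because the new data only contains one of the two training levels.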

That will list the features and types of the training data and the new data. The reason you might still be confused, as I was, is that the data types may appear to match and yet the error persists. The likely cause is that a feature/column is listed as a factor in both sets, but the levels are not the same. My new data was much smaller and didn't have all the levels the training data did, and that is enough to trigger this error. The fix: when you process your new data and convert a column to a factor, pass in all the possible levels. That makes the two sets match, and things will work.

# List every level that can occur, not just those present in the new data
dataframe$ColToFactor <- factor(dataframe$ColToFactor, levels=c("PossibleLvl1", "PossibleLvl2", "PossibleLvl3"))

That made the difference for me.
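Instead of typing the levels by hand, one option is to copy them from the training data. This is a sketch under assumptions, not the answerer's exact code: the data frames and the Auction values are hypothetical, and any value in the new data that is not a training level becomes NA.

```r
# Hypothetical toy data: the new data is smaller and lacks some levels
training <- data.frame(
  Auction = factor(c("ADESA", "MANHEIM", "OTHER", "MANHEIM"))
)
NewData <- data.frame(
  Auction = c("MANHEIM", "MANHEIM"),
  stringsAsFactors = FALSE
)

# Names of all factor columns in the training data
factor_cols <- names(training)[sapply(training, is.factor)]

# Re-factor each matching column in the new data using the training levels.
# Values not seen in training would become NA.
for (col in intersect(factor_cols, names(NewData))) {
  NewData[[col]] <- factor(NewData[[col]], levels = levels(training[[col]]))
}

str(NewData)
```

After this step, every factor column in the new data carries the same set of levels as its counterpart in the training data, even when some levels have no rows.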
