Error when using predict() on a randomForest object trained with caret's train() using formula

Views: 153 Published: 2020/11/10 5:35:52 Tags: r formula random-forest r-caret predict


Problem description

Using R 3.2.0 with caret 6.0-41 and randomForest 4.6-10 on a 64-bit Linux machine.

When I call the predict() method on a randomForest object that was trained with the train() function from the caret package using a formula, the function returns an error. When training via randomForest() directly and/or using x= and y= rather than a formula, everything runs smoothly.

Here is a working example:

library(randomForest)
library(caret)

data(imports85)
imp85     <- imports85[, c("stroke", "price", "fuelType", "numOfDoors")]
imp85     <- imp85[complete.cases(imp85), ]
imp85[]   <- lapply(imp85, function(x) if (is.factor(x)) x[,drop=TRUE] else x) ## Drop empty levels for factors.

modRf1  <- randomForest(numOfDoors~., data=imp85)
caretRf <- train( numOfDoors~., data=imp85, method = "rf" )
modRf2  <- caretRf$finalModel
modRf3  <- randomForest(x=imp85[,c("stroke", "price", "fuelType")], y=imp85[, "numOfDoors"])
caretRf <- train(x=imp85[,c("stroke", "price", "fuelType")], y=imp85[, "numOfDoors"], method = "rf")
modRf4  <- caretRf$finalModel

p1      <- predict(modRf1, newdata=imp85)
p2      <- predict(modRf2, newdata=imp85)
p3      <- predict(modRf3, newdata=imp85)
p4      <- predict(modRf4, newdata=imp85)

Of the last four lines, only the second one, p2 <- predict(modRf2, newdata=imp85), returns the following error:

Error in predict.randomForest(modRf2, newdata = imp85) : 
variables in the training data missing in newdata

The reason for this error seems to be that the predict.randomForest method uses rownames(object$importance) to determine the names of the variables used to train the random forest object. Looking at

rownames(modRf1$importance)
rownames(modRf2$importance)
rownames(modRf3$importance)
rownames(modRf4$importance)

we see:

[1] "stroke"   "price"    "fuelType"
[1] "stroke"   "price"    "fuelTypegas"
[1] "stroke"   "price"    "fuelType"
[1] "stroke"   "price"    "fuelType"
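The mismatch can be made explicit by comparing the names the model expects with the columns available in newdata. The snippet below is a minimal base-R sketch; the two character vectors are copied by hand from the output above and from the column selection in the example, and the variable names trained_vars, newdata_vars and missing_vars are only illustrative.

```r
# Names the second model reports in rownames(modRf2$importance), copied from the output above.
trained_vars <- c("stroke", "price", "fuelTypegas")

# Columns of imp85 as selected in the example.
newdata_vars <- c("stroke", "price", "fuelType", "numOfDoors")

# Variables the model expects but newdata does not provide.
missing_vars <- setdiff(trained_vars, newdata_vars)
print(missing_vars)
```

Any non-empty result here corresponds to the "variables in the training data missing in newdata" error.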

So somehow, using the caret train() function with a formula changes the names of the (factor) variables in the importance field of the randomForest object.
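A likely explanation for the renamed variable is that the formula interface expands factors into dummy (indicator) columns before the data reaches randomForest(), in the same way model.matrix() does. The sketch below uses only base R and a small made-up data frame (the values are hypothetical, not taken from imports85) to show how a factor named fuelType with levels diesel and gas ends up as a column called fuelTypegas.

```r
# Toy data with one two-level factor; the values are made up for illustration.
toy <- data.frame(
  stroke   = c(2.7, 3.4, 3.1, 3.5),
  price    = c(13495, 16500, 7295, 9980),
  fuelType = factor(c("gas", "diesel", "gas", "gas"))
)

# Expand the predictors into a numeric design matrix.
# With the default treatment contrasts the first level ("diesel") is the
# baseline, so only an indicator column for "gas" is created.
design <- model.matrix(~ ., data = toy)
design_names <- colnames(design)
print(design_names)
```

The resulting column names contain fuelTypegas rather than fuelType, which matches the name seen in rownames(modRf2$importance).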

Is this really an inconsistency between the formula and non-formula versions of the caret train() function? Or am I missing something?
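One way to sidestep the problem, sketched here under stated assumptions rather than offered as a definitive fix, is to call predict() on the object returned by train() instead of on its finalModel component, so that caret can apply the same formula preprocessing to newdata. The object names fit and p are illustrative. To keep the run short, this sketch skips resampling with trainControl(method = "none") and fixes mtry = 2, which differs from the default tuning used in the example above.

```r
library(randomForest)
library(caret)

data(imports85)
imp85 <- imports85[, c("stroke", "price", "fuelType", "numOfDoors")]
imp85 <- imp85[complete.cases(imp85), ]
imp85 <- droplevels(imp85)  # drop empty factor levels

set.seed(1)
# Assumption: no resampling and a fixed mtry, only to keep the sketch fast.
fit <- train(numOfDoors ~ ., data = imp85, method = "rf",
             trControl = trainControl(method = "none"),
             tuneGrid  = data.frame(mtry = 2))

# Predict through the train object rather than fit$finalModel.
p <- predict(fit, newdata = imp85)
head(p)
```

If predictions from finalModel are needed directly, newdata would presumably have to be expanded into the same dummy-coded columns the model was trained on.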
