randomForest does not work when training set has more different factor levels than test set

Published: 2021/7/2 20:07:10 · Tags: r, random-forest

Problem description

When I try to test my trained model on new test data that has fewer factor levels than my training data, predict() returns the following:

Type of predictors in new data do not match that of the training data.

My training data has a variable with 7 factor levels, and my test data has that same variable with only 6 factor levels (all 6 ARE in the training data).
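For illustration, a minimal sketch of data with the shape described here. The data frame names `train` and `test` and the column name `val` follow the answer's snippet; the level values are invented for this example and are not from the original post.

```r
# Hypothetical data: `val` has 7 levels in train, but only 6 appear in test.
train <- data.frame(val = factor(c("a", "b", "c", "d", "e", "f", "g")))
test  <- data.frame(val = factor(c("a", "b", "c", "d", "e", "f")))

nlevels(train$val)  # 7
nlevels(test$val)   # 6

# Every test level exists in train ...
all(levels(test$val) %in% levels(train$val))  # TRUE
# ... but train has one level that test lacks.
setdiff(levels(train$val), levels(test$val))  # "g"
```

Comparing `levels()` on both sides like this is a quick way to see which level the test set is missing.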

When I add an observation containing the "missing" 7th factor level to the test data, the model runs, so I'm not sure why this happens or what the logic behind it is.

I could understand randomForest choking if the test set had more or different factor levels, but why does it fail when the training set is the one with "more" data?

Recommended answer

R expects both the training and the test data to have exactly the same levels (even if one of the sets has no observations for one or more of those levels). In your case, since the test dataset is missing a level that the training set has, you can do

test$val <- factor(test$val, levels=levels(train$val))

to make sure it has all the same levels and that they are coded the same way.
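A minimal base-R sketch of what that line does, using made-up data (the level values are assumptions for illustration; only `train`, `test`, and `val` come from the snippet above). Here the missing level sits in the middle of the training levels, which makes the difference in integer coding visible:

```r
# Hypothetical data: test lacks level "c", which sits in the middle of train's levels.
train <- data.frame(val = factor(c("a", "b", "c", "d", "e", "f", "g")))
test  <- data.frame(val = factor(c("a", "b", "d", "e", "f", "g")))

# Before the fix: "d" is coded 3 in test but 4 in train.
as.integer(test$val)[test$val == "d"]    # 3
as.integer(train$val)[train$val == "d"]  # 4

# Re-declare the factor with the training levels (the fix from the answer).
test$val <- factor(test$val, levels = levels(train$val))

identical(levels(test$val), levels(train$val))  # TRUE
as.integer(test$val)[test$val == "d"]           # 4
table(test$val)  # level "c" is now present with a count of 0
anyNA(test$val)  # FALSE: nothing was lost, since every test level exists in train
```

Note that this approach only suits the case in the question, where every test level also exists in the training data. A test value whose level is not among `levels(train$val)` would become `NA` after this call.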

(reposted here to close out the question)
