R randomForest "too many categories" error even with fewer than 53 categories
Question
I'm trying to build a random forest with the following call:
movies.rf <- randomForest(Infl.Adj.Dom.BoxOffice~. -Genre -Source -ProductionMethod -CreativeType, data=Movies, subset=train)
I get:
Error in randomForest.default(m, y, ...) : Can not handle categorical predictors with more than 53 categories.
After reading this, I checked the number of distinct values in each of these variables and got:
> length(unique(Movies$Genre))
[1] 12
> length(unique(Movies$Source))
[1] 16
> length(unique(Movies$ProductionMethod))
[1] 5
> length(unique(Movies$CreativeType))
[1] 9
Individually, none of them has more than 53 categories, and even added together they total 42, still fewer than 53. So why do I still get the error?
Answer
If, as it seems from the context of your question, you intend to use only these four features (Genre, Source, ProductionMethod, CreativeType) to predict Infl.Adj.Dom.BoxOffice, then you are using the R formula incorrectly: your usage
Infl.Adj.Dom.BoxOffice~. -Genre -Source -ProductionMethod -CreativeType
in fact says "predict Infl.Adj.Dom.BoxOffice using all features (.) except Genre, Source, ProductionMethod, CreativeType" (the - symbol is used to exclude variables).
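One way to see this expansion is to ask base R's `terms()` which term labels a formula produces. The sketch below uses a small hypothetical data frame (`df` and its `Budget`/`Runtime` columns are made up for illustration); when `terms()` is given a `data` argument, it expands `.` to every non-response column before the `-` terms are dropped:

```r
# Hypothetical toy data frame: a response plus four predictors
df <- data.frame(
  y       = c(1.5, 2.0, 3.2),
  Genre   = factor(c("Drama", "Comedy", "Drama")),
  Source  = factor(c("Original", "Book", "Remake")),
  Budget  = c(10, 20, 30),
  Runtime = c(95, 110, 130)
)

# `.` means "every column except the response"; `-` then removes terms,
# so only the columns that were NOT named remain as predictors
excluded <- attr(terms(y ~ . - Genre - Source, data = df), "term.labels")
print(excluded)  # "Budget" "Runtime"

# `+` lists exactly the predictors to use
included <- attr(terms(y ~ Genre + Source, data = df), "term.labels")
print(included)  # "Genre" "Source"
```

In other words, the original formula hands randomForest() every column of Movies except the four named ones.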
So what actually happens here is that one (or more) of your other features is categorical with more than 53 levels.
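To find the offending column(s), you can count the levels of every candidate predictor. The helper `too_many_levels()` below is a hypothetical sketch, not part of the randomForest package, and the toy data frame is made up. It assumes the categorical predictors are stored as factors: as far as I can tell from the randomForest source, the check counts the declared levels of unordered factors (ordered factors are treated as numeric), so the sketch uses `nlevels()` rather than `length(unique())`, since a factor can carry unused levels that `unique()` does not see.

```r
# Hypothetical helper: return the unordered-factor predictors whose
# number of declared levels exceeds max_levels (53 is randomForest's limit)
too_many_levels <- function(df, response, max_levels = 53) {
  preds <- setdiff(names(df), response)
  counts <- vapply(df[preds], function(col) {
    # Non-factor and ordered-factor columns are counted as 0 here
    if (is.factor(col) && !is.ordered(col)) nlevels(col) else 0L
  }, integer(1))
  counts[counts > max_levels]
}

# Hypothetical toy data with one high-cardinality factor (Title)
set.seed(42)
toy <- data.frame(
  BoxOffice = runif(60),
  Genre     = factor(sample(paste0("g", 1:12), 60, replace = TRUE)),
  Title     = factor(paste("Movie", 1:60)),
  Year      = sample(1990:2020, 60, replace = TRUE)
)

res <- too_many_levels(toy, "BoxOffice")
print(res)  # reports only Title, with 60 levels
```

With the question's data, the equivalent call would be `too_many_levels(Movies, "Infl.Adj.Dom.BoxOffice")`; any column it reports has to be left out of the formula (or recoded to fewer levels) before randomForest() will accept it.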
If you indeed want to use only the four features you mention, the correct usage is:
movies.rf <- randomForest(Infl.Adj.Dom.BoxOffice ~ Genre + Source + ProductionMethod + CreativeType, data=Movies, subset=train)