R randomForest too many categories error even with fewer than 53 categories


Question

I'm trying to make a random forest with the following

movies.rf <- randomForest(Infl.Adj.Dom.BoxOffice~. -Genre -Source -ProductionMethod -CreativeType, data=Movies, subset=train)

I get

Error in randomForest.default(m, y, ...) : Can not handle categorical predictors with more than 53 categories.

After reading this I tried to check the values of my variables and got this

> length(unique(Movies$Genre))
[1] 12
> length(unique(Movies$Source))
[1] 16
> length(unique(Movies$ProductionMethod))
[1] 5
> length(unique(Movies$CreativeType))
[1] 9

Individually, none of them is greater than 53, and added together, they are less than 53. So why do I still get the error?

Recommended answer

If, as it seems from the context of your question, you intend to use only these four features (Genre, Source, ProductionMethod, CreativeType) to predict Infl.Adj.Dom.BoxOffice, then you are using the R formula incorrectly: your usage

Infl.Adj.Dom.BoxOffice~. -Genre -Source -ProductionMethod -CreativeType

in fact says "predict Infl.Adj.Dom.BoxOffice using all features (.) except Genre, Source, ProductionMethod, CreativeType" (the - symbol is used for excluding variables).
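To see which predictors a formula actually selects, you can expand it against the data with base R's terms(). The following is only a minimal sketch on a toy data frame; the columns y, Title, and Budget are hypothetical stand-ins, not columns known to exist in Movies.

```r
# Toy data frame; all column names here are hypothetical stand-ins.
toy <- data.frame(
  y      = c(1.0, 2.0, 3.0),
  Genre  = factor(c("a", "b", "a")),
  Title  = factor(c("t1", "t2", "t3")),
  Budget = c(10, 20, 30)
)

# `.` expands to every column except the response; `-` then removes terms.
excluded <- attr(terms(y ~ . - Genre, data = toy), "term.labels")
print(excluded)   # predictors that remain after removing Genre

# Naming predictors explicitly with `+` uses only those columns.
explicit <- attr(terms(y ~ Genre, data = toy), "term.labels")
print(explicit)
```

Running the same terms() call with your own formula and data=Movies should list every column the model would try to use.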

So, what actually happens here, is that one (or more) of your other features is a categorical one with more than 53 levels.
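One way to locate the offending column is to count the levels of every categorical column in the data frame. This is a rough sketch on made-up data: the toy data frame and the helper count_levels are hypothetical, and it assumes the limit applies to the declared factor levels (nlevels), which can exceed length(unique(...)) when a factor carries unused levels.

```r
# Toy stand-in for Movies; Title and Budget are hypothetical columns.
set.seed(1)
toy <- data.frame(
  BoxOffice = runif(200),
  Genre     = factor(sample(c("Action", "Drama", "Comedy"), 200, replace = TRUE)),
  Title     = factor(paste0("movie_", seq_len(200))),  # 200 distinct levels
  Budget    = runif(200)
)

# Hypothetical helper: declared levels for factors, distinct values for characters.
count_levels <- function(col) {
  if (is.factor(col)) nlevels(col) else length(unique(col))
}

# Restrict to categorical columns, then count levels in each.
is_cat   <- vapply(toy, function(col) is.factor(col) || is.character(col), logical(1))
n_levels <- vapply(toy[is_cat], count_levels, integer(1))

# Columns whose level count exceeds the 53-category limit from the error message.
too_many <- names(n_levels)[n_levels > 53]
print(n_levels)
print(too_many)
```

Replacing toy with Movies should show which of the remaining columns pushes past 53 levels; identifier-like columns such as titles are a common cause.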

The correct usage, if indeed you want to use only these four features you mention, should be:

movies.rf <- randomForest(Infl.Adj.Dom.BoxOffice ~ Genre + Source + ProductionMethod + CreativeType, data=Movies, subset=train)
