Random Forest with classes that are very unbalanced

Views: 40 | Posted: 2021/7/2 20:05:05 | Tags: r, random-forest


Problem description

I am using random forests on a big data problem with a very unbalanced response class. I read the documentation and found the following parameters:

strata 

sampsize

The documentation for these parameters is sparse (or I didn't have the luck to find it), and I really don't understand how to use them. I am using the following code:

randomForest(x=predictors, 
             y=response, 
             data=train.data, 
             mtry=lista.params[1], 
             ntree=lista.params[2], 
             na.action=na.omit, 
             nodesize=lista.params[3], 
             maxnodes=lista.params[4],
             sampsize=c(250000,2000), 
             do.trace=100, 
             importance=TRUE)

The response is a class with two possible values; the first appears far more frequently than the second (10000:1 or more).

lista.params is a list holding the different parameter values (duh! I know...)

Well, the question (again) is: how can I use the 'strata' parameter? Am I using sampsize correctly?

And finally, sometimes I get the following error:

Error in randomForest.default(x = predictors, y = response, data = train.data,  :
  Still have fewer than two classes in the in-bag sample after 10 attempts.

Sorry if I am asking so many (and maybe stupid) questions...
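
For reference, here is a minimal sketch of how strata and sampsize might be combined in the x/y interface. It is not taken from the original post. The data is synthetic, and the names n_major, n_minor and n_per_class are made up for illustration. It assumes the randomForest package is installed and that the goal is to draw the same number of rows from each class for every tree; whether balanced draws suit the real data is a separate judgment call.

```r
library(randomForest)

set.seed(42)

# Synthetic, heavily unbalanced data (a hypothetical stand-in for the
# real predictors and response, which are not shown in the question).
n_major <- 5000
n_minor <- 50
predictors <- data.frame(
  x1 = c(rnorm(n_major, mean = 0), rnorm(n_minor, mean = 2)),
  x2 = c(rnorm(n_major, mean = 0), rnorm(n_minor, mean = 2))
)
response <- factor(
  c(rep("common", n_major), rep("rare", n_minor)),
  levels = c("common", "rare")
)

# Number of rows to draw per class for each tree. Using the size of the
# smallest class keeps every entry of sampsize within its class count.
n_per_class <- min(table(response))

fit <- randomForest(
  x = predictors,
  y = response,
  ntree = 200,
  strata = response,                        # stratify the draws by class
  sampsize = c(n_per_class, n_per_class),   # one entry per stratum
  importance = TRUE
)

# Out-of-bag confusion matrix, with the per-class error in the last column.
print(fit$confusion)
```

Both entries of sampsize are equal here, so the order in which they map to the classes does not matter. With unequal sizes, it would be worth checking them against levels(response) before relying on the result.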
