Issues with tuneGrid parameter in random forest


Question

I've been dealing with some extremely imbalanced data, and I would like to use stratified sampling to create more balanced random forests.

Right now I'm using the caret package, mainly for tuning the random forests. So I tried to set up a tuneGrid to pass the mtry and sampsize parameters into the caret train method, as follows.

mtryGrid <- data.frame(.mtry = 100),.sampsize=80)
rfTune<- train(x = trainX,
               y = trainY,
               method = "rf",
               trControl = ctrl,
               metric = "Kappa",
               ntree = 1000,
               tuneGrid = mtryGrid,
               importance = TRUE)

When I run this example, I get the following error:

The tuning parameter grid should have columns mtry

I've come across discussions like this suggesting that passing in these parameters should be possible.

On the other hand, this page suggests that the only parameter that can be passed in is mtry.

Can I even pass sampsize into the random forest via caret?

Recommended answer

It looks like there is a bracket issue with your mtryGrid: the call to data.frame is closed before .sampsize is given. Alternatively, you can use expand.grid to list the different values of mtry you want to try. By default, the only parameter you can tune for a random forest is mtry. You can still pass other parameters to train, but they will have a fixed value and so won't be tuned by train. You can also still ask for a stratified sample in train. Below is how I would do it, assuming that trainY is a boolean variable by which you want to stratify your samples, and that you want samples of size 80 for each category:

mtryGrid <- expand.grid(mtry = 100) # you can put different values for mtry
rfTune<- train(x = trainX,
               y = trainY,
               method = "rf",
               trControl = ctrl,
               metric = "Kappa",
               ntree = 1000,
               tuneGrid = mtryGrid,
               strata = factor(trainY),
               sampsize = c(80, 80), 
               importance = TRUE)
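
As the comment on the first line notes, the grid can hold several mtry values. A minimal sketch of such a grid, using only base R; the candidate values 10, 50 and 100 are hypothetical and should be chosen to suit the number of predictors in trainX:

```r
# Hypothetical candidate values for mtry; each row of the grid is one setting
# that train would evaluate by resampling.
mtryGrid <- expand.grid(mtry = c(10, 50, 100))

# The grid is an ordinary data frame with a single column named "mtry".
print(mtryGrid)
```

Passing this mtryGrid as tuneGrid should make train compare the three settings on the chosen metric, while sampsize and strata stay fixed at whatever is passed directly to train.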
