Is it necessary to run random forest with cross-validation at the same time?


Question

Random forest is a robust algorithm. It trains several small trees and reports an out-of-bag (OOB) accuracy. Given that, is it still necessary to run cross-validation with a random forest?
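The OOB accuracy the question refers to can be sketched as follows with scikit-learn on synthetic data (the dataset and hyperparameters here are illustrative, not from the question):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# oob_score=True makes each tree's out-of-bag samples act as a
# built-in validation set, so no separate hold-out split is needed.
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)

print(f"OOB accuracy: {rf.oob_score_:.3f}")
```

Each tree never sees roughly a third of the rows (the out-of-bag samples), and `oob_score_` aggregates predictions on those held-out rows, which is why it estimates generalization error without an explicit validation split.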

Answer

OOB error is an unbiased estimate of the error for random forests, so that's great. But what are you using the cross-validation for? If you are comparing the RF against some other algorithm that isn't using bagging in the same way, you want a low-variance way to compare them. You have to use cross-validation anyway to support the other algorithm. Then using the same cross-validation sample splits for both the RF and the other algorithm is still a good idea, so that you get rid of the variance caused by the split selection.
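The shared-splits idea above can be sketched like this: one fixed splitter is reused for both models, so every fold is identical and split-selection variance cancels out of the comparison (logistic regression stands in here as an arbitrary non-bagging algorithm; the data is synthetic):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# A single fixed splitter, passed to both evaluations, guarantees
# both models are scored on exactly the same five train/test folds.
cv = KFold(n_splits=5, shuffle=True, random_state=42)

rf_scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)
lr_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

print(f"RF mean accuracy: {np.mean(rf_scores):.3f}")
print(f"LR mean accuracy: {np.mean(lr_scores):.3f}")
```

Had each model been scored with its own random splits, part of the observed score difference would just be fold-assignment noise rather than a real difference between the algorithms.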

If you are comparing one RF against another RF with a different feature set, then comparing OOB errors is reasonable. This is especially true if you make sure both RFs use the same bagging sets during training.
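A minimal sketch of that comparison, assuming scikit-learn: fixing `random_state` (with the same number of rows) should make both forests draw the same bootstrap samples, since the bagging indices depend on the seed and the row count rather than on the feature columns, which approximates the "same bagging sets" condition above. The feature split used here is arbitrary, purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

def oob_accuracy(feature_idx):
    # Same seed for both forests -> same per-tree bootstrap (bagging) samples,
    # so any OOB difference is attributable to the feature set, not the draws.
    rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
    rf.fit(X[:, feature_idx], y)
    return rf.oob_score_

full = oob_accuracy(list(range(20)))    # all 20 features
subset = oob_accuracy(list(range(10)))  # first 10 features only

print(f"OOB accuracy, all features     : {full:.3f}")
print(f"OOB accuracy, 10-feature subset: {subset:.3f}")
```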
