Parallelizing random forests


Question

Through searching and asking, I've found many packages that can make use of all the cores of my server, and many packages that can do random forest.

I'm quite new to this, and I'm getting lost among all the ways to parallelize the training of my random forest. Could you give some advice on reasons to use or avoid each of them, or on specific combinations of them (with or without caret?) that have proven themselves?

Packages for parallelization:

doParallel

doSNOW

doSMP (discontinued?)

doMC

(and what about mclapply?)

Packages for random forest:

[caret + some of the following]

rf

parRF

randomForest

ranger

Rborist

parallelRandomForest (crashes my RStudio session...)

Thanks

Recommended answer

There are a few answers on SO that I would take a look at, such as "parallel execution of random forest in R" and "Suggestions for speeding up Random Forests".
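For orientation, the pattern usually discussed for the foreach back ends listed in the question (doParallel, doSNOW, doMC) is to grow several smaller forests on separate workers and merge them with randomForest::combine. Below is a minimal sketch of that idea. It assumes the doParallel and randomForest packages are installed. The built-in iris data, the worker count of 2, and the 250 trees per worker are illustrative choices, not recommendations.

```r
library(doParallel)    # also attaches foreach and parallel
library(randomForest)

n_workers <- 2                      # illustrative; adjust to your server
cl <- makeCluster(n_workers)
registerDoParallel(cl)

# Each worker grows its own 250-tree forest; combine() merges the
# per-worker forests into a single randomForest object.
rf <- foreach(ntree = rep(250, n_workers),
              .combine = randomForest::combine,
              .multicombine = TRUE,
              .packages = "randomForest") %dopar% {
  randomForest(Species ~ ., data = iris, ntree = ntree)
}

stopCluster(cl)

rf$ntree   # total number of trees in the merged forest
```

One caveat with this approach: the merged object no longer carries the out-of-bag error summaries (such as err.rate and confusion), so model quality has to be assessed on held-out data instead.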

Those posts are helpful, but they are a bit older. The ranger package is an especially fast implementation of random forest, so if you are new to this it might be the easiest way to speed up your model training. Their paper discusses the tradeoffs among some of the available packages: which one gives you the best performance depends on your data size and number of features.
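As a rough illustration of why ranger can be the simplest route: it grows trees on multiple threads internally, so no separate parallel back end needs to be registered. The sketch below assumes the ranger package is installed. The iris data, 500 trees, and 2 threads are illustrative values only.

```r
library(ranger)

# num.threads sets how many threads ranger uses to grow trees.
fit <- ranger(Species ~ ., data = iris,
              num.trees = 500,
              num.threads = 2)      # illustrative; adjust to your server

fit$prediction.error                # out-of-bag error estimate

# Predictions come back in the $predictions element.
pred <- predict(fit, data = iris)
head(pred$predictions)
```

If you prefer to stay within caret, the same engine should be reachable through train() with method = "ranger". Whether that wrapper is faster or slower than calling ranger directly for your data is something to measure rather than assume.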
