Improving model training speed in caret (R)


Problem Description

I have a dataset consisting of 20 features and roughly 300,000 observations. I'm using caret to train models with doParallel and four cores. Even training on 10% of my data takes well over eight hours for the methods I've tried (rf, nnet, adabag, svmPoly). I'm resampling with bootstrapping 3 times and my tuneLength is 5. Is there anything I can do to speed up this agonizingly slow process? Someone suggested that using the underlying library directly can speed up the process by as much as 10x, but before I go down that route I'd like to make sure there is no other alternative.
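
For reference, a minimal sketch of the setup described in the question, assuming a predictor data frame `x` and an outcome factor `y` (hypothetical names, not from the original post):

```r
library(caret)
library(doParallel)

# Register a parallel backend with four workers, as in the question.
cl <- makePSOCKcluster(4)
registerDoParallel(cl)

# Bootstrap resampling 3 times; tuneLength = 5 evaluates 5 values
# per tuning parameter.
ctrl <- trainControl(method = "boot", number = 3)

fit <- train(x, y, method = "rf", trControl = ctrl, tuneLength = 5)

stopCluster(cl)
```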

Recommended Answer

@phiver hits the nail on the head, but for this situation there are a few things to suggest:

  • make sure that you are not exhausting your system memory with parallel processing. You are making X extra copies of the data in memory when using X workers.
  • with a class imbalance, additional sampling can help. Down-sampling might improve performance and take less time (see the first sketch after this list).
  • use different libraries: ranger instead of randomForest, xgboost or C5.0 instead of gbm. You should realize that ensemble methods are fitting a ton of constituent models and are bound to take a while to fit (see the second sketch after this list).
  • the package has a racing-type algorithm (adaptive resampling) for tuning parameters in less time (see the third sketch after this list).
  • the development version on github has random search methods for models with a lot of tuning parameters (also shown in the third sketch).
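
A minimal sketch of down-sampling inside resampling, using the `sampling` option of caret's `trainControl` (classification only; `x`, `y`, and the bootstrap settings are the same hypothetical objects as in the sketch above):

```r
# Down-sample the majority class within each resample, so every
# bootstrap sample is class-balanced and considerably smaller.
ctrl_down <- trainControl(method = "boot", number = 3, sampling = "down")

fit_down <- train(x, y, method = "rf",
                  trControl = ctrl_down, tuneLength = 5)
```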
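A sketch of swapping in a faster implementation: ranger is a fast C++ reimplementation of random forests, and it can be called directly or through caret's `"ranger"` method (object names are the hypothetical ones from above):

```r
library(ranger)

# Direct call: ranger builds the forest with multithreaded C++ code.
df <- data.frame(x, y = y)
fit_ranger <- ranger(y ~ ., data = df, num.trees = 500)

# Or keep caret's tuning workflow and change only the method string.
fit_caret <- train(x, y, method = "ranger",
                   trControl = ctrl, tuneLength = 5)
```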
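A sketch combining the last two suggestions: caret's adaptive resampling (its racing-type algorithm) drops clearly inferior tuning candidates after a few resamples, and `search = "random"` samples parameter combinations randomly instead of over a grid. Both assume a caret version that supports these options, and the parameter values here are illustrative:

```r
# Adaptive ("racing") resampling: all candidates are fit on the first
# `min` resamples, then candidates that look statistically worse than
# the current best are dropped from later resamples.
ctrl_race <- trainControl(method   = "adaptive_cv",
                          number   = 10,
                          adaptive = list(min = 5,      # resamples before filtering
                                          alpha = 0.05, # significance level for dropping
                                          method = "gls",
                                          complete = TRUE),
                          search   = "random")          # random parameter search

# With random search, tuneLength is the number of random combinations.
fit_race <- train(x, y, method = "ranger",
                  trControl = ctrl_race, tuneLength = 20)
```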

Max
