How to speed up sklearn SVR?

Problem description

I am implementing SVR using the sklearn SVR package in Python. My sparse matrix is of size 146860 x 10202. I have divided it into various sub-matrices of size 2500 x 10202. For each sub-matrix, the SVR fit takes about 10 minutes. What are some ways to speed up the process? Please suggest a different approach or a different Python package for this. Thanks!

Recommended answer

You can average the predictions of the SVR sub-models.
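A minimal sketch of that chunk-and-average idea, assuming data shaped like the question's (the synthetic X and y below are stand-ins, and gamma=0.1 is a placeholder to be tuned):

```python
import numpy as np
import scipy.sparse as sp
from sklearn.svm import SVR

# Synthetic stand-in for the question's 146860 x 10202 sparse matrix
# (much smaller here so the sketch runs quickly).
rng = np.random.RandomState(0)
X = sp.random(10000, 500, density=0.01, format="csr", random_state=rng)
y = rng.randn(X.shape[0])

# Fit one SVR per 2500-row chunk, as in the question.
chunk_size = 2500
models = []
for start in range(0, X.shape[0], chunk_size):
    m = SVR(kernel="rbf", gamma=0.1)  # gamma=0.1 is a placeholder
    m.fit(X[start:start + chunk_size], y[start:start + chunk_size])
    models.append(m)

def predict_averaged(X_new):
    # Average the predictions of all chunk-level sub-models.
    return np.mean([m.predict(X_new) for m in models], axis=0)

print(predict_averaged(X[:3]))
```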

Alternatively, you can try to fit a linear regression model on the output of a kernel expansion computed with the Nystroem method.
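A sketch of that approach with scikit-learn's Nystroem transformer feeding a fast linear model; n_components, gamma, and alpha below are assumptions to be tuned, not values from the answer:

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
X = rng.randn(5000, 100)
y = rng.randn(5000)

# Approximate the RBF kernel with a low-rank Nystroem expansion, then
# fit a linear model on the expanded features; this scales far better
# with n_samples than exact kernel SVR.
model = make_pipeline(
    Nystroem(kernel="rbf", gamma=0.1, n_components=300, random_state=0),
    Ridge(alpha=1.0),
)
model.fit(X, y)
print(model.predict(X[:3]))
```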

Or you can try other non-linear regression models, such as ensembles of randomized trees or gradient boosted regression trees.
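For instance, a minimal sketch with scikit-learn's tree ensembles (the hyperparameters are common starting points, not tuned values):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.randn(5000, 100)
y = rng.randn(5000)

# Ensembles of randomized trees parallelize across cores and scale
# roughly linearly in n_samples, unlike kernel SVR.
forest = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
forest.fit(X, y)

# Gradient boosted regression trees are another strong non-linear option.
gbrt = GradientBoostingRegressor(n_estimators=100, random_state=0)
gbrt.fit(X, y)
```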

Edit: I forgot to say: the kernel SVR model itself is not scalable, as its complexity is more than quadratic, hence there is no way to "speed it up".

Edit 2: Actually, scaling the input variables to [0, 1] or [-1, 1], or to unit variance using StandardScaler, can often speed up convergence quite a bit.
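A sketch of that, wrapped in a pipeline so the same scaling is applied at predict time; with_mean=False is an assumption added here to keep sparse input sparse (centering would densify it):

```python
import numpy as np
import scipy.sparse as sp
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = sp.random(3000, 100, density=0.05, format="csr", random_state=rng)
y = rng.randn(X.shape[0])

# Scale features to unit variance before fitting the SVR; skipping
# mean subtraction preserves the sparsity of the input matrix.
model = make_pipeline(StandardScaler(with_mean=False), SVR(kernel="rbf"))
model.fit(X, y)
```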

Also, it is very unlikely that the default parameters will yield good results: you have to grid search the optimal value for gamma, and maybe also epsilon, on subsamples of increasing size (to check the stability of the optimal parameters) before fitting the large model.
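A sketch of that subsampled grid search using GridSearchCV; the grid values are common starting points, not recommendations from the answer:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = rng.randn(10000, 100)
y = rng.randn(10000)

# Search gamma and epsilon on a small subsample first; repeat with a
# larger subsample to check the optimum is stable before the big fit.
idx = rng.choice(X.shape[0], 1000, replace=False)
grid = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"gamma": np.logspace(-4, 1, 6),
                "epsilon": [0.01, 0.1, 1.0]},
    cv=3,
    n_jobs=-1,
)
grid.fit(X[idx], y[idx])
print(grid.best_params_)
```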
