Predicting how long a scikit-learn classification will take to run


Question

Is there a way to predict how long it will take to run a scikit-learn classifier, based on its parameters and the dataset? I know, pretty meta, right?

Some classifier/parameter combinations are quite fast, and some take so long that I eventually just kill the process. I'd like a way to estimate in advance how long a run will take.

Alternatively, I'd accept some pointers on how to set common parameters to reduce the run time.

Recommended answer

Some classifiers and regressors directly report the remaining time or the progress of the algorithm (number of iterations, etc.) while fitting. In most cases this is turned on by passing verbose=2 (or any number greater than 1) to the constructor of the individual model. Note: this behavior is as of sklearn-0.14. Earlier versions produce somewhat different verbose output (still useful, though).

The best examples of this are ensemble.RandomForestClassifier and ensemble.GradientBoostingClassifier, which print the number of trees built so far and, in the gradient boosting case, the remaining time.

from sklearn import ensemble

# X, y: training features and labels, assumed to be loaded already
clf = ensemble.GradientBoostingClassifier(verbose=3)
clf.fit(X, y)
Out:
   Iter       Train Loss   Remaining Time
     1           0.0769            0.10s
     ...

clf = ensemble.RandomForestClassifier(verbose=3)
clf.fit(X, y)
Out:
  building tree 1 of 100
  ...

This progress information is useful for estimating the total time.
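As a rough illustration of how such progress lines could be turned into a time estimate, here is a minimal sketch. It assumes the random forest prints lines of the form "building tree k of N" as shown above, and that every tree takes about the same time. The helper names parse_tree_progress and estimate_total_seconds are hypothetical and not part of scikit-learn.

```python
import re

# Matches the random forest progress lines shown above,
# e.g. "building tree 1 of 100". The exact wording may differ between
# scikit-learn versions, so treat this pattern as an assumption.
_TREE_LINE = re.compile(r"building tree (\d+) of (\d+)")


def parse_tree_progress(line):
    """Return (trees_done, trees_total) for a progress line, else None."""
    match = _TREE_LINE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def estimate_total_seconds(elapsed_seconds, done, total):
    """Linearly extrapolate total run time from the work completed so far."""
    if done <= 0:
        raise ValueError("need at least one completed step to extrapolate")
    return elapsed_seconds * total / done


if __name__ == "__main__":
    done, total = parse_tree_progress("  building tree 25 of 100")
    # 3 seconds spent on 25 of 100 trees suggests about 12 seconds overall.
    print(estimate_total_seconds(3.0, done, total))
```

The linear extrapolation is only a first approximation: with parallel tree building (n_jobs greater than 1) or trees of very different sizes, the real total can deviate noticeably.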

Then there are other models, such as SVMs, that print the number of optimization iterations completed but do not directly report the remaining time.

from sklearn import svm

clf = svm.SVC(verbose=2)
clf.fit(X, y)
Out:
   *
    optimization finished, #iter = 1
    obj = -1.802585, rho = 0.000000
    nSV = 2, nBSV = 2
    ...

As far as I know, linear models and similar estimators don't provide such diagnostic information.
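For estimators that print no progress at all, one possible workaround (not part of the original answer) is to time fits on a few small subsamples and extrapolate to the full dataset. The sketch below assumes the fit time grows roughly as a power law, t = a * n^b, in the number of samples n, which is only an approximation. The helpers time_fit, fit_power_law and predict_time are hypothetical names introduced for this example.

```python
import math
import time


def time_fit(fit_fn, n_samples):
    """Return the wall-clock seconds taken by fit_fn(n_samples)."""
    start = time.perf_counter()
    fit_fn(n_samples)
    return time.perf_counter() - start


def fit_power_law(sizes, times):
    """Least-squares fit of log(t) = log(a) + b * log(n); returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_time(a, b, n_samples):
    """Predicted fit time in seconds for n_samples under t = a * n^b."""
    return a * n_samples ** b


# Hypothetical usage with scikit-learn, where X and y hold the full dataset:
#
#   from sklearn import svm
#   sizes = [500, 1000, 2000]
#   times = [time_fit(lambda n: svm.SVC().fit(X[:n], y[:n]), n) for n in sizes]
#   a, b = fit_power_law(sizes, times)
#   print(predict_time(a, b, len(X)))
```

The subsample sizes should be large enough that the measured times are not dominated by fixed overhead, and the prediction should be read as an order-of-magnitude guide rather than a guarantee.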

See this thread to learn more about what the verbosity levels mean: scikit-learn fit remaining time
