How much time does it take to train an SVM classifier?
Question
I wrote the following code and tested it on small data:
from sklearn import svm
from sklearn.multiclass import OneVsRestClassifier

classif = OneVsRestClassifier(svm.SVC(kernel='rbf'))
classif.fit(X, y)
where X and y (X - a 30000x784 matrix, y - 30000x1) are numpy arrays. On small data the algorithm works well and gives me the right results.
But I started my program about 10 hours ago... and it is still running.
I want to know how long it will take, or whether it is stuck in some way. (Laptop specs: 4 GB memory, Core i5-480M.)
Answer
SVM training can take arbitrarily long; it depends on many parameters:

- the C parameter - the greater the misclassification penalty, the slower the process
- the kernel - the more complicated the kernel, the slower the process (rbf is the most complex of the predefined ones)
- data size/dimensionality - again, the same rule
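To see the kernel's effect concretely, here is a hypothetical timing comparison on a small synthetic subset; the data, sizes, and labels are made up purely for illustration:

```python
import time

import numpy as np
from sklearn import svm

# Synthetic stand-in data, same feature count as the question (784).
rng = np.random.RandomState(0)
X = rng.rand(500, 784)
y = rng.randint(0, 2, size=500)

# Time each kernel on the same subset.
durations = {}
for kernel in ('linear', 'rbf'):
    start = time.time()
    svm.SVC(kernel=kernel, C=1.0).fit(X, y)
    durations[kernel] = time.time() - start
print(durations)  # the rbf kernel is usually the slower of the two
```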
In general, the basic SMO algorithm is O(n^3), so with 30 000 data points it has to run a number of operations proportional to 27 000 000 000 000, which is a really huge number. What are your options?
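Before weighing them, a quick back-of-the-envelope check of that cubic scaling; the subset size and timing below are illustrative assumptions, not measurements:

```python
# Rough extrapolation from a small subset using the O(n^3) rule of thumb.
n_subset = 3_000    # assumed size of a subset that trains quickly
n_full = 30_000     # the full dataset from the question

# If training on n_subset points takes t seconds, the cubic rule predicts
# roughly t * (n_full / n_subset)**3 seconds for the full dataset.
scale = (n_full / n_subset) ** 3
print(scale)  # 1000.0 -> a 30-second subset run suggests ~8 hours in total
```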
- change the kernel to a linear one - 784 features is quite a lot, and rbf can be redundant
- reduce the features' dimensionality (PCA?)
- lower the C parameter
- train the model on a subset of your data to find good parameters, and then train on the whole set on some cluster/supercomputer
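A minimal sketch combining the first three options (linear kernel, PCA, lower C); the sample size, n_components, and C value here are illustrative assumptions, not tuned settings:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in for the 30000x784 matrix from the question.
rng = np.random.RandomState(0)
X = rng.rand(1000, 784)
y = rng.randint(0, 10, size=1000)

# Option: reduce the 784 features to a smaller space first.
X_reduced = PCA(n_components=50, random_state=0).fit_transform(X)

# Options: a linear model (LinearSVC) with a lower C parameter.
clf = OneVsRestClassifier(LinearSVC(C=0.1, dual=False))
clf.fit(X_reduced, y)
print(clf.predict(X_reduced[:5]).shape)  # predictions for the first 5 rows
```

LinearSVC uses liblinear, which scales much better with the number of samples than the kernelized SVC, so on the full 30 000-point set it should finish in minutes rather than hours.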