How to parallelise .predict() method of a scikit-learn SVM (SVC) Classifier?

Question
I recently came across a requirement where I have a .fit()-trained scikit-learn SVC classifier instance and need to .predict() lots of instances.
Is there a way to parallelise only this .predict() method using any scikit-learn built-in tools?
from sklearn import svm
data_train = [[0,2,3],[1,2,3],[4,2,3]]
targets_train = [0,1,0]
clf = svm.SVC(kernel='rbf', degree=3, C=10, gamma=0.3, probability=True)
clf.fit(data_train, targets_train)
# this can be very large (~ a million records)
to_be_predicted = [[1,3,4]]
clf.predict(to_be_predicted)
If somebody does know a solution, I will be more than happy if you could share it.
Answer
This may be buggy, but something like this should do the trick. Basically, break your data into blocks and run the model on each block separately inside a joblib.Parallel loop.
import numpy as np
from joblib import Parallel, delayed  # sklearn.externals.joblib is deprecated

n_cores = 2
to_be_predicted = np.asarray(to_be_predicted)
n_samples = to_be_predicted.shape[0]

# (start, stop) index pairs: one contiguous block per core
slices = [
    (n_samples * i // n_cores, n_samples * (i + 1) // n_cores)
    for i in range(n_cores)
]

results = np.concatenate(Parallel(n_jobs=n_cores)(
    delayed(clf.predict)(to_be_predicted[start:stop])
    for start, stop in slices
))
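Putting the question's training snippet and the chunked prediction together gives a complete runnable sketch. The batch of 1000 random rows is a stand-in for the real "~ a million records" workload (the size and the random data are assumptions for illustration); the final check confirms the chunked, parallel result matches a plain serial clf.predict().

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn import svm

# Train on the toy data from the question
data_train = [[0, 2, 3], [1, 2, 3], [4, 2, 3]]
targets_train = [0, 1, 0]
clf = svm.SVC(kernel='rbf', C=10, gamma=0.3, probability=True)
clf.fit(data_train, targets_train)

# Stand-in for the large prediction set
rng = np.random.RandomState(0)
to_be_predicted = rng.randint(0, 5, size=(1000, 3))

n_cores = 2
n_samples = to_be_predicted.shape[0]
slices = [
    (n_samples * i // n_cores, n_samples * (i + 1) // n_cores)
    for i in range(n_cores)
]

# Predict each block in a separate worker, then stitch the results back together
parallel = np.concatenate(Parallel(n_jobs=n_cores)(
    delayed(clf.predict)(to_be_predicted[start:stop])
    for start, stop in slices
))

# Sanity check: chunked prediction agrees with a single serial call
serial = clf.predict(to_be_predicted)
assert np.array_equal(parallel, serial)
```

Since the blocks are independent, the order of `slices` determines the order of `results`, so np.concatenate reassembles the predictions in the original row order. np.array_split(to_be_predicted, n_cores) would compute the same partitioning with less index arithmetic.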