How to parallelise the .predict() method of a scikit-learn SVM (SVC) classifier?


Question


I recently came across a requirement where I have a .fit()-trained scikit-learn SVC classifier instance and need to .predict() a large number of instances.


Is there a way to parallelise only this .predict() method using any scikit-learn built-in tools?

from sklearn import svm

data_train = [[0,2,3],[1,2,3],[4,2,3]]
targets_train = [0,1,0]

clf = svm.SVC(kernel='rbf', degree=3, C=10, gamma=0.3, probability=True)
clf.fit(data_train, targets_train)

# this can be very large (~ a million records)
to_be_predicted = [[1,3,4]]
clf.predict(to_be_predicted)


If somebody does know a solution, I will be more than happy if you could share it.

Answer


This may be buggy, but something like this should do the trick. Basically, break your data into blocks and run your model on each block separately in a joblib.Parallel loop.

import numpy as np
from joblib import Parallel, delayed  # sklearn.externals.joblib is deprecated

n_cores = 2
# Note: to_be_predicted must be a NumPy array here (the question uses a plain list).
n_samples = to_be_predicted.shape[0]
# Integer division keeps the slice bounds valid indices in Python 3.
slices = [
    (n_samples * i // n_cores, n_samples * (i + 1) // n_cores)
    for i in range(n_cores)
]

# concatenate (rather than vstack) joins the 1-D prediction chunks even
# when n_samples is not evenly divisible by n_cores.
results = np.concatenate(Parallel(n_jobs=n_cores)(
    delayed(clf.predict)(to_be_predicted[start:end])
    for start, end in slices
))
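Putting the two pieces together, here is a self-contained sketch of the same idea. It uses np.array_split in place of the manual slice arithmetic (array_split tolerates uneven splits); the toy training data mirrors the question's example, and the 1,000-row prediction set is a stand-in for the much larger real workload.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn import svm

# Train on the question's toy dataset.
data_train = np.array([[0, 2, 3], [1, 2, 3], [4, 2, 3]])
targets_train = np.array([0, 1, 0])
clf = svm.SVC(kernel='rbf', C=10, gamma=0.3, probability=True)
clf.fit(data_train, targets_train)

# Stand-in for the large prediction set (~a million rows in practice).
to_be_predicted = np.array([[1, 3, 4]] * 1000)

n_cores = 2
# Split into n_cores chunks and predict each chunk in a separate worker.
chunks = np.array_split(to_be_predicted, n_cores)
results = np.concatenate(
    Parallel(n_jobs=n_cores)(delayed(clf.predict)(c) for c in chunks)
)
print(results.shape)  # (1000,)
```

Because each chunk is predicted independently, concatenating the per-chunk outputs in order reproduces exactly what a single clf.predict(to_be_predicted) call would return.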

