Exporting SVM classifiers from sklearn to a Java codebase
Question
I have used sklearn to train a set of SVM classifiers (mostly linear, using LinearSVC, but some use the SVC class with an RBF kernel), and I am pretty happy with the results. Now I need to export the production classifiers into another codebase that uses Java. I am looking for libraries published on Maven that can be easily incorporated into this new codebase.
What do you suggest?
Recommended answer
Linear classifiers are easy: they have a coef_ and an intercept_, described in the class docstrings. Those are regular NumPy arrays, so you can dump them to disk with standard NumPy functions.
>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
>>> from sklearn.svm import LinearSVC
>>> clf = LinearSVC().fit(iris.data, iris.target)
Now let's dump this to a pseudo-file:
>>> import numpy as np
>>> from io import BytesIO
>>> outfile = BytesIO()
>>> np.savetxt(outfile, clf.coef_)
>>> print(outfile.getvalue().decode())
1.842426121444650788e-01 4.512319840786759295e-01 -8.079381916413134190e-01 -4.507115611351246720e-01
5.201335313639676022e-02 -8.941985347763323766e-01 4.052446671573840531e-01 -9.380586070674181709e-01
-8.506908158338851722e-01 -9.867329247779884627e-01 1.380997337625912147e+00 1.865393234038096981e+00
That's something you can parse from Java, right?
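The text format above is just whitespace-separated floats, one row per class. As a minimal sketch of the parsing logic the Java side would need to reproduce, here is a dependency-free Python version; the function name parse_matrix is hypothetical, and the sample text is the first two columns of the first two rows of the dump above.

```python
def parse_matrix(text):
    """Parse whitespace-separated rows of floats, as written by np.savetxt."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment/header lines
        rows.append([float(tok) for tok in line.split()])
    return rows


# First two columns of the first two rows of the dump shown above.
dump = """\
1.842426121444650788e-01 4.512319840786759295e-01
5.201335313639676022e-02 -8.941985347763323766e-01
"""
coef = parse_matrix(dump)
print(len(coef), len(coef[0]))
```

The same approach should work for intercept_, which also needs to be exported for scoring; it would parse as a single row or column of floats.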
Now, to get a score for the k-th class on a sample x, you need to evaluate
np.dot(x, clf.coef_[k]) + clf.intercept_[k]
# which is equivalent to
(sum(x[i] * clf.coef_[k, i] for i in range(clf.coef_.shape[1]))
 + clf.intercept_[k])
which is also doable, I hope. The class with the highest score wins.
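The scoring rule can be written without NumPy, which may make it easier to port line by line to Java. The sketch below is one possible reference implementation, not sklearn's own code; the names decision_scores and predict are hypothetical, and the weights are made up for illustration rather than taken from a trained model.

```python
def decision_scores(x, coef, intercept):
    """Return one score per class: dot(x, coef[k]) + intercept[k]."""
    return [
        sum(xi * wi for xi, wi in zip(x, row)) + b
        for row, b in zip(coef, intercept)
    ]


def predict(x, coef, intercept, classes):
    """Return the label of the class with the highest score."""
    scores = decision_scores(x, coef, intercept)
    best = max(range(len(scores)), key=scores.__getitem__)
    return classes[best]


# Made-up weights for three classes and two features (illustration only).
coef = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
intercept = [0.0, 0.5, 0.0]
classes = [0, 1, 2]  # stands in for clf.classes_

label = predict([2.0, 1.0], coef, intercept, classes)
print(label)
```

This covers the multi-class case, where coef_ has one row per class. With only two classes, coef_ has a single row, and the sign of that one score decides the class instead of an argmax.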
For kernel SVMs, the situation is more complicated because you need to replicate the one-vs-one decision function, as well as the kernels, in the Java code. The SVM model is stored on SVC objects in the attributes support_vectors_ and dual_coef_.
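For the simplest kernel case, a two-class SVC with an RBF kernel, the decision value is a weighted sum of kernel evaluations against the support vectors plus the intercept. The sketch below assumes that binary setting only and does not cover the one-vs-one voting needed for more than two classes. The function names are hypothetical, the numbers are made up, and gamma is assumed to be exported alongside the arrays. As far as I understand, a positive value corresponds to the second entry of clf.classes_, but this is worth checking against clf.decision_function on a few samples before porting.

```python
import math


def rbf_kernel(u, v, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def binary_rbf_decision(x, support_vectors, dual_coef, intercept, gamma):
    """Sum of dual_coef[i] * K(sv_i, x) over the support vectors, plus intercept."""
    total = sum(
        alpha * rbf_kernel(sv, x, gamma)
        for sv, alpha in zip(support_vectors, dual_coef)
    )
    return total + intercept


# Made-up model values (illustration only, not from a trained classifier).
support_vectors = [[0.0, 0.0], [1.0, 1.0]]  # stands in for clf.support_vectors_
dual_coef = [-1.0, 1.0]                     # stands in for clf.dual_coef_[0]
intercept = 0.0                             # stands in for clf.intercept_[0]
gamma = 0.5                                 # assumed to be exported with the model

score = binary_rbf_decision([1.0, 1.0], support_vectors, dual_coef, intercept, gamma)
print(score)
```

Comparing this function's output with the trained model's own decision values in Python first gives a known-good baseline to test the Java port against.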