Support Vector Machine on R and WEKA


Problem description

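My data produced strange results with svm from the e1071 package in R, so I tried to check whether R's svm can generate the same results as WEKA (or Python), since I have been using WEKA in the past.

I searched for the problem and found a question with exactly the same confusion as mine, but it has no answer. This is the question.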


So I hope that I could get an answer here.

To make things easier, I used the iris data set: I trained a model (SMO in WEKA, and svm from the R package e1071) on the whole iris data and tested it on the same data.

WEKA parameters:

weka.classifiers.functions.SMO -C 1.0 -L 0.001 -P 1.0E-12 -N 0 -V 10 -W 1 -K "weka.classifiers.functions.supportVector.RBFKernel -G 0.01 -C 250007"

Other than changing the kernel to RBFKernel to make it consistent with the R function, I kept the default settings.
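Both WEKA's RBFKernel and libsvm (which e1071's svm wraps) define the RBF kernel as K(x, y) = exp(-gamma * ||x - y||^2), so WEKA's -G and e1071's gamma play the same role. They are not set to the same value here, though: the WEKA command uses -G 0.01, while e1071's svm defaults to gamma = 1/ncol(x), which is 0.25 for the four iris features. A minimal base-R sketch of the difference follows; rbf_kernel is a hypothetical helper, and it works on the unscaled features for simplicity, whereas both tools rescale the data before computing the kernel.

```r
# RBF kernel as defined by WEKA's RBFKernel and by libsvm:
# K(x, y) = exp(-gamma * ||x - y||^2)
rbf_kernel <- function(x, y, gamma) {
  exp(-gamma * sum((x - y)^2))
}

# Two flowers from the built-in iris data (unscaled, for simplicity)
X <- as.matrix(iris[, 1:4])
x <- X[1, ]    # a setosa
y <- X[51, ]   # a versicolor

gamma_weka  <- 0.01         # -G value in the WEKA command above
gamma_e1071 <- 1 / ncol(X)  # e1071's default: 1 / number of features

k <- c(weka  = rbf_kernel(x, y, gamma_weka),
       e1071 = rbf_kernel(x, y, gamma_e1071))
print(k)
```

With the much smaller WEKA gamma the two flowers look far more similar to each other, which alone can change the fitted decision boundary.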

The result is:

  a  b  c   <-- classified as
 50  0  0 |  a = Iris-setosa
  0 46  4 |  b = Iris-versicolor
  0  7 43 |  c = Iris-virginica

R script:

library(e1071)
model <- svm(iris[,-5], iris[,5], kernel="radial", epsilon=1.0E-12)
res <- predict(model, iris[,-5])
table(pred = res, true = iris[,ncol(iris)]) 

The result is:

            true
pred         setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         48         2
  virginica       0          2        48

I'm not a machine learning person, so I'm guessing the default parameters are very different for these two methods. For example, e1071 has 0.1 as the default epsilon, while WEKA uses 1.0E-12. I tried to read through the manuals to make all the parameters identical, but many of them do not seem comparable to me.
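Another default that differs is input scaling. With -N 0 (the default), WEKA's SMO normalizes every attribute to [0, 1]; -N 1 standardizes and -N 2 does neither. e1071's svm with scale = TRUE standardizes to zero mean and unit variance. The sketch below, with hypothetical helpers min_max and rbf_kernel, shows that the same gamma yields a different kernel value on the two scalings, so matching gamma alone does not make the models equivalent.

```r
X <- as.matrix(iris[, 1:4])

# WEKA-style normalization (-N 0): each column mapped to [0, 1]
min_max <- function(m) {
  apply(m, 2, function(col) (col - min(col)) / (max(col) - min(col)))
}
X_norm <- min_max(X)

# e1071-style standardization (scale = TRUE): zero mean, unit variance
X_std <- scale(X)

rbf_kernel <- function(x, y, gamma) exp(-gamma * sum((x - y)^2))

gamma <- 0.25  # same gamma for both scalings (e1071's default for 4 features)
k_norm <- rbf_kernel(X_norm[1, ], X_norm[51, ], gamma)
k_std  <- rbf_kernel(X_std[1, ],  X_std[51, ],  gamma)
print(c(normalized = k_norm, standardized = k_std))
```

In this sense, WEKA's -N 1 is closer to e1071's default preprocessing than -N 0.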

Thanks.

Solution

Refer to http://weka.sourceforge.net/doc.dev/weka/classifiers/functions/SMO.html for the SMO options (which RWeka passes through to WEKA), and use ?svm to find the corresponding parameters of the e1071 svm implementation.
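As a starting point, the correspondence can be written down as a small lookup table. This is a hedged reading of the two documentation pages, not an official mapping; NA marks options with no direct counterpart, and param_map is a name chosen here.

```r
# Rough correspondence between WEKA SMO options (passed via RWeka's Weka_control)
# and e1071::svm arguments. NA = no direct counterpart.
param_map <- data.frame(
  weka  = c("-C", "-L", "-P", "-N", "-K RBFKernel -G", "-V", "-W"),
  e1071 = c("cost", "tolerance", NA, "scale", "gamma", NA, NA),
  note  = c("complexity (penalty) constant; both default to 1",
            "tolerance of the termination criterion; both default to 0.001",
            "WEKA: epsilon for round-off error; e1071's epsilon is the eps-regression loss parameter",
            "-N 1 standardizes like scale = TRUE; the default -N 0 normalizes to [0, 1]",
            "RBF width; WEKA default 0.01, e1071 default 1 / ncol(x)",
            "folds for fitting calibration (logistic) models; e1071's cross only estimates accuracy",
            "random number seed; no direct svm argument"),
  stringsAsFactors = FALSE
)
print(param_map[, c("weka", "e1071")])
```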

According to ?svm, e1071's svm is an interface to libsvm, which solves the SVM quadratic program with its own decomposition solver.

For multiclass-classification with k levels, k>2, libsvm uses the ‘one-against-one’-approach, in which k(k-1)/2 binary classifiers are trained; the appropriate class is found by a voting scheme. libsvm internally uses a sparse data representation, which is also high-level supported by the package SparseM.
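The one-against-one scheme described above can be sketched in a few lines of base R. The pairwise winners are made up for illustration (in a real model each comes from a trained binary classifier), and vote is a hypothetical helper, not libsvm's code.

```r
classes <- levels(iris$Species)  # "setosa" "versicolor" "virginica"
k <- length(classes)

# Every unordered pair of classes is one binary problem: k(k-1)/2 in total
pairs <- combn(classes, 2)
print(pairs)

# Made-up winners of the three binary classifiers for a single flower,
# one per column of `pairs`
winners <- c("versicolor", "virginica", "versicolor")
stopifnot(all(winners == pairs[1, ] | winners == pairs[2, ]))

# Hypothetical helper: one vote per pairwise winner, return the top class
# (which.max keeps the first class in case of a tie)
vote <- function(winners, classes) {
  votes <- table(factor(winners, levels = classes))
  names(which.max(votes))
}

predicted <- vote(winners, classes)
print(predicted)
```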

By contrast, according to ?SMO in RWeka, SMO

implements John C. Platt's sequential minimal optimization algorithm for training a support vector classifier using polynomial or RBF kernels. Multi-class problems are solved using pairwise classification.
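The heart of Platt's SMO is an analytic update of two Lagrange multipliers at a time, clipped to a box so that the equality constraint sum(y_i * alpha_i) = 0 stays satisfied. The function below is a simplified sketch of that single step. It leaves out the pair-selection heuristics, the threshold (b) update and the error cache, skips degenerate pairs instead of evaluating the objective at the endpoints, and is not WEKA's implementation.

```r
# One simplified SMO step for a pair of multipliers.
# a1, a2: current multipliers; y1, y2: labels in {-1, +1};
# E1, E2: errors f(x_i) - y_i; K11, K12, K22: kernel values; C: box bound.
smo_pair_step <- function(a1, a2, y1, y2, E1, E2, K11, K12, K22, C) {
  # Feasible segment [L, H] for a2: both multipliers stay in [0, C]
  # and y1 * a1 + y2 * a2 stays constant
  if (y1 != y2) {
    L <- max(0, a2 - a1); H <- min(C, C + a2 - a1)
  } else {
    L <- max(0, a1 + a2 - C); H <- min(C, a1 + a2)
  }
  eta <- K11 + K22 - 2 * K12  # curvature of the objective along the segment
  if (L == H || eta <= 0) {
    # Simplification: skip degenerate pairs
    return(c(a1 = a1, a2 = a2))
  }
  a2_new <- a2 + y2 * (E1 - E2) / eta     # unconstrained optimum for a2
  a2_new <- min(H, max(L, a2_new))        # clip to [L, H]
  a1_new <- a1 + y1 * y2 * (a2 - a2_new)  # keeps y1 * a1 + y2 * a2 unchanged
  c(a1 = a1_new, a2 = a2_new)
}

# Two points of opposite classes, orthogonal in feature space (K12 = 0).
# All multipliers start at 0, so f(x) = 0 and E_i = -y_i.
step <- smo_pair_step(a1 = 0, a2 = 0, y1 = 1, y2 = -1,
                      E1 = -1, E2 = 1, K11 = 1, K12 = 0, K22 = 1, C = 1)
print(step)
```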

So the two implementations differ in general, and their results may differ slightly. Still, if we set the corresponding hyperparameters to the same values, the confusion matrices are almost the same:

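library(RWeka)
# K: RBF kernel with G (gamma) = 2; C: complexity constant; L: tolerance;
# P: epsilon for round-off error; N = 0: normalize attributes to [0, 1];
# V: folds for the internal CV used when fitting calibration models; W: random seed
model.smo <- SMO(Species ~ ., data = iris,
                 control = Weka_control(K = list("RBFKernel", G = 2), C = 1.0, L = 0.001,
                                        P = 1.0E-12, N = 0, V = 10, W = 1234))
res.smo <- predict(model.smo, iris[,-5])
table(pred = res.smo, true = iris[,ncol(iris)])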

             true
pred         setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         47         1
  virginica       0          3        49

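library(e1071)
set.seed(1234)
# cost ~ WEKA -C, tolerance ~ WEKA -L; epsilon only affects eps-regression, not classification;
# scale = TRUE standardizes the inputs; cross = 10 adds a 10-fold CV accuracy estimate
model.svm <- svm(iris[,-5], iris[,5], kernel="radial", cost=1.0, tolerance=0.001,
                 epsilon=1.0E-12, scale=TRUE, cross=10)
res.svm <- predict(model.svm, iris[,-5])
table(pred = res.svm, true = iris[,ncol(iris)])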

           true
pred         setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         49         1
  virginica       0          1        49
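To put "almost the same" into numbers, the sketch below recomputes training-set accuracy (correct predictions on the diagonal divided by the total) for the four confusion matrices in this thread. WEKA prints actual classes as rows while R's table() prints predictions as rows, but accuracy does not depend on that orientation. The list names and the to_cm helper are labels chosen here.

```r
# Confusion matrices from this thread, entered row by row
to_cm <- function(entries) matrix(entries, nrow = 3, byrow = TRUE)

cms <- list(
  weka_question  = to_cm(c(50, 0, 0,   0, 46, 4,   0, 7, 43)),
  e1071_question = to_cm(c(50, 0, 0,   0, 48, 2,   0, 2, 48)),
  rweka_smo      = to_cm(c(50, 0, 0,   0, 47, 1,   0, 3, 49)),
  e1071_matched  = to_cm(c(50, 0, 0,   0, 49, 1,   0, 1, 49))
)

# Accuracy = correctly classified (diagonal) / all predictions
accuracy <- sapply(cms, function(m) sum(diag(m)) / sum(m))
print(round(accuracy, 4))
```

Because every model was tested on its own training data, these figures describe fit, not generalization.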

Also see https://stats.stackexchange.com/questions/130293/svm-and-smo-main-differences and https://www.quora.com/Whats-the-difference-between-LibSVM-and-LibLinear
