SVM equations from e1071 R package?

Question

I am interested in testing how well an SVM classifies several individuals into four groups/classes. When using the svmtrain LibSVM function from MATLAB, I am able to get the three equations used to classify those individuals among the 4 groups, based on the value of each equation. A scheme could be as follows:

                All individuals (N)*
                      |
 Group 1 (n1) <--- equation 1 --->  (N-n1)
                                      |
                   (N-n1-n2) <--- equation 2 ---> Group 2 (n2)
                      |
Group 3 (n3) <--- equation 3 ---> Group 4(n4)

*N = n1+n2+n3+n4

Is there any way to get these equations using the svm function in the e1071 R package?

Recommended answer

svm in e1071 uses the "one-against-one" strategy for multiclass classification (i.e. binary classification between all pairs of classes, followed by voting). So to reproduce this hierarchical setup, you probably need to fit a series of binary classifiers manually: group 1 vs. all the others, then group 2 vs. whatever is left, and so on. Additionally, the basic svm function does not tune the hyperparameters, so you will typically want to use a wrapper like tune in e1071, or train in the excellent caret package.
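A minimal sketch of such a manual cascade, assuming the built-in iris data (three classes, so two stages rather than the three in the question). The helper predict_cascade and the variable names are hypothetical, and the hyperparameters are left at their defaults rather than tuned:

```r
library(e1071)

features <- iris[, 1:4]

# Stage 1: setosa vs. everything else, trained on all rows
d1 <- data.frame(features,
                 Label = factor(ifelse(iris$Species == "setosa", "setosa", "rest")))
fit1 <- svm(Label ~ ., data = d1, type = "C-classification", kernel = "linear")

# Stage 2: versicolor vs. virginica, trained only on the rows left over
keep <- iris$Species != "setosa"
d2 <- data.frame(features[keep, ], Label = droplevels(iris$Species[keep]))
fit2 <- svm(Label ~ ., data = d2, type = "C-classification", kernel = "linear")

# Hypothetical helper: send each row through stage 1, then stage 2 if needed
predict_cascade <- function(newdata) {
  out <- as.character(predict(fit1, newdata))
  rest <- out == "rest"
  if (any(rest)) {
    out[rest] <- as.character(predict(fit2, newdata[rest, , drop = FALSE]))
  }
  factor(out, levels = levels(iris$Species))
}

pred <- predict_cascade(features)
table(Actual = iris$Species, Fitted = pred)
```

With four groups, a third binary model would be added in the same way. Each stage yields one decision equation, which matches the scheme in the question.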

Anyway, to classify new individuals in R, you don't have to plug numbers into an equation manually. Rather, you use the predict generic function, which has methods for different model classes, including SVM. For model objects like this, you can also usually use the generic functions plot and summary. Here is an example of the basic idea using a linear SVM:

library(e1071)

# Subset the iris dataset to only 2 labels and 2 features
iris.part = subset(iris, Species != 'setosa')
iris.part$Species = factor(iris.part$Species)
iris.part = iris.part[, c(1,2,5)]

# Fit svm model
fit = svm(Species ~ ., data=iris.part, type='C-classification', kernel='linear')

# Make a plot of the model
dev.new(width=5, height=5)
plot(fit, iris.part)

# Tabulate actual labels vs. fitted labels
pred = predict(fit, iris.part)
table(Actual=iris.part$Species, Fitted=pred)

# Obtain feature weights
w = t(fit$coefs) %*% fit$SV

# Calculate decision values manually
iris.scaled = scale(iris.part[,-3], fit$x.scale[[1]], fit$x.scale[[2]]) 
t(w %*% t(as.matrix(iris.scaled))) - fit$rho

# Should equal...
fit$decision.values
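
As a sketch of how predict could be applied to individuals that were not in the training data, the block below refits the same model and classifies two made-up observations. The data frame new.obs and its values are purely illustrative assumptions. Passing decision.values = TRUE asks predict to attach the value of the decision equation to the result:

```r
library(e1071)

# Same two-class, two-feature subset as in the example above
iris.part <- droplevels(subset(iris, Species != "setosa"))[, c(1, 2, 5)]
fit <- svm(Species ~ ., data = iris.part,
           type = "C-classification", kernel = "linear")

# Two made-up individuals (illustrative values, not taken from the dataset)
new.obs <- data.frame(Sepal.Length = c(5.5, 7.2),
                      Sepal.Width  = c(2.5, 3.1))

# predict() applies the stored centering/scaling before evaluating the model
new.pred <- predict(fit, new.obs, decision.values = TRUE)

new.pred                           # predicted class labels
attr(new.pred, "decision.values")  # value of the versicolor/virginica equation
```

The sign of each decision value determines which of the two classes is predicted.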

Tabulate actual class labels vs. model predictions:

> table(Actual=iris.part$Species, Fitted=pred)
            Fitted
Actual       versicolor virginica
  versicolor         38        12
  virginica          15        35

Extract feature weights from the svm model object (for feature selection, etc.). Here, Sepal.Length has the larger absolute weight on the scaled features, so it contributes more to the decision.

> t(fit$coefs) %*% fit$SV
     Sepal.Length Sepal.Width
[1,]    -1.060146  -0.2664518

To understand where the decision values come from, we can calculate them manually as the dot product of the feature weights and the preprocessed feature vectors, minus the intercept offset rho. (Preprocessed means possibly centered/scaled and, for a non-linear kernel such as RBF, kernel transformed.)

> t(w %*% t(as.matrix(iris.scaled))) - fit$rho
         [,1]
51 -1.3997066
52 -0.4402254
53 -1.1596819
54  1.7199970
55 -0.2796942
56  0.9996141
...

This should equal what is calculated internally:

> head(fit$decision.values)
   versicolor/virginica
51           -1.3997066
52           -0.4402254
53           -1.1596819
54            1.7199970
55           -0.2796942
56            0.9996141
...
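
To get an equation that can be written down in the original measurement units, as the question asks, the centering and scaling can be folded into the weights. This is a sketch for the linear kernel only. The names w.raw and b.raw are hypothetical, and the code assumes that fit$x.scale holds the centers first and the scales second, as in the example above:

```r
library(e1071)

iris.part <- droplevels(subset(iris, Species != "setosa"))[, c(1, 2, 5)]
fit <- svm(Species ~ ., data = iris.part,
           type = "C-classification", kernel = "linear")

# Weights on the scaled features, as computed earlier
w <- drop(t(fit$coefs) %*% fit$SV)

# Centers and scales that svm() applied to the features
center <- fit$x.scale[[1]]
spread <- fit$x.scale[[2]]

# w . ((x - center) / spread) - rho  ==  w.raw . x + b.raw
w.raw <- w / spread
b.raw <- -sum(w * center / spread) - fit$rho

# Print the equation in terms of the unscaled features
cat(sprintf("f(x) = %.4f", b.raw),
    sprintf("%+.4f * %s", w.raw, names(w.raw)), "\n")

# Check against the decision values stored in the model
manual <- as.matrix(iris.part[, 1:2]) %*% w.raw + b.raw
all.equal(as.numeric(manual), as.numeric(fit$decision.values))
```

For a non-linear kernel there is no single weight vector in the input space, so this rewriting does not carry over.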
