SVM equations from e1071 R package?
Question
I am interested in testing SVM performance to classify several individuals into four groups/classes. When using the svmtrain LibSVM function from MATLAB, I am able to get the three equations used to classify those individuals among the 4 groups, based on the values of those equations. A scheme could be as follows:
All individuals (N)*
|
Group 1 (n1) <--- equation 1 ---> (N-n1)
|
(N-n1-n2) <--- equation 2 ---> Group 2 (n2)
|
Group 3 (n3) <--- equation 3 ---> Group 4 (n4)
*N = n1+n2+n3+n4
Is there any way to get these equations using the svm function in the e1071 R package?
Answer
svm in e1071 uses the "one-against-one" strategy for multiclass classification (i.e. binary classification between all pairs, followed by voting). So to handle this hierarchical setup, you probably need to run a series of binary classifiers manually: group 1 vs. all, then group 2 vs. whatever is left, etc. Additionally, the basic svm function does not tune the hyperparameters, so you will typically want to use a wrapper like tune in e1071, or train in the excellent caret package.
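The hierarchical scheme from the question can be sketched as a series of binary fits. This is only an illustration: it uses iris as a stand-in data set, and the names d, fit1, and tuned are invented here, not part of the original answer.

```r
library(e1071)

# Stand-in data: pretend the iris species are the question's groups
d = iris

# Step 1 of the hierarchy: "group 1" (here, setosa) vs. everyone else
d$is.g1 = factor(ifelse(d$Species == 'setosa', 'g1', 'rest'))
fit1 = svm(is.g1 ~ Sepal.Length + Sepal.Width, data=d,
           type='C-classification', kernel='linear')

# Subsequent steps would drop the classified group and repeat, e.g.:
# d2 = subset(d, is.g1 == 'rest')  # then fit "group 2" vs. the remainder, etc.

# Hyperparameter tuning with tune(): 10-fold cross-validation over a cost grid
tuned = tune(svm, is.g1 ~ Sepal.Length + Sepal.Width, data=d,
             ranges=list(cost=2^(-2:2)))
summary(tuned)
```

tuned$best.model is itself a fitted svm object, so it can be used directly with predict at each level of the hierarchy.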
Anyway, to classify new individuals in R, you don't have to plug numbers into an equation manually. Rather, you use the predict generic function, which has methods for different models like SVM. For model objects like this, you can also usually use the generic functions plot and summary. Here is an example of the basic idea using a linear SVM:
require(e1071)
# Subset the iris dataset to only 2 labels and 2 features
iris.part = subset(iris, Species != 'setosa')
iris.part$Species = factor(iris.part$Species)
iris.part = iris.part[, c(1,2,5)]
# Fit svm model
fit = svm(Species ~ ., data=iris.part, type='C-classification', kernel='linear')
# Make a plot of the model
dev.new(width=5, height=5)
plot(fit, iris.part)
# Tabulate actual labels vs. fitted labels
pred = predict(fit, iris.part)
table(Actual=iris.part$Species, Fitted=pred)
# Obtain feature weights
w = t(fit$coefs) %*% fit$SV
# Calculate decision values manually
iris.scaled = scale(iris.part[,-3], fit$x.scale[[1]], fit$x.scale[[2]])
t(w %*% t(as.matrix(iris.scaled))) - fit$rho
# Should equal...
fit$decision.values
Tabulate actual class labels vs. model predictions:
> table(Actual=iris.part$Species, Fitted=pred)
Fitted
Actual versicolor virginica
versicolor 38 12
virginica 15 35
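The same predict call also works for a genuinely new individual: pass a data frame with the same column names as the training data. The measurements below are made up purely for illustration.

```r
library(e1071)

# Refit the model from the example above so this snippet stands alone
iris.part = subset(iris, Species != 'setosa')
iris.part$Species = factor(iris.part$Species)
iris.part = iris.part[, c(1, 2, 5)]
fit = svm(Species ~ ., data=iris.part, type='C-classification', kernel='linear')

# A new, unseen individual (invented measurements)
new.ind = data.frame(Sepal.Length=6.5, Sepal.Width=2.8)
predict(fit, new.ind)
```

predict handles the centering/scaling of the new data internally, using the same parameters stored in the fitted model.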
Extract feature weights from the svm model object (for feature selection, etc.). Here, Sepal.Length is clearly more useful.
> t(fit$coefs) %*% fit$SV
Sepal.Length Sepal.Width
[1,] -1.060146 -0.2664518
To understand where the decision values come from, we can calculate them manually as the dot product of the feature weights and the preprocessed feature vectors, minus the intercept offset rho. (Preprocessed means possibly centered/scaled and/or kernel-transformed, e.g. if using an RBF SVM.)
> t(w %*% t(as.matrix(iris.scaled))) - fit$rho
[,1]
51 -1.3997066
52 -0.4402254
53 -1.1596819
54 1.7199970
55 -0.2796942
56 0.9996141
...
This should equal what is calculated internally:
> head(fit$decision.values)
versicolor/virginica
51 -1.3997066
52 -0.4402254
53 -1.1596819
54 1.7199970
55 -0.2796942
56 0.9996141
...
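If you really do want an explicit equation of the kind MATLAB reports, you can undo the internal centering/scaling and express the decision function in the original feature units, f(x) = w1*Sepal.Length + w2*Sepal.Width + b. This sketch applies to the linear-kernel case only, and the names w.orig and b.orig are made up here:

```r
library(e1071)

# Refit the linear SVM from the example above so this snippet stands alone
iris.part = subset(iris, Species != 'setosa')
iris.part$Species = factor(iris.part$Species)
iris.part = iris.part[, c(1, 2, 5)]
fit = svm(Species ~ ., data=iris.part, type='C-classification', kernel='linear')

# Weights on the scaled features, as before
w = t(fit$coefs) %*% fit$SV

# Undo the centering/scaling: since scaled x_j = (x_j - center_j) / scale_j,
# the decision value w . scaled(x) - rho equals (w/scale) . x + b.orig with
w.orig = w / fit$x.scale[[2]]
b.orig = -sum(w * fit$x.scale[[1]] / fit$x.scale[[2]]) - fit$rho

# Check against the internally computed decision values
f = as.matrix(iris.part[, -3]) %*% t(w.orig) + b.orig
all.equal(as.numeric(f), as.numeric(fit$decision.values))
```

The sign of f then determines the predicted side of the binary split, matching the column label of fit$decision.values.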