Negative decision_function values


Problem description

I am using the support vector classifier from sklearn on the Iris dataset. When I call decision_function it returns negative values, yet every sample in the test dataset is assigned the correct class after classification. I thought decision_function should return a positive value when the sample is an inlier and a negative value when it is an outlier. Where am I wrong?

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Load the Iris dataset: 150 samples, 4 features, 3 classes
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Hold out 30% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

clf = SVC(probability=True)
print(clf.fit(X_train, y_train).decision_function(X_test))
print(clf.predict(X_test))
print(y_test)

Here is the output:

[[-0.76231668 -1.03439531 -1.40331645]
 [-1.18273287 -0.64851109  1.50296097]
 [ 1.10803774  1.05572833  0.12956269]
 [-0.47070432 -1.08920859 -1.4647051 ]
 [ 1.18767563  1.12670665  0.21993744]
 [-0.48277866 -0.98796232 -1.83186272]
 [ 1.25020033  1.13721691  0.15514536]
 [-1.07351583 -0.84997114  0.82303659]
 [-1.04709616 -0.85739411  0.64601611]
 [-1.23148923 -0.69072989  1.67459938]
 [-0.77524787 -1.00939817 -1.08441968]
 [-1.12212245 -0.82394879  1.11615504]
 [-1.14646662 -0.91238712  0.80454974]
 [-1.13632316 -0.8812114   0.80171542]
 [-1.14881866 -0.95169643  0.61906248]
 [ 1.15821271  1.10902205  0.22195304]
 [-1.19311709 -0.93149873  0.78649126]
 [-1.21653084 -0.90953622  0.78904491]
 [ 1.16829526  1.12102515  0.20604678]
 [ 1.18446364  1.1080255   0.15199149]
 [-0.93911991 -1.08150089 -0.8026332 ]
 [-1.15462733 -0.95603159  0.5713605 ]
 [ 0.93278883  0.99763184  0.34033663]
 [ 1.10999556  1.04596018  0.14791409]
 [-1.07285663 -1.01864255 -0.10701465]
 [ 1.21200422  1.01284263  0.0416991 ]
 [ 0.9462457   1.01076579  0.36620915]
 [-1.2108146  -0.79124775  1.43264808]
 [-1.02747495 -0.25741977  1.13056021]
...
 [ 1.16066886  1.11212424  0.22506538]]
 [2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0
 2 1 1 2 0 2 0 0]

 [2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0
 1 1 1 2 0 2 0 0]

Recommended answer

You need to consider the decision_function and the prediction separately. The decision value is the signed distance from the hyperplane to your sample, so by looking at the sign you can tell on which side of the hyperplane the sample lies. Negative values are therefore perfectly fine and simply indicate the negative class ("the other side of the hyperplane").
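A minimal sketch of this in the binary case (restricting Iris to classes 0 and 1 for illustration; this snippet is mine, not part of the question's code):

import numpy as np
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
# Keep only classes 0 and 1 so the problem is genuinely binary
mask = iris.target < 2
X, y = iris.data[mask], iris.target[mask]

clf = SVC().fit(X, y)
scores = clf.decision_function(X)  # one signed distance per sample

# The sign alone determines the predicted class:
# negative -> class 0, positive -> class 1
assert np.array_equal(clf.predict(X), (scores > 0).astype(int))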

With the iris dataset you have a multi-class problem. Since the SVM is a binary classifier, there is no inherent multi-class classification. Two approaches are "one-vs-rest" (OvR) and "one-vs-one" (OvO), which construct a multi-class classifier from the binary "units".
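As a side note, scikit-learn exposes both strategies as generic wrappers; a sketch for illustration only (this is not what SVC does internally):

from sklearn import datasets
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# OvR trains one classifier per class, OvO one per pair of classes;
# for 3 classes both happen to need 3 binary classifiers (3 = 3*2/2)
ovr = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X, y)
ovo = OneVsOneClassifier(LinearSVC(max_iter=10000)).fit(X, y)
print(len(ovr.estimators_), len(ovo.estimators_))  # 3 3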

Now that you know OvR, OvO is not much harder to grasp. You basically construct one classifier for every pair of classes (A, B). In your case: 0 vs 1, 0 vs 2, 1 vs 2.

Note: the values for (A, B) and (B, A) can be obtained from a single binary classifier. You only change which class is considered positive, so you have to invert the sign.
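You can ask SVC for these raw pairwise values directly; a sketch (the column order follows the pairs 0 vs 1, 0 vs 2, 1 vs 2):

from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
clf = SVC(decision_function_shape='ovo').fit(iris.data, iris.target)

# n_classes * (n_classes - 1) / 2 = 3 pairwise values per sample,
# one column per pair: 0 vs 1, 0 vs 2, 1 vs 2
print(clf.decision_function(iris.data[:1]))  # shape (1, 3)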

Doing this gives you a matrix:

+-------+------+-------+-------+
| A / B |  #0  |   #1  |   #2  |
+-------+------+-------+-------+
|       |      |       |       |
| #0    |  --  | -1.18 | -0.64 |
|       |      |       |       |
| #1    | 1.18 |  --   |  1.50 |
|       |      |       |       |
| #2    | 0.64 | -1.50 |  --   |
+-------+------+-------+-------+

Read this as follows: the decision function value when class A (row) competes against class B (column).

To extract a result, a vote is performed. In its basic form you can imagine this as a single vote that each classifier casts: yes or no. Since this could lead to ties, the whole decision function values are summed instead.

+-------+------+-------+-------+-------+
| A / B |  #0  |   #1  |   #2  |  SUM  |
+-------+------+-------+-------+-------+
|       |      |       |       |       |
| #0    | -    | -1.18 | -0.64 | -1.82 |
|       |      |       |       |       |
| #1    | 1.18 | -     | 1.50  | 2.68  |
|       |      |       |       |       |
| #2    | 0.64 | -1.50 | -     | -0.86 |
+-------+------+-------+-------+-------+

The SUM column gives you the score vector [-1.82, 2.68, -0.86]. Now apply arg max and it matches your prediction.
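A sketch of this vote in code, using the rounded pairwise values from the table (the variable names are mine; this simplified sum illustrates the idea rather than reproducing sklearn's exact internal tie-breaking):

import numpy as np

# Pairwise decision values for one sample: 0 vs 1, 0 vs 2, 1 vs 2
pairwise = np.array([-1.18, -0.64, 1.50])

n_classes = 3
votes = np.zeros((n_classes, n_classes))
k = 0
for a in range(n_classes):
    for b in range(a + 1, n_classes):
        votes[a, b] = pairwise[k]    # A vs B
        votes[b, a] = -pairwise[k]   # B vs A: same classifier, sign flipped
        k += 1

scores = votes.sum(axis=1)   # row sums: [-1.82, 2.68, -0.86]
print(scores.argmax())       # 1 -> the predicted class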

I keep this section to avoid further confusion. The scikit-learn SVC classifier (libsvm) has a decision_function_shape parameter, which deceived me into thinking it was OvR (I am using liblinear most of the time).

For a real OvR response you get one value from the decision function per classifier, e.g.

 [-1.18273287 -0.64851109  1.50296097]

Now, to obtain a prediction from this you just apply arg max, which returns the last index, with a value of 1.50296097. From here on, the decision function's value is no longer needed (for this single prediction). That's why you noticed that your predictions are fine.

However, you also specified probability=True, which takes the value of the decision function and passes it through a sigmoid function. Same principle as above, but now you also get confidence values between 0 and 1 (I prefer this term over probabilities, since they only describe the distance to the hyperplane).
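These calibrated values are exposed through predict_proba; a short sketch reusing the fitted clf and X_test from the question:

# probability=True enables Platt scaling on the decision values,
# exposed through predict_proba
proba = clf.predict_proba(X_test)
print(proba[:2])              # one column per class, rows sum to 1
print(proba.argmax(axis=1))   # usually agrees with clf.predict(X_test),
                              # though sklearn warns it may occasionally differ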

Edit: Oops, sascha is right. LibSVM uses one-vs-one (despite the shape of the decision function).
