The dimension of dual_coef_ in sklearn.SVC


Question


In SVC() for multi-classification, one-vs-one classifiers are trained, so there are supposed to be n_class * (n_class - 1) / 2 classifiers in total. But why does clf.dual_coef_ return only (n_class - 1) * n_SV coefficients? What does each row represent, then?

Answer


The dual coefficients of a sklearn.svm.SVC in the multiclass setting are tricky to interpret. There is an explanation in the scikit-learn documentation. The sklearn.svm.SVC uses libsvm for the calculations and adopts the same data structure for the dual coefficients. Another explanation of the organization of these coefficients is in the FAQ. In the case of the coefficients you find in the fitted SVC classifier, interpretation goes as follows:


The support vectors identified by the SVC each belong to a certain class. In the dual coefficients, they are ordered according to the class they belong to. Given a fitted SVC estimator, e.g.

from sklearn.svm import SVC
svc = SVC()
svc.fit(X, y)

you will find that

svc.classes_   # represents the unique classes
svc.n_support_ # represents the number of support vectors per class


The support vectors are organized according to these two variables. Since each support vector is clearly identified with one class, it can be involved in at most n_classes - 1 one-vs-one problems, namely one comparison with each of the other classes. But it is entirely possible that a given support vector is not involved in every one-vs-one problem.
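This bookkeeping is easy to verify empirically. A minimal sketch, assuming the iris dataset (three classes) and a default RBF-kernel SVC:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# iris has three classes, so n_classes - 1 == 2
X, y = load_iris(return_X_y=True)
svc = SVC().fit(X, y)

n_classes = len(svc.classes_)          # 3
n_sv = svc.support_vectors_.shape[0]   # total number of support vectors

# dual_coef_ has one row per "opponent slot" and one column per support vector
assert svc.dual_coef_.shape == (n_classes - 1, n_sv)
# the per-class counts in n_support_ account for every support vector
assert int(svc.n_support_.sum()) == n_sv
```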

Take a look at

support_indices = np.concatenate([[0], np.cumsum(svc.n_support_)])

svc.dual_coef_[:, support_indices[0]:support_indices[1]]
                                      #  ^^^
                                      # weights on support vectors of class 0
                                      # for problems 0v1, 0v2, ..., 0v(n-1):
                                      # n-1 rows, with one column for each of
                                      # the svc.n_support_[0] support vectors
svc.dual_coef_[:, support_indices[1]:support_indices[2]]
                                      #  ^^^
                                      # weights on support vectors of class 1
                                      # for problems 0v1, 1v2, ..., 1v(n-1):
                                      # n-1 rows, with one column for each of
                                      # the svc.n_support_[1] support vectors
...
svc.dual_coef_[:, support_indices[n_classes - 1]:support_indices[n_classes]]
                                      #  ^^^
                                      # weights on support vectors of class n-1
                                      # for problems 0v(n-1), 1v(n-1), ..., (n-2)v(n-1):
                                      # n-1 rows, with one column for each of
                                      # the svc.n_support_[-1] support vectors

This gives you the weights of the support vectors of classes 0, 1, ..., n-1 in their respective one-vs-one problems. Each class is compared against every other class except its own, resulting in n_classes - 1 rows; the order of these comparisons follows the order of the unique classes shown above. Within each group there are as many columns as that class has support vectors. (Note that dual_coef_ has shape (n_classes - 1, n_SV), so the support vectors index the columns, not the rows.)
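Run end-to-end, the slicing looks as follows; this is a sketch on the iris dataset with a default RBF-kernel SVC, checking each per-class block against the shape described above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
svc = SVC().fit(X, y)
n_classes = len(svc.classes_)

# column boundaries between the per-class groups of support vectors
support_indices = np.concatenate([[0], np.cumsum(svc.n_support_)])

for k in range(n_classes):
    block = svc.dual_coef_[:, support_indices[k]:support_indices[k + 1]]
    # each block: n_classes - 1 rows, one column per support vector of class k
    assert block.shape == (n_classes - 1, svc.n_support_[k])
```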


Possibly what you are looking for are the primal weights, which live in feature space, in order to inspect them as to their "importance" for classification. This is only possible with a linear kernel. Try this

from sklearn.svm import SVC
svc = SVC(kernel="linear")
svc.fit(X, y)  # X is your data, y your labels

and then look at

svc.coef_


This is an array of shape (n_class * (n_class - 1) / 2, n_features), where each row is the primal weight vector of one pairwise problem.


According to the documentation, the weights are ordered as:

class 0 vs class 1
class 0 vs class 2
...
class 0 vs class n-1
class 1 vs class 2
class 1 vs class 3
...
...
class n-2 vs class n-1
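That ordering matches what itertools.combinations produces over svc.classes_, which gives a quick way to label the rows of coef_. A sketch with a linear kernel on the iris dataset:

```python
from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
svc = SVC(kernel="linear").fit(X, y)

# class pairs in the documented order: (0,1), (0,2), ..., (n-2)v(n-1)
pairs = list(combinations(svc.classes_, 2))

# one primal weight vector per one-vs-one problem
assert svc.coef_.shape == (len(pairs), X.shape[1])

for pair, w in zip(pairs, svc.coef_):
    print(pair, w.shape)
```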
