How to fix the false positives rate of a linear SVM?


Problem description

I am an SVM newbie and this is my use case: I have a lot of unbalanced data to be binary classified using a linear SVM. I need to fix the false positives rate at certain values and measure the corresponding false negatives for each value. I am using something like the following code making use of scikit-learn svm implementation:

from sklearn import svm

# define training data
X = [[0, 0], [1, 1]]
y = [0, 1]

# define and train the SVM
# 'balanced' (formerly 'auto') reweights classes for unbalanced distributions
clf = svm.LinearSVC(C=0.01, class_weight='balanced')
clf.fit(X, y)

# compute false positives and false negatives
predictions = clf.predict(X)
false_positives = [(a, b) for (a, b) in zip(predictions, y) if a != b and b == 0]
false_negatives = [(a, b) for (a, b) in zip(predictions, y) if a != b and b == 1]
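The same bookkeeping can be read directly off scikit-learn's `confusion_matrix`; here is a minimal sketch reusing the toy `X`, `y`, and classifier settings from the snippet above:

```python
from sklearn.metrics import confusion_matrix
from sklearn.svm import LinearSVC

# same toy data and settings as in the question
X = [[0, 0], [1, 1]]
y = [0, 1]
clf = LinearSVC(C=0.01, class_weight='balanced').fit(X, y)

predictions = clf.predict(X)
# ravel() unpacks the 2x2 matrix in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y, predictions).ravel()
```

From these counts, the false positive rate is `fp / (fp + tn)` and the false negative rate is `fn / (fn + tp)`.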

Is there a way to play with a parameter (or a few parameters) of the classifier such that one of the measurement metrics is effectively fixed?

Answer

The predict method for LinearSVC in sklearn looks like this

def predict(self, X):
    """Predict class labels for samples in X.

    Parameters
    ----------
    X : {array-like, sparse matrix}, shape = [n_samples, n_features]
        Samples.

    Returns
    -------
    C : array, shape = [n_samples]
        Predicted class label per sample.
    """
    scores = self.decision_function(X)
    if len(scores.shape) == 1:
        indices = (scores > 0).astype(np.int)
    else:
        indices = scores.argmax(axis=1)
    return self.classes_[indices]

So in addition to what mbatchkarov suggested you can change the decisions made by the classifier (any classifier really) by changing the boundary at which the classifier says something is of one class or the other.

from collections import Counter
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

data = load_iris()

# keep only the first feature to make the problem harder
# and drop the third class for simplicity
X = data.data[:100, 0:1]
y = data.target[:100]
# shuffle the data
indices = np.arange(y.shape[0])
np.random.shuffle(indices)
X = X[indices, :]
y = y[indices]

# train on one half, score the held-out half
clf = LinearSVC().fit(X[:50], y[:50])

decision_boundary = 0
print(Counter((clf.decision_function(X[50:]) > decision_boundary).astype(np.int8)))
# e.g. Counter({1: 27, 0: 23})

decision_boundary = 0.5
print(Counter((clf.decision_function(X[50:]) > decision_boundary).astype(np.int8)))
# e.g. Counter({0: 39, 1: 11})

You can optimize the decision boundary to be anything depending on your needs.
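To pin the false positive rate at a chosen value rather than tuning the boundary by hand, one option is to read the threshold off an ROC curve computed on held-out data. The sketch below follows that idea; the `target_fpr` value, the fixed random seed, and the 50/50 train/test split are illustrative assumptions, not part of the original answer:

```python
from collections import Counter
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import roc_curve
from sklearn.svm import LinearSVC

data = load_iris()
X = data.data[:100, 0:1]
y = data.target[:100]

# shuffle with a fixed seed so the split is reproducible
rng = np.random.RandomState(0)
indices = rng.permutation(y.shape[0])
X, y = X[indices, :], y[indices]

clf = LinearSVC(C=1.0).fit(X[:50], y[:50])
scores = clf.decision_function(X[50:])

# roc_curve returns the thresholds alongside the FPR/TPR they produce;
# pick the largest FPR that does not exceed the target and use its threshold
fpr, tpr, thresholds = roc_curve(y[50:], scores)
target_fpr = 0.1
decision_boundary = thresholds[fpr <= target_fpr][-1]
predictions = (scores > decision_boundary).astype(int)
print(Counter(predictions))
```

Note that the guarantee only holds on the data used to compute the curve; on genuinely new data the achieved false positive rate will fluctuate around the target.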
