Specificity in scikit learn


Problem Description

I need specificity for my classification, which is defined as TN/(TN+FP).
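(For reference, when the predicted labels are already available, specificity can also be computed from scikit-learn's confusion matrix. A minimal sketch, assuming binary labels {0, 1}; the names y_true and y_pred are illustrative:)

from sklearn.metrics import confusion_matrix

def specificity(y_true, y_pred):
    # For binary labels {0, 1}, ravel() returns tn, fp, fn, tp in this order.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return tn / (tn + fp)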

I am writing a custom scorer function:

from sklearn.metrics import make_scorer

def specificity_loss_func(ground_truth, predictions):
    print(predictions)
    # Tally the confusion-matrix counts by hand.
    tp, tn, fn, fp = 0.0, 0.0, 0.0, 0.0
    for l, m in enumerate(ground_truth):
        if m == predictions[l] and m == 1:
            tp += 1
        if m == predictions[l] and m == 0:
            tn += 1
        if m != predictions[l] and m == 1:
            fn += 1
        if m != predictions[l] and m == 0:
            fp += 1
    return tn / (tn + fp)

score = make_scorer(specificity_loss_func, greater_is_better=True)

Then,

from sklearn.dummy import DummyClassifier
clf_dummy = DummyClassifier(strategy='most_frequent', random_state=0)
ground_truth = [0,0,1,0,1,1,1,0,0,1,0,0,1]
p  = [0,0,0,1,0,1,1,1,1,0,0,1,0]
clf_dummy = clf_dummy.fit(ground_truth, p)
score(clf_dummy, ground_truth, p)

When I run these commands, I get p printed as:

[0 0 0 0 0 0 0 0 0 0 0 0 0]
1.0

Why is my p changing to a series of zeros when I input p = [0,0,0,1,0,1,1,1,1,0,0,1,0]?

Recommended Answer

The first thing you need to know:

DummyClassifier(strategy='most_frequent'...

will give you a classifier which returns the most frequent label from your training set. It doesn't even take the samples in X into consideration. You can pass anything instead of ground_truth in this line:

clf_dummy = clf_dummy.fit(ground_truth, p)

The result of training, and the predictions, will stay the same, because the majority of the labels inside p is the label "0".
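You can see this with a quick sketch (the feature matrices here are made up purely for illustration):

from sklearn.dummy import DummyClassifier
import numpy as np

y = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0]   # majority label is 0
X_a = np.zeros((13, 1))                        # one feature matrix...
X_b = np.random.rand(13, 1)                    # ...and a completely different one

clf = DummyClassifier(strategy='most_frequent').fit(X_a, y)
print(clf.predict(X_a))   # all zeros
print(clf.predict(X_b))   # still all zeros: the features are ignored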

The second thing you need to know: make_scorer returns a function with the interface scorer(estimator, X, y). This function will call the predict method of estimator on the set X, and compute your specificity function between the predicted labels and y.
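Roughly, scorer(estimator, X, y) first calls estimator.predict(X) and then feeds the result to your metric. A minimal self-contained sketch of this mechanic, using a compact re-implementation of the specificity function and made-up data:

from sklearn.dummy import DummyClassifier
from sklearn.metrics import make_scorer

def spec(y_true, y_pred):
    # Compact specificity: TN / (TN + FP) over binary labels.
    tn = sum(1 for t, q in zip(y_true, y_pred) if t == q == 0)
    fp = sum(1 for t, q in zip(y_true, y_pred) if t == 0 and q == 1)
    return tn / (tn + fp)

scorer = make_scorer(spec)
X = [[0], [1], [0], [1]]
y = [0, 1, 0, 0]
clf = DummyClassifier(strategy='most_frequent').fit(X, y)

# These two lines compute the same number:
print(scorer(clf, X, y))          # the scorer predicts on X internally
print(spec(y, clf.predict(X)))    # equivalent manual expansion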

So it calls clf_dummy on some dataset (it doesn't matter which one; it will always return 0), gets back a vector of 0's, and then computes the specificity loss between ground_truth and the predictions. Your predictions are all 0 because 0 was the majority class in the training set. Your score equals 1 because there are no false-positive predictions.

I corrected your code to make it more convenient:

from sklearn.dummy import DummyClassifier

clf_dummy = DummyClassifier(strategy='most_frequent', random_state=0)
# X must be a 2-D array of samples; each sample here has a single feature.
X = [[0], [0], [1], [0], [1], [1], [1], [0], [0], [1], [0], [0], [1]]
p = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0]
clf_dummy = clf_dummy.fit(X, p)
score(clf_dummy, X, p)
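With X passed as a 2-D list of samples, fit now matches what scikit-learn expects. Note that the score should still come out as 1.0 here: the majority label in p is 0, so the dummy classifier predicts 0 for every sample, which yields no false positives and therefore a specificity of TN/(TN+FP) = 1.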

