Scikit-learn: How to obtain True Positive, True Negative, False Positive and False Negative

Problem description

My problem:

I have a dataset which is a large JSON file. I read it and store it in the trainList variable.

Next, I pre-process it so that I can work with it.

Once that is done, I start the classification:

  1. I use the k-fold cross-validation method to obtain the mean accuracy and train a classifier.
  2. I make the predictions and obtain the accuracy and confusion matrix for that fold.
  3. After this, I would like to obtain the True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) values. I'll use these values to compute the sensitivity and specificity.

Finally, I would put this into HTML to show a chart with the TPs of each label.

Code:

The variables I have at the moment:

trainList #It is a list with all the data of my dataset in JSON form
labelList #It is a list with all the labels of my data 

Most of the method:

#I transform the data from JSON form to a numerical one
X=vec.fit_transform(trainList)

#I scale the matrix (don't know why but without it, it makes an error)
X=preprocessing.scale(X.toarray())

#I generate a KFold in order to make cross validation
kf = KFold(len(X), n_folds=10, indices=True, shuffle=True, random_state=1)

#I start the cross validation
for train_indices, test_indices in kf:
    X_train=[X[ii] for ii in train_indices]
    X_test=[X[ii] for ii in test_indices]
    y_train=[labelList[ii] for ii in train_indices]
    y_test=[labelList[ii] for ii in test_indices]

    #I train the classifier
    trained=qda.fit(X_train,y_train)

    #I make the predictions
    predicted=qda.predict(X_test)

    #I obtain the accuracy of this fold
    ac=accuracy_score(predicted,y_test)

    #I obtain the confusion matrix
    cm=confusion_matrix(y_test, predicted)

    #I should calculate the TP,TN, FP and FN 
    #I don't know how to continue

Recommended answer

If you have two lists holding the predicted and actual values, as it appears you do, you can pass them to a function that calculates TP, FP, TN and FN, with something like this:

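def perf_measure(y_actual, y_hat):
    # Counts outcomes for binary labels, where 1 is positive and 0 is negative
    TP = 0
    FP = 0
    TN = 0
    FN = 0

    for i in range(len(y_hat)):
        if y_actual[i] == y_hat[i] == 1:
            TP += 1  # predicted positive, actually positive
        if y_hat[i] == 1 and y_actual[i] != y_hat[i]:
            FP += 1  # predicted positive, actually not positive
        if y_actual[i] == y_hat[i] == 0:
            TN += 1  # predicted negative, actually negative
        if y_hat[i] == 0 and y_actual[i] != y_hat[i]:
            FN += 1  # predicted negative, actually not negative

    return (TP, FP, TN, FN)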

From here, I think you will be able to calculate the rates that interest you, and other performance measures such as specificity and sensitivity.
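As a rough sketch of that last step, and of the per-label TPs mentioned in the question, the counts can be computed by treating each label in turn as the "positive" class (one-vs-rest). The helper names per_label_counts and sensitivity_specificity are made up for illustration and are not part of scikit-learn. The sketch assumes Python 3 (true division) and plain lists of hashable labels, and returns counts in the same (TP, FP, TN, FN) order as the function above.

```python
def per_label_counts(y_actual, y_hat):
    """Return {label: (TP, FP, TN, FN)}, treating each label as positive in turn."""
    labels = set(y_actual) | set(y_hat)
    counts = {}
    for label in labels:
        tp = fp = tn = fn = 0
        for actual, predicted in zip(y_actual, y_hat):
            if predicted == label and actual == label:
                tp += 1  # predicted this label, and it was this label
            elif predicted == label:
                fp += 1  # predicted this label, but it was another one
            elif actual == label:
                fn += 1  # was this label, but another one was predicted
            else:
                tn += 1  # neither predicted nor actual is this label
        counts[label] = (tp, fp, tn, fn)
    return counts


def sensitivity_specificity(tp, fp, tn, fn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    # None is returned when a rate is undefined (zero denominator).
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Toy data, made up for illustration only
y_actual = [1, 0, 1, 1, 0, 0, 1, 0]
y_hat = [1, 0, 0, 1, 1, 0, 1, 1]
for label, (tp, fp, tn, fn) in per_label_counts(y_actual, y_hat).items():
    print(label, (tp, fp, tn, fn), sensitivity_specificity(tp, fp, tn, fn))
```

Inside the cross-validation loop, the same call would presumably take y_test and predicted in place of the toy lists, giving one set of counts per fold.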
