Is it possible to retrieve False Positives/False Negatives identified by a confusion matrix?


Question

I am using scikit-learn and a confusion matrix to get more insight into how my algorithm is performing:

X_train, X_test, Y_train, Y_test = train_test_split(
    keywords_list, label_list, test_size=0.33, random_state=42)
pipeline.fit(X_train, Y_train)
pred = pipeline.predict(X_test)
print(confusion_matrix(Y_test, pred))

I get output like this:

[[1011   72]
[ 154 1380]]

Which I assume follows this format for these matrices:

TP|FP
FN|TN
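For reference, scikit-learn's `confusion_matrix` puts true labels on the rows and predicted labels on the columns, with labels in sorted order, so a binary matrix reads `[[TN, FP], [FN, TP]]` rather than the layout assumed above. A minimal sketch with made-up labels (the `y_true` and `y_pred` lists are illustrative, not the asker's data):

```python
from sklearn.metrics import confusion_matrix

# Toy labels for illustration only; 'Ham' sorts before 'Spam',
# so 'Ham' occupies row/column 0 and acts as the negative class.
y_true = ['Ham', 'Ham', 'Ham', 'Spam', 'Spam', 'Spam', 'Spam']
y_pred = ['Ham', 'Ham', 'Spam', 'Spam', 'Spam', 'Ham', 'Ham']

cm = confusion_matrix(y_true, y_pred)
# Rows are true labels, columns are predicted labels.
tn, fp, fn, tp = cm.ravel()
print(cm)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```

Under that convention, if row 0 is taken as the negative class, the 72 in the output above counts false positives and the 154 counts false negatives.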

Is it possible to retrieve the values that are being classified as false positives and false negatives? Knowing what that data looks like would be helpful for my work. It goes without saying that I am new to scikit-learn.

Alessandro gave good advice by informing me that Y_test != pred would return all of my false positives and false negatives in the confusion matrix.
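A minimal sketch of that suggestion, assuming `X_test` and `Y_test` are pandas Series sharing an index and `pred` is a NumPy array in the same row order (the toy data below is invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the asker's test split and predictions.
X_test = pd.Series(['cheap pills', 'meeting at 3', 'win money', 'lunch?'],
                   index=[10, 4, 7, 2])
Y_test = pd.Series(['Spam', 'Ham', 'Spam', 'Ham'], index=X_test.index)
pred = np.array(['Spam', 'Spam', 'Ham', 'Ham'])

# The element-wise comparison yields a boolean Series aligned with Y_test's
# index, which then selects every misclassified row of X_test.
misclassified = X_test[Y_test != pred]
print(misclassified.tolist())
```

This mask mixes both kinds of error, which is why it does not yet separate false positives from false negatives.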

One factor that I should have mentioned in my original question is that I am classifying textual data under binary labels (e.g. Ham/Spam), and I want to identify false positives and false negatives separately from each other. My current code for extracting false negatives takes the form of:

false_neg = open('false_neg.csv', 'w')
falsen_list = X_test[(Y_test == 'Spam') and (pred == 'Ham')] #False Negatives
wr2 = csv.writer(false_neg, quoting=csv.QUOTE_ALL)
for x in falsen_list:
    wr2.writerow([x])

Unfortunately, this throws an error:

  Traceback (most recent call last):
  File "/home/noname365/PycharmProjects/MLCorpusBlacklist/CorpusML_training.py", line 171, in <module>
    falsen_list = X_test[(Y_test == 'blacklisted') and (pred == 'clean')] #False Negatives
  File "/home/noname365/virtualenvs/env35/lib/python3.5/site-packages/pandas/core/generic.py", line 731, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Am I on the right track here?

Recommended answer

For me this worked by using '&' in place of '==' in Alessandro's answer (his answer returned false positives and false negatives together):

(Y_test == 1) & (pred == 0)
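The error in the question comes from Python's `and`, which asks for a single truth value for a whole Series; `&` instead combines the two boolean masks element by element. Below is a minimal sketch of the same idea with string labels, assuming 'Spam' is the positive class and using invented toy data. The `write_rows` helper is a hypothetical name introduced here, not part of the original code:

```python
import csv
import io

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the asker's test split and predictions.
X_test = pd.Series(['cheap pills', 'meeting at 3', 'win money', 'lunch?'])
Y_test = pd.Series(['Spam', 'Ham', 'Spam', 'Ham'])
pred = np.array(['Spam', 'Spam', 'Ham', 'Ham'])

# '&' combines the two boolean masks element-wise; the parentheses are
# needed because '&' binds more tightly than '=='.
false_neg = X_test[(Y_test == 'Spam') & (pred == 'Ham')]  # Spam predicted as Ham
false_pos = X_test[(Y_test == 'Ham') & (pred == 'Spam')]  # Ham predicted as Spam


def write_rows(handle, rows):
    """Write one quoted value per CSV row to an open text handle."""
    writer = csv.writer(handle, quoting=csv.QUOTE_ALL)
    for value in rows:
        writer.writerow([value])


# An in-memory buffer keeps the sketch self-contained; a real file opened
# with open('false_neg.csv', 'w', newline='') could be passed instead.
buffer = io.StringIO()
write_rows(buffer, false_neg)
print(buffer.getvalue())
print(false_pos.tolist())
```

Passing the handle into the helper lets the same function write false positives and false negatives to separate files, and opening those files in a `with` block would also close them, which the snippet in the question does not do.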

Hope that helps.
