Sklearn SVM - how to get a list of the wrong predictions?


Question

I am not an expert user. I know that I can obtain the confusion matrix, but I would like to get a list of the rows that were misclassified so that I can study them after classification.

On Stack Overflow I found the question "Can I get a list of wrong predictions in SVM score function in scikit-learn", but I am not sure I understood everything.

Here is some example code.

# importing necessary libraries
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# loading the iris dataset
iris = datasets.load_iris()

# X -> features, y -> label
X = iris.data
y = iris.target

# dividing X, y into train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)

# training a linear SVM classifier
from sklearn.svm import SVC
svm_model_linear = SVC(kernel = 'linear', C = 1).fit(X_train, y_train)
svm_predictions = svm_model_linear.predict(X_test)

# model accuracy for X_test  
accuracy = svm_model_linear.score(X_test, y_test)

# creating a confusion matrix
cm = confusion_matrix(y_test, svm_predictions)

To iterate over the rows and find the misclassified ones, the proposed solution is:

predictions = clf.predict(inputs)
for input, prediction, label in zip(inputs, predictions, labels):
  if prediction != label:
    print(input, 'has been classified as ', prediction, 'and should be ', label) 

I don't understand what "input"/"inputs" are. If I adapt this code to my own, like this:

for input, prediction, label in zip (X_test, svm_predictions, y_test):
  if prediction != label:
    print(input, 'has been classified as ', prediction, 'and should be ', label)

I get:

[6.  2.7 5.1 1.6] has been classified as  2 and should be  1

Is row 6 the wrong row? What are the numbers after the 6.? I am asking because I am using the same code on a larger dataset, so I would like to be sure that I am doing the right thing. Unfortunately I can't post that dataset, but the output I get there looks like this:

  (0, 253)  0.5339655767137572
  (0, 601)  0.27665553856928027
  (0, 1107) 0.7989633757962163 has been classified as  7 and should be  3
  (0, 885)  0.3034934766501018
  (0, 1295) 0.6432561790864061
  (0, 1871) 0.7029318585026516 has been classified as  7 and should be  6
  (0, 1020) 1.0 has been classified as  3 and should be  8

When I count the lines of this last output, I get twice as many lines as there are rows in the test set, so I am not sure that I am looking at the correct list of wrong predictions.

Answer

Is row 6 the wrong row? What are the numbers after the 6.?

No. [6. 2.7 5.1 1.6] is the actual sample (i.e. its features), not a row number. To get the index of the misclassified row, we should slightly modify the for loop:

for idx, input, prediction, label in zip(enumerate(X_test), X_test, svm_predictions, y_test):
    if prediction != label:
        print("No.", idx[0], 'input,',input, ', has been classified as', prediction, 'and should be', label) 

The result is now:

No. 37 input, [ 6.   2.7  5.1  1.6] , has been classified as 2 and should be 1

This means that X_test[37], which is [ 6. 2.7 5.1 1.6], has been predicted by our SVM as 2, while its true label is 1.

Let's confirm this reading:

X_test[37]
# array([ 6. ,  2.7,  5.1,  1.6])

svm_predictions[37]
# 2

y_test[37]
# 1

This result agrees with your confusion matrix cm, which indeed shows only one misclassified sample in X_test:

cm
# result:
array([[13,  0,  0],
       [ 0, 15,  1],
       [ 0,  0,  9]], dtype=int64)
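
As a cross-check, the number of misclassified samples can be read off the confusion matrix as the sum of its off-diagonal entries and compared with a direct count of mismatches. The following is a minimal sketch that rebuilds the question's setup so it runs on its own; the names n_wrong_from_cm and n_wrong_direct are introduced here for illustration only.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same setup as in the question
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)
svm_predictions = SVC(kernel="linear", C=1).fit(X_train, y_train).predict(X_test)
cm = confusion_matrix(y_test, svm_predictions)

# Correct predictions sit on the diagonal, so everything else is an error
n_wrong_from_cm = int(cm.sum() - np.trace(cm))

# Direct count: positions where prediction and true label differ
n_wrong_direct = int(np.sum(svm_predictions != y_test))

print(n_wrong_from_cm, n_wrong_direct)
```

If the two numbers differ, the loop that lists the wrong predictions is probably being fed different arrays than the ones used to build cm.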

A more elegant for loop, since the enumeration already includes the samples themselves, could be:

for idx, prediction, label in zip(enumerate(X_test), svm_predictions, y_test):
    if prediction != label:
        print("Sample", idx, ', has been classified as', prediction, 'and should be', label) 

which gives

Sample (37, array([ 6. ,  2.7,  5.1,  1.6])) , has been classified as 2 and should be 1
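
The same positions can also be found without a Python loop, using a NumPy comparison. The sketch below is one possible variant, not part of the original answer: it additionally passes an index array through train_test_split so that each wrong prediction can be traced back to its row in the original, unsplit data. The names indices, idx_test and wrong_pos are assumptions introduced for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()

# Extra array of original row numbers; train_test_split splits it
# in exactly the same way as the features and labels
indices = np.arange(len(iris.data))
X_train, X_test, y_train, y_test, idx_train, idx_test = train_test_split(
    iris.data, iris.target, indices, random_state=0
)

svm_predictions = SVC(kernel="linear", C=1).fit(X_train, y_train).predict(X_test)

# Positions within the test set where prediction and true label differ
wrong_pos = np.flatnonzero(svm_predictions != y_test)

for pos in wrong_pos:
    print(
        f"X_test[{pos}] (row {idx_test[pos]} of the original data): "
        f"predicted {svm_predictions[pos]}, true label {y_test[pos]}"
    )
```

Printing positions rather than the samples themselves may also help with the larger dataset from the question. Output such as `(0, 253)  0.5339...` looks like rows of a SciPy sparse matrix (this is an assumption, since that dataset was not posted), and each such row prints on several lines, one per non-zero entry. That would explain why the output has more lines than the test set has rows even though only some rows are misclassified.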
