How to compute accuracy and the confusion matrix using K-fold cross-validation?

Views: 1275 | Published: 2020/10/11 20:04:12 | Tags: scikit-learn, cross-validation


Question

I tried K-fold cross-validation with K=30 folds, producing one confusion matrix per fold. How can I compute the model's accuracy and confusion matrix with a confidence interval? Could someone help me?

My code is:

import numpy as np
from sklearn import model_selection
from sklearn import datasets
from sklearn import svm
import pandas as pd
from sklearn.linear_model import LogisticRegression

UNSW = pd.read_csv('/home/sec/Desktop/CEFET/tudao.csv')

previsores = UNSW.iloc[:,UNSW.columns.isin(('sload','dload',
                                                   'spkts','dpkts','swin','dwin','smean','dmean',
'sjit','djit','sinpkt','dinpkt','tcprtt','synack','ackdat','ct_srv_src','ct_srv_dst','ct_dst_ltm',
 'ct_src_ltm','ct_src_dport_ltm','ct_dst_sport_ltm','ct_dst_src_ltm')) ].values


classe= UNSW.iloc[:, -1].values


X_train, X_test, y_train, y_test = model_selection.train_test_split(
previsores, classe, test_size=0.4, random_state=0)

print(X_train.shape, y_train.shape)
# shapes of the training split: (n_train, n_features), (n_train,)
print(X_test.shape, y_test.shape)
# shapes of the test split: (n_test, n_features), (n_test,)

logmodel = LogisticRegression()
logmodel.fit(X_train,y_train)
print(previsores.shape)


########K FOLD
print('########K FOLD########K FOLD########K FOLD########K FOLD')
from sklearn.model_selection import KFold
from sklearn.metrics import confusion_matrix

kf = KFold(n_splits=30, random_state=None, shuffle=False)
kf.get_n_splits(previsores)
for train_index, test_index in kf.split(previsores):

    X_train, X_test = previsores[train_index], previsores[test_index]
    y_train, y_test = classe[train_index], classe[test_index]

    logmodel.fit(X_train, y_train)
    print (confusion_matrix(y_test, logmodel.predict(X_test)))
print(10* '#')

Recommended answer

For accuracy, I would use the function cross_val_score, which does exactly what you are looking for. With 30 folds it returns an array of 30 validation accuracies; you can then compute their mean and standard deviation and build a rough confidence interval (mean +- 2*std).
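As a minimal sketch of that suggestion, the snippet below runs cross_val_score with 30 folds and derives the rough mean +- 2*std interval. The synthetic data from make_classification is only a stand-in for the question's `previsores` and `classe` arrays, and `max_iter=1000` and `shuffle=True` are assumptions added here, not part of the original answer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Assumption: synthetic stand-in for the question's `previsores` / `classe`.
X, y = make_classification(n_samples=600, n_features=22, random_state=0)

model = LogisticRegression(max_iter=1000)
kf = KFold(n_splits=30, shuffle=True, random_state=0)

# One validation accuracy per fold (an array of length 30).
scores = cross_val_score(model, X, y, cv=kf, scoring="accuracy")

mean_acc = scores.mean()
std_acc = scores.std(ddof=1)  # sample standard deviation across folds

# Rough interval as described in the answer: mean +- 2*std.
lower, upper = mean_acc - 2 * std_acc, mean_acc + 2 * std_acc

print(f"accuracy: {mean_acc:.3f} (+/- {2 * std_acc:.3f})")
print(f"rough interval: [{lower:.3f}, {upper:.3f}]")
```

Note that this interval describes the spread of per-fold accuracies. With small folds the per-fold values are noisy, so the interval can be wide and may extend beyond 1.0.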

Since a confusion matrix is not a single-number performance metric but a matrix, I would recommend creating a list and appending each fold's validation confusion matrix to it (currently you only print it). At the end, you can use this list to extract a lot of useful information.

Update:

...
...
cm_holder = []
for train_index, test_index in kf.split(previsores):
    X_train, X_test = previsores[train_index], previsores[test_index]
    y_train, y_test = classe[train_index], classe[test_index]

    logmodel.fit(X_train, y_train)
    cm_holder.append(confusion_matrix(y_test, logmodel.predict(X_test)))

Note that len(cm_holder) = 30 and each element is an array of shape (n_classes, n_classes).
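One possible way to use that list, sketched under assumptions: stack the 30 matrices, then take the element-wise sum, mean and standard deviation. The synthetic data again stands in for `previsores` and `classe`. Passing `labels=` to confusion_matrix is an addition here, so that every matrix keeps the same shape even when a fold happens to miss a class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold

# Assumption: synthetic stand-in for the question's `previsores` / `classe`.
X, y = make_classification(n_samples=600, n_features=22, random_state=0)
labels = np.unique(y)

model = LogisticRegression(max_iter=1000)
kf = KFold(n_splits=30, shuffle=True, random_state=0)

cm_holder = []
for train_index, test_index in kf.split(X):
    model.fit(X[train_index], y[train_index])
    y_pred = model.predict(X[test_index])
    # Fixed label order keeps all matrices at (n_classes, n_classes).
    cm_holder.append(confusion_matrix(y[test_index], y_pred, labels=labels))

cm_stack = np.stack(cm_holder)          # shape (30, n_classes, n_classes)
cm_total = cm_stack.sum(axis=0)         # pooled counts over all folds
cm_mean = cm_stack.mean(axis=0)         # average count per cell
cm_std = cm_stack.std(axis=0, ddof=1)   # per-cell spread across folds

# Per-fold accuracy recovered from each matrix: diagonal sum / total count.
fold_acc = np.trace(cm_stack, axis1=1, axis2=2) / cm_stack.sum(axis=(1, 2))

print(cm_total)
print(cm_mean)
print(cm_std)
```

The pooled matrix `cm_total` gives one overall confusion matrix for the model, while `cm_mean` and `cm_std` allow a per-cell mean +- 2*std interval in the same spirit as the accuracy interval.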
